Advanced metrics with Prometheus and Grafana
This integration guide provides step-by-step instructions for setting up and configuring Prometheus and Grafana to monitor advanced metrics from the {{futurex}} {{vc}} cloud-based cryptographic services. It covers the overall architecture, prerequisites, and CryptoTunnel configuration through the {{vc}} Intelligence Portal (VIP). The topics also include installation and setup of Prometheus and Grafana, metric references, and visualization techniques to enable real-time monitoring, alerting, and dashboarding for enhanced system reliability and compliance.

## Architecture

The overall architecture of this integration involves the components shown in the following diagram. Steps for configuring each of these components are included in the sections that follow.

Diagram (flowchart, left to right): Customer Grafana → Customer Prometheus → CryptoTunnel Guardian → Prometheus Proxy → Futurex Prometheus

## {{vc}} CryptoTunnels

In the {{vc}} world, trust is a two-way street. The CryptoTunnel uses three components to establish trust, starting with a private key local to your device. When you generate the PKI, which creates the private key, the system signs the key under a {{vc}} CA tree, the second component. The {{vc}} CA tree that signed the key is the authority that establishes trust between the server and the client. After the CA tree signs the private key, it becomes a signed certificate, the final component. When you send the signed certificate through the CryptoTunnel, the server knows the certificate is signed under the {{vc}} CA tree and thus is authentic. That is how the server establishes trust in the
application.

To establish trust in the opposite direction, from the application to the server, the server sends the server-side signed certificate to the application. The application client then validates the server identity, establishing the trusted relationship with mutual authentication. After this handshake, you can encrypt all the data, satisfying PCI DSS compliance requirements.

## Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit. Originally developed by SoundCloud in 2012, it is now a graduated project of the Cloud Native Computing Foundation, which is part of the Linux Foundation and also hosts projects like Kubernetes and Fluentd.

The following list describes the main features of Prometheus:

- **Multi-dimensional data model:** Prometheus stores all data as time series, and each time series is uniquely identified by its metric name and a set of key-value pairs, also known as labels.
- **PromQL (Prometheus Query Language):** Prometheus provides a flexible query language to leverage its dimensional data model. PromQL allows you to select and aggregate time series data in real time.
- **No reliance on distributed storage:** The main unit of reliability in Prometheus is the individual node, which is fully standalone and does not depend on network storage or other remote services.
- **Collection happens through a pull model:** Prometheus collects metrics from monitored targets by scraping HTTP endpoints on these targets. However, it also supports an intermediary gateway for scenarios where a pull model is unsuitable.
- **Targets are discovered through service discovery or static configuration:** Prometheus employs various service discovery mechanisms to discover scrape targets dynamically.
- **Multiple modes of graphing and dashboarding support:** While Prometheus provides a built-in expression browser for exploring metrics, it also integrates with graphical dashboard builders such as Grafana for advanced visualization.
- **Alerting functionality:** Prometheus has a highly flexible
alerting system. It enables you to define alerting rules for your metrics, and if those conditions are met, it sends alert notifications through its Alertmanager component.

Designed for reliability, Prometheus can be the system you use during an outage to diagnose problems quickly. Many organizations use it to monitor their IT infrastructure, from microservices, containers, and Kubernetes at scale to IoT devices. It also supports a robust ecosystem of exporters for extending its monitoring capabilities.

## Grafana

Grafana is a popular open-source tool for visualizing large-scale measurement data. It provides a way to create, explore, and share dashboards and data with your team and the world. Grafana commonly helps visualize time series data for infrastructure and application analytics, but you can also use it in other domains, including industrial sensors, home automation, weather, and process control. It supports various data sources, including but not limited to Prometheus, InfluxDB, Elasticsearch, AWS CloudWatch, MySQL, and PostgreSQL.

The following list describes some key features of Grafana:

- **Dashboards and visualizations:** Grafana provides a feature-rich data modeling interface for creating dashboards. These dashboards can contain a variety of visualization widgets or panels (such as graphs, tables, single stats, gauges, maps, and so on). You can easily switch the visualization type to compare different visual formats of the same data.
- **Data source support:** Grafana supports many databases and data sources, from time series databases to relational databases and cloud services. You can create dashboards that pull data from multiple sources for a unified view.
- **Alerting:** Grafana provides robust alerting functionality. You can define alert rules for your data and get notified via several channels when an alert is triggered.
- **Annotations:** Grafana allows you to annotate graphs with rich events when something noteworthy happens. This function helps correlate the insights
between different events and metrics.
- **Dashboard sharing:** You can share a dashboard as a link, a snapshot, a PDF, or by embedding it in other web pages. This makes it easy to collaborate with your team.
- **Teams and authentication:** Grafana supports user authentication, allowing you to control access to your dashboards. It also has a multi-tenant architecture, so you can set up and manage multiple independent organizations, each with its own users, dashboards, and data sources.
- **Plugins:** Grafana features a plug-in architecture and offers various plugins that enable you to extend and customize its capabilities.

Grafana is a powerful tool, widely used across industries, for building visual dashboards to observe metrics in real time.

## VirtuCrypt monitoring metric reference

This section provides a reference for {{vc}} metrics and mappings.

### V2 monitoring

The following table shows V2 monitoring metrics:

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| vc_ct_max_connections | gauge int | CT instance max allowed connections | tunnel_id (str), company_name (str), host (str), port (int), api_type (str), port_header (str), guardian_host (str) |
| vc_ct_connected_clients | gauge int | CT instance current connected client count | tunnel_id (str), company_name (str), host (str), port (int), api_type (str), port_header (str), guardian_host (str) |
| vc_ct_run_status | gauge int | CT instance run status ("status active" → 1 or "status inactive" → 0) | tunnel_id (str), company_name (str), host (str), port (int), api_type (str), port_header (str), guardian_host (str), status |
| vc_ct_enabled_status | gauge int | CT instance enabled ("enabled" → 1 or "disabled" → 0) | tunnel_id (str), company_name (str), host (str), port (int), api_type (str), port_header (str), guardian_host (str) |
| vc_ct_anonymous_status | gauge int | CT instance allows anonymous TLS ("allows anonymous" → 1 or "does not allow anonymous" → 0) | tunnel_id (str), company_name (str), host (str), port (int), api_type (str), port_header (str), guardian_host (str) |
| vc_tls_cert_expiry | gauge int (days) | CT instance number of days until certificate expiry | tunnel_id (str), company_name (str), host (str), port (int), api_type (str), port_header (str), guardian_host (str) |
| vc_tls_version_info | gauge float | CT instance TLS version (e.g., 1.2, 1.1) | tunnel_id (str), company_name (str), host (str), port (int), api_type (str), port_header (str), guardian_host (str) |
| vc_probe_success | gauge int | CT instance port probe ("connection success" → 1 or "connection failed" → 0) | tunnel_id (str), company_name (str), host (str), port (int), api_type (str), port_header (str), guardian_host (str) |
| vc_probe_duration_seconds | gauge float (s) | CT instance number of seconds required for connection creation | tunnel_id (str), company_name (str), host (str), port (int), api_type (str), port_header (str), guardian_host (str) |
| vc_echo_duration_seconds | gauge float (s) | CT instance echo latency in seconds | tunnel_id (str), company_name (str), host (str), port (int), api_type (str), port_header (str), guardian_host (str), phase (str) |
| vc_connection_error_counter | gauge int | CT instance connection errors | tunnel_id (str), company_name (str), host (str), port (int), api_type (str), port_header (str), guardian_host (str), error_type (str) |
| vc_tls_handshake_duration | gauge float (s) | CT instance TLS handshake latency | tunnel_id (str), company_name (str), host (str), port (int), api_type (str), port_header (str), guardian_host (str), error_type (str), discovery_error_code (str), discovery_error_description (str), outgoing_host (str), outgoing_port (str) |

### Metric usage

The following table shows metric usage.

Metrics format example: metric{label_1=0, label_2=us-east}

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| ct_instance_port_status | gauge int | CT instance port status (open → 1 or closed → 0) | company_name (str), host (str), region (str), tunnel_name (str) |
| ct_instance_api_type | gauge int | CT instance API type (refer to the API type mappings table below) | company_name (str), host (str), region (str), tunnel_name (str) |
| ct_instance_service_enabled | gauge int | CT instance service enabled (true → 1, false → 0) | company_name (str), host (str), region (str), tunnel_name (str) |
| ct_instance_service_latency_ms | gauge int | CT instance service latency in ms | company_name (str), host (str), region (str), tunnel_name (str) |
| ct_instance_accepting_connections | gauge int | CT instance accepting connections (true → 1, false → 0) | company_name (str), host (str), region (str), tunnel_name (str) |
| ct_instance_certificate_validity | gauge int | CT instance certificate validity (refer to the certificate validity mappings table below) | company_name (str), host (str), region (str), tunnel_name (str) |
| ct_instance_clients_connected_total | gauge int | Total clients connected to CT instance | company_name (str), host (str), region (str), tunnel_name (str) |

### API type mappings

The following table shows API type mappings:

| Value | Mapping |
| --- | --- |
| 0 | "none" |
| 1 | "international" |
| 2 | "excrypt" |
| 3 | "json" |

### Certificate validity mappings

The following table shows certificate validity mappings:

| Value | Mapping |
| --- | --- |
| 1 | "max validity" |
| 2 | "under 90 days" |
| 3 | "under 60 days" |
| 4 | "under 30 days" |
| 5 | "under 7 days" |
| 6 | "expired" |
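
Because the API type and certificate validity gauges report integer codes, a dashboard or script that consumes them has to translate the codes back into the mapped names. The following is a minimal sketch of that lookup in Python. It assumes the metric names are written with underscores (as Prometheus naming rules require); the function names `decode_gauge` and `cert_needs_attention`, and the "under 30 days" threshold, are hypothetical and not part of the {{vc}} service.

```python
# Value-to-name mappings copied from the two mapping tables above.
API_TYPE = {0: "none", 1: "international", 2: "excrypt", 3: "json"}
CERT_VALIDITY = {
    1: "max validity",
    2: "under 90 days",
    3: "under 60 days",
    4: "under 30 days",
    5: "under 7 days",
    6: "expired",
}

# Assumed metric names (underscored); only these two gauges carry coded values.
CODED_METRICS = {
    "ct_instance_api_type": API_TYPE,
    "ct_instance_certificate_validity": CERT_VALIDITY,
}


def decode_gauge(metric_name: str, value: float) -> str:
    """Translate a coded gauge value into its mapped name.

    Metrics without a mapping table are returned as plain strings.
    """
    table = CODED_METRICS.get(metric_name)
    if table is None:
        return str(value)
    return table.get(int(value), f"unknown ({int(value)})")


def cert_needs_attention(value: float, threshold: int = 4) -> bool:
    """Hypothetical policy: flag codes at 'under 30 days' (4) or worse."""
    return int(value) >= threshold


if __name__ == "__main__":
    print(decode_gauge("ct_instance_api_type", 2))
    print(decode_gauge("ct_instance_certificate_validity", 5))
    print(cert_needs_attention(5))
```

In Grafana itself, the same translation is usually done with value mappings on the panel rather than in code; the sketch is mainly useful for scripts that read the metrics outside a dashboard.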