Telemetry: add prometheus endpoint option #2937

jpds · 2017-06-29T11:11:33Z

This is a wishlist request to have an option within vault telemetry to configure an endpoint on vault so that prometheus servers can gather metrics from vault.

cosmopetrich · 2017-06-30T12:56:38Z

This has been discussed previously in #1230 and #1415.

jpds · 2017-06-30T16:07:21Z

Well, if exposing a port with some text is a security concern, then use the push-gateway.

jefferai · 2017-07-01T20:14:37Z

The right course of action there would be to enhance go-metrics to support push-gateway.

siepkes · 2017-10-04T15:25:36Z

The push gateway will probably always be awkward:

The Prometheus Pushgateway allows you to push time series from these components to an intermediary job which Prometheus can scrape.

Personally I regard that as an extra moving part which can break down. Prometheus actually has some valid points regarding push vs pull: https://prometheus.io/docs/introduction/faq/#why-do-you-pull-rather-than-push?

@jefferai In this #1415 (comment) you state:

An authenticated /v1/sys/metrics that allows access to go-metrics data wouldn't be bad. The issue with Prometheus is that it requires running network-handling code that we have no control over, and from a security perspective that's not something we wanted to bake into Vault.

Would you be open to a pull request which adds an authenticated /v1/sys/metrics endpoint which uses Vault's own network-handling code but fetches the metrics internally from go-metrics?

jcmcken · 2018-03-19T15:44:23Z

I like the idea of a plain, token-authenticated, HTTP/S endpoint that provides JSON-formatted metrics, agnostic to Prometheus or any other particular solution (similar to Consul)
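To make the idea concrete, here is a minimal sketch of the request handling such an endpoint might do. It is an illustration only, not Vault's implementation: the function name, the error messages, the response layout, and the choice of 401/403 status codes are all assumptions, and the metrics snapshot is passed in rather than read from go-metrics. Only the `X-Vault-Token` header name is taken from Vault's existing API.

```python
import json


def metrics_response(headers, valid_tokens, metrics):
    """Return (status_code, body) for a request to a token-protected metrics endpoint.

    headers      -- dict of request headers
    valid_tokens -- set of tokens allowed to read metrics (hypothetical policy check)
    metrics      -- dict snapshot of metric name -> value (hypothetical data source)
    """
    token = headers.get("X-Vault-Token")
    if token is None:
        # No credentials supplied at all.
        return 401, json.dumps({"errors": ["missing client token"]})
    if token not in valid_tokens:
        # Credentials supplied but not authorized to read metrics.
        return 403, json.dumps({"errors": ["permission denied"]})
    # Plain JSON, not tied to any particular monitoring system.
    return 200, json.dumps({"gauges": metrics}, sort_keys=True)
```

Any collector that can send an HTTP header and parse JSON could consume a response shaped like this, which is the "agnostic" property being asked for.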

andybrown668 · 2018-03-30T17:16:10Z

I'm going to be using vault in a production environment (five nodes per site in HA mode backed by etcd) and will need to trigger alerts if any of the nodes needs to be unsealed.
I already use Prometheus and AlertManager so I'd like to plumb Vault into that infrastructure.
Given the lack of support for Prometheus, what's the 'blessed' alternative to do this?

jaloren · 2018-03-30T19:45:04Z

@andybrown668 it's not ideal but you can use a statsd exporter.

https://github.com/prometheus/statsd_exporter

So you have vault push its metrics to the exporter and then have prometheus scrape the metrics from the exporter. It's pretty ugly and makes metric collection significantly more complicated, but it does work. It requires running the exporter as a sidecar on the same host as the vault instance; otherwise the host label won't be set properly.

I found that using consul service discovery made this less annoying.

Word of caution: I would not use dogstatsd exporter. If vault cannot connect to the exporter, then vault crashes which means that an exporter becomes a SPOF for vault. I opened a bug against vault and it was closed because from hashicorp's point of view this is working as expected. This problem does not occur with statsd since metrics are exported over UDP.
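As a rough illustration of the UDP point, the sketch below formats a metric in the plain StatsD text format and sends it as a single datagram. The function names and the metric name are made up for the example, and the port 9125 is an assumption about where a statsd exporter would be listening. Because UDP is connectionless, the send does not wait for or require a listener, so the sender keeps running whether or not anything receives the datagram.

```python
import socket


def statsd_line(name, value, kind="c"):
    """Format one metric in the plain StatsD text format, e.g. 'name:1|c'."""
    return f"{name}:{value}|{kind}"


def send_udp(line, host="127.0.0.1", port=9125):
    """Send one datagram and return the number of bytes handed to the OS.

    The host and port are assumptions for this sketch; no handshake takes
    place, so this does not check whether anything is listening.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric name, used only for illustration.
    send_udp(statsd_line("example.requests", 1))
```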

ayashjorden · 2018-03-31T07:52:45Z

If you're using influxdata/telegraf, it has a statsD input plugin (acts as a statsD server). This way you get system metrics and Vault metrics in one component (vs. Prometheus NodeExporter+statsDExporter).

leyraroro · 2018-05-04T14:05:54Z

You can use blackbox for that. So for example in the blackbox.yml you can have
    vault_unseal:
      prober: http
      timeout: 5s
      http:
        valid_status_codes: [200,429]
        method: GET
        no_follow_redirects: true
        fail_if_ssl: false
        fail_if_not_ssl: false
        fail_if_matches_regexp:
          - 'sealed":true'

The valid status codes are 200 and 429, because the standby node replies with a 429 (which is expected) and the active node with a 200
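The decision this probe makes can be sketched as a small function: the status code must be one of the accepted values, and the response body must not match the failure pattern. This is only a model of the logic described here, not blackbox exporter code; the function and constant names are invented for the example.

```python
import re

# Accepted status codes from the probe config: active node -> 200, standby -> 429.
VALID_STATUS_CODES = {200, 429}

# Same pattern as the config's failure regexp: any body containing it fails the probe.
FAIL_PATTERN = re.compile(r'sealed":true')


def probe_success(status_code, body):
    """Return True if the node should be considered healthy (reachable and unsealed)."""
    if status_code not in VALID_STATUS_CODES:
        return False
    return FAIL_PATTERN.search(body) is None
```

For example, `probe_success(429, '{"sealed":false}')` is true for a standby node, while any body containing `sealed":true` fails regardless of status code.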

The Prometheus alerting rule that triggers the alerts sent to alertmanager:

    - alert: Vault_node_sealed
      expr: probe_success{job="vault_sealed"} != 1
      for: 1m
      labels:
        severity: xxx
      annotations: xxx

You can also use statsd-exporter to gather more specific stats and better alerts with expressions like:
expr: sum(increase(vault_core_leadership_lost_count{job="example"}[1h])) > 5

Hope it helps.

tamalsaha · 2018-08-17T12:35:33Z

Folks, I see that the go-metrics library has some support for Prometheus: https://github.com/armon/go-metrics/tree/master/prometheus. Can this be used to expose Prometheus metrics as @jefferai mentioned?

jurgenweber · 2019-03-28T23:12:36Z

The alerting rules documented at https://coreos.com/tectonic/docs/latest/vault-operator/user/monitoring.html#alerting-rules rely on metrics that do not seem to exist in Vault 1.1.0. Does anyone have any recommendations for alerts other than these?

michelvocks · 2020-02-03T10:55:08Z

Closing this since, apparently, this has been implemented with #5308.

uepoch mentioned this issue Aug 29, 2018

Support same metrics endpoints as nomad and consul #5223

Closed

tamalsaha mentioned this issue Sep 3, 2018

Monitoring Vault server kubevault/project#4

Closed

catsby added the feature-request and core (Issues and Pull-Requests specific to Vault Core) labels Nov 5, 2019

michelvocks closed this as completed Feb 3, 2020
