Telemetry: add prometheus endpoint option #2937

Closed
jpds opened this issue Jun 29, 2017 · 12 comments
Labels
core Issues and Pull-Requests specific to Vault Core feature-request

Comments

@jpds
Contributor

jpds commented Jun 29, 2017

This is a wishlist request to have an option within vault telemetry to configure an endpoint on vault so that prometheus servers can gather metrics from vault.

@cosmopetrich

This has been discussed previously in #1230 and #1415.

@jpds
Contributor Author

jpds commented Jun 30, 2017

Well, if exposing a port with some text is a security concern, then use the push-gateway:

@jefferai
Member

jefferai commented Jul 1, 2017

The right course of action there would be to enhance go-metrics to support push-gateway.

@siepkes
Contributor

siepkes commented Oct 4, 2017

The push gateway will probably always be awkward:

The Prometheus Pushgateway allows you to push time series from these components to an intermediary job which Prometheus can scrape.

Personally, I regard that as an extra moving part that can break down. Prometheus actually makes some valid points regarding push vs. pull: https://prometheus.io/docs/introduction/faq/#why-do-you-pull-rather-than-push?

@jefferai In this #1415 (comment) you state:

An authenticated /1/sys/metrics that allows access to go-metrics data wouldn't be bad. The issue with Prometheus is that it requires running network-handling code that we have no control over, and from a security perspective that's not something we wanted to bake into Vault.

Would you be open to a pull request that adds an authenticated /1/sys/metrics endpoint, which uses Vault's own network-handling code but fetches the metrics internally from go-metrics?

@jcmcken

jcmcken commented Mar 19, 2018

I like the idea of a plain, token-authenticated HTTP/S endpoint that provides JSON-formatted metrics, agnostic to Prometheus or any other particular solution (similar to Consul).

@andybrown668

I'm going to be using Vault in a production environment (five nodes per site in HA mode, backed by etcd) and will need to trigger alerts if any of the nodes needs to be unsealed.
I already use Prometheus and Alertmanager, so I'd like to plumb Vault into that infrastructure.
Given the lack of support for Prometheus, what's the 'blessed' alternative to do this?

@jaloren
Contributor

jaloren commented Mar 30, 2018

@andybrown668 it's not ideal, but you can use a statsd exporter.

https://github.com/prometheus/statsd_exporter

So you have Vault push its metrics to the exporter and then have Prometheus scrape the metrics from the exporter. It's pretty ugly and makes metric collection significantly more complicated, but it does work. It requires running the exporter as a sidecar on the same host as the Vault instance; otherwise the host label won't be set properly.

I found that using Consul service discovery made this less annoying.

A word of caution: I would not use the dogstatsd exporter. If Vault cannot connect to the exporter, Vault crashes, which makes the exporter a SPOF for Vault. I opened a bug against Vault, and it was closed because, from HashiCorp's point of view, this is working as expected. This problem does not occur with statsd, since metrics are exported over UDP.
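
As a rough illustration of the UDP point above, here is a minimal Python sketch of a fire-and-forget StatsD send. The function name, the metric name, and the 127.0.0.1:9125 target are assumptions made for the example; this is not Vault's own sink code.

```python
import socket


def send_statsd_counter(name, value=1, host="127.0.0.1", port=9125):
    """Send one StatsD counter sample over UDP.

    Returns the number of bytes handed to the kernel. The host and port
    defaults are assumptions for this sketch, not values taken from Vault.
    """
    # StatsD line format for a counter: <name>:<value>|c
    payload = f"{name}:{value}|c".encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # sendto() on an unconnected UDP socket does not wait for a reply,
        # so it returns normally even when nothing is listening on host:port.
        return sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # "vault.example.counter" is a hypothetical metric name.
    send_statsd_counter("vault.example.counter")
```

Because the sender never waits for an acknowledgement, a missing exporter only means lost samples rather than a failing client.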

@ayashjorden

If you're using influxdata/telegraf, it has a statsD input plugin (it acts as a statsD server). This way you get system metrics and Vault metrics in one component (vs. Prometheus node_exporter + statsd_exporter).

@leyraroro

leyraroro commented May 4, 2018

You can use blackbox for that. So, for example, in the blackbox.yml you can have:

    vault_unseal:
      prober: http
      timeout: 5s
      http:
        valid_status_codes: [200,429]
        method: GET
        no_follow_redirects: true
        fail_if_ssl: false
        fail_if_not_ssl: false
        fail_if_matches_regexp:
          - 'sealed":true'

The valid status codes are 200 and 429, because the standby node replies with a 429 (which is expected) and the active node with a 200.
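
The decision this probe makes can be sketched in a few lines of Python. This is only an approximation of the module's logic as described here (accepted status codes plus a regexp that must not match the body); the function name and the sample JSON bodies are hypothetical.

```python
import re

# Status codes the probe accepts: 200 from the active node, 429 from a standby.
VALID_STATUS_CODES = {200, 429}

# Same pattern as fail_if_matches_regexp in the module above.
SEALED_PATTERN = re.compile(r'sealed":true')


def probe_succeeds(status_code, body):
    """Return True when the response would count as a successful probe."""
    if status_code not in VALID_STATUS_CODES:
        return False
    # The probe fails if the body reports the node as sealed.
    return SEALED_PATTERN.search(body) is None


if __name__ == "__main__":
    # Hypothetical response bodies, for illustration only.
    print(probe_succeeds(429, '{"sealed":false,"standby":true}'))
    print(probe_succeeds(200, '{"sealed":true}'))
```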

The rule in alertmanager to trigger the alerts:

    - alert: Vault_node_sealed
      expr: probe_success{job="vault_sealed"} != 1
      for: 1m
      labels:
        severity: xxx
      annotations: xxx

You can also use statsd-exporter to gather more specific stats and better alerts with expressions like:
expr: sum(increase(vault_core_leadership_lost_count{job="example"}[1h])) > 5
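
For readers unfamiliar with counter-based alerts, the following Python sketch shows roughly what such an expression checks: how much a counter grew over a window, tolerating counter resets. It is a simplification (PromQL's increase() also extrapolates to the window boundaries, which is omitted here), and the function names and sample values are made up for the example.

```python
def counter_increase(samples):
    """Approximate growth of a monotonically increasing counter.

    `samples` holds the counter values scraped within the window, oldest
    first. A drop between two samples is treated as a counter reset (for
    example, after a process restart), so counting resumes from zero.
    """
    if not samples:
        return 0.0
    total = 0.0
    prev = samples[0]
    for cur in samples[1:]:
        if cur < prev:
            total += cur          # reset: the counter restarted from 0
        else:
            total += cur - prev   # normal growth between two scrapes
        prev = cur
    return total


def leadership_lost_too_often(samples, threshold=5):
    """Mirror the '> 5' comparison used in the alert expression."""
    return counter_increase(samples) > threshold


if __name__ == "__main__":
    # Hypothetical scraped values over one hour, including one reset.
    print(leadership_lost_too_often([0, 2, 5, 1, 3]))
```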

Hope it helps.

@tamalsaha

Folks, I see that the go-metrics library has some support for Prometheus: https://github.com/armon/go-metrics/tree/master/prometheus . Can this be used to expose Prometheus metrics, as @jefferai mentioned?

@jurgenweber

Regarding the alerting rules listed at https://coreos.com/tectonic/docs/latest/vault-operator/user/monitoring.html#alerting-rules : these metrics do not seem to exist in Vault 1.1.0. Does anyone have any recommendations for alerts other than these?

catsby added the feature-request and core (Issues and Pull-Requests specific to Vault Core) labels on Nov 5, 2019

@michelvocks
Contributor

Closing this since, apparently, this has been implemented with #5308.
