The Problem

It's possible for StatsD metrics in the vets-api codebase to "clash", for example when the label names reported for a given metric are inconsistent. When this happens, the statsd-exporter process that runs on the vets-api instances crashes and restarts, which prevents new stats from being scraped by Prometheus. In summary, we lose reporting of many of the vets-api metrics.

More Context

This occurred on Sept 3, 2020 - link

Metrics were introduced that clashed with existing metrics. This resulted in fatal error messages in the statsd-exporter log file (Cloudwatch logs source):
time="2020-09-03T19:03:30Z" level=fatal msg="A change of configuration created inconsistent metrics for \"api_auth_saml_request\". You have to restart the statsd_exporter, and you should consider the effects on your monitoring setup. Error: a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"api_auth_saml_request\", help: \"Metric autogenerated by statsd_exporter.\", constLabels: {type=\"mhv\",version=\"v0\"}, variableLabels: []} has different label names or a different help string" source="exporter.go:82"
Work to be Done

Another possibility worth researching: can we configure statsd-exporter not to restart when an invalid set of tags is received?
I believe we're still running version 0.3.0 of statsd_exporter; it's possible that if we upgrade to a more recent version (post 0.10.2), inconsistent labels won't cause the exporter to restart 🤷♂️
Good idea. Even in the newest version, having inconsistent labels still seems to be a bad thing; it just won't cause a restart. It's probably best to upgrade and then create a metric filter on the new error message.
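If that metric filter were built, it could be created against the statsd_exporter CloudWatch log group with something like the sketch below. The log group name, filter name, filter pattern, metric name, and namespace are all placeholders (not taken from the VA infrastructure), and the exact error wording emitted by newer statsd_exporter releases should be confirmed before relying on the pattern:

```go
// Rough sketch of creating the suggested CloudWatch Logs metric filter
// with the AWS SDK for Go v1. All names below are placeholders.
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudwatchlogs"
)

func main() {
	sess := session.Must(session.NewSession())
	cwl := cloudwatchlogs.New(sess)

	_, err := cwl.PutMetricFilter(&cloudwatchlogs.PutMetricFilterInput{
		// Placeholder: use whatever log group the statsd_exporter logs ship to.
		LogGroupName: aws.String("vets-api/statsd_exporter"),
		FilterName:   aws.String("statsd-exporter-inconsistent-metrics"),
		// Phrase taken from the 0.3.0 fatal message above; confirm the wording
		// used by the upgraded statsd_exporter before depending on it.
		FilterPattern: aws.String(`"created inconsistent metrics"`),
		MetricTransformations: []*cloudwatchlogs.MetricTransformation{{
			MetricName:      aws.String("StatsdExporterInconsistentMetrics"),
			MetricNamespace: aws.String("VetsAPI"),
			MetricValue:     aws.String("1"),
		}},
	})
	if err != nil {
		log.Fatalf("put-metric-filter failed: %v", err)
	}
}
```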
Nearly a year has passed, and no progress has been made toward such a metric filter. The fact that the new version of statsd-exporter no longer drops metrics with inconsistent labels means we really don't need to know about this anymore. I'm gonna close. Anyone who disagrees, feel free to re-open.