cadvisor metrics and problematic identification #17365
Background
Prometheus generally attaches useful labels based on the target it is scraping. For example, when scraping `frontend`, Prometheus reaches out to `frontend`, knows certain things about `frontend` (e.g. service name, pod, instance, etc.), and can attach those labels onto metrics exported by `frontend`.
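For illustration, here is a hypothetical sample as it might appear after a direct scrape of `frontend` (the metric name and label values are made up for this sketch, not taken from a real deployment):

```
# Hypothetical sample scraped directly from frontend: the job/instance/namespace/pod
# labels are attached by Prometheus based on the scrape target, so the series is
# trivially attributable to the frontend service.
src_frontend_example_metric{job="sourcegraph-frontend",instance="10.0.0.12:6060",namespace="default",pod="sourcegraph-frontend-7d5b8c-xk2lp"} 42
```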
cAdvisor exports metrics for other containers. So despite all cAdvisor metrics looking like they are coming from cAdvisor, they are actually for other containers.
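By contrast, here is a hypothetical cAdvisor-exported sample for that same `frontend` container (again, names and values are assumptions for illustration). The target labels point at cAdvisor itself; the workload the metric actually describes only shows up in the `name` label and the `container_label_*` labels:

```
# Hypothetical cAdvisor sample: job/instance identify the cAdvisor target that was
# scraped, not the workload. The workload is only identifiable via the name label and
# the container_label_io_kubernetes_* labels derived from io.kubernetes.* labels.
container_memory_usage_bytes{job="cadvisor",instance="node-1:48080",name="k8s_frontend_sourcegraph-frontend-7d5b8c-xk2lp_default_2a9c0f",container_label_io_kubernetes_container_name="frontend",container_label_io_kubernetes_pod_namespace="default"} 1.2e+08
```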
On some systems cAdvisor generates a `name` label that is some combination of fields that might make a target monitored by cAdvisor unique. This worked alright for a while, right up until we discovered it didn't: https://github.com/sourcegraph/sourcegraph/issues/17069, https://github.com/sourcegraph/sourcegraph/issues/17072

Problem
We need an effective way to identify Sourcegraph services inside cAdvisor metrics. The current strategy is outlined in our docs, but the approach is not perfect:
- `prometheus-to-*` exporters get picked up by the `prometheus` matcher (a sketch of this over-matching follows below).
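As a rough illustration (the regex here is made up, not the exact matcher from our docs), a name-based selector for Prometheus containers also matches the `prometheus-to-*` exporter containers:

```
# Illustrative PromQL only: a loose "prometheus" name matcher also selects the
# prometheus-to-* exporter containers, since their names contain "prometheus" as well.
sum by (name) (rate(container_cpu_usage_seconds_total{name=~".*prometheus.*"}[5m]))
```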
We are a bit hamstrung in that whatever name-labelling convention we have must also:

- work with the container labels cAdvisor surfaces in Kubernetes (`io.kubernetes.container.name`, `io.kubernetes.pod.name`, `io.kubernetes.pod.namespace`, `io.kubernetes.pod.uid`)
- account for the same service carrying different names across deployments (`sourcegraph-frontend` vs `frontend` vs `sourcegraph-frontend-internal`)
Docker-compose doesn't seem to be as much of an issue, since docker-compose deployments generally run on machines that do nothing other than serve Sourcegraph, but in Kubernetes there's no telling what else is on the nodes.
One approach attempted was to filter on namespace via `metric_relabel_configs` in k8s (sourcegraph/deploy-sourcegraph#1644); a sketch of that rule follows below. But due to the various ways a namespace can be applied when deploying, there's no guarantee that a customer won't forget to update the Prometheus relabel rule - more discussion in sourcegraph/deploy-sourcegraph#1578. Regardless, this is the change currently applied in Cloud and k8s.sgdev.org to resolve our issue.
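A minimal sketch of that namespace filter, assuming the Sourcegraph namespace is `ns-sourcegraph` (the namespace value, and whether this matches the exact rule merged in sourcegraph/deploy-sourcegraph#1644, are assumptions here):

```yaml
metric_relabel_configs:
  # Keep only cAdvisor container metrics whose pod namespace is the Sourcegraph one.
  # The ^$ alternative keeps series that have no namespace label at all, so metrics
  # that don't come from cAdvisor are not dropped by accident.
  - source_labels: [container_label_io_kubernetes_pod_namespace]
    regex: ^$|ns-sourcegraph # must match the namespace the deployment actually uses
    action: keep
```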
Possible solutions

⭐ Dhall saves us

Provide the metric relabel config in generated configuration (example above and in sourcegraph/deploy-sourcegraph#1644) based on …

Monitoring generator generates exceptions

The monitoring generator could generate regex excludes on a case-by-case basis. This is really gnarly, and probably not a great idea, since we could just as easily accidentally break things on another deployment method (which is what happened in https://github.com/sourcegraph/sourcegraph/issues/17072).

Namespace checking in …

Comments

Heads up @davejrt @ggilmore @dan-mckean @caugustus-sourcegraph @StephanX - the "team/delivery" label was applied to this issue.