Reduce cardinality of metrics exposed by Mimir #1750
Comments
Good suggestions, I agree with all of them.
Neat, this is great @pracucci. If I read your proposal correctly, you're not suggesting we replace any of these metrics with aggregated versions. Seems like you don't want to get rid of the Is that correct? I'm just trying to get a feel for cases where aggregations are or aren't helpful for people.
When referring to dropping cortex_ring_member_ownership_percent: is that value on the UI page not derived from this metric? Do we recalculate it for the page? Unsure here.
Correct. Reducing cardinality directly in Mimir (wherever possible) will benefit every Mimir user, not just those who use Grafana Labs' aggregations.
We have a function that computes the percentage. The result is used both to populate the UI page and to expose the metric. If we drop the metric, we'll just keep showing the value in the UI.
The suggestions in the issue's description have all been applied, so I'm going to close this issue. However, we may make other improvements over time, as we find them.
@pracucci, just to be sure I'm following: this is 'series count', not 'metrics count', correct?
Correct!
The reduction on the alertmanager was even more impressive (-68%) when there are many tenants.
@colega did great work analyzing the cardinality of Mimir metrics and which metrics are actually used by any of our dashboards, alerts or manual queries issued at Grafana Labs. I took a quick look at the report of both used and unused metrics and I think we have room to reduce the cardinality of metrics exposed by Mimir.
Mimir metrics are typically low cardinality, unless you run very large Mimir clusters and/or clusters with a very large number of tenants. Below I'm sharing some ideas on how we could reduce it.
cortex_ring_tokens_owned
Proposal: remove the metric (I think it's useless).
cortex_ring_member_ownership_percent
Proposal: remove the metric. The same information is available in the ring UI page.
cortex_alertmanager_notification_requests_total and cortex_alertmanager_notification_requests_failed_total
Proposal: do not expose them for unused receivers. We could propose a change to not initialize them in Prometheus Alertmanager, or do a trick in Mimir to not expose these metrics if their value is 0 (we remap them).
The same could be done for all other Alertmanager counters that have both the user and integration labels, like cortex_alertmanager_notifications_total.
cortex_distributor_ingester_queries_total and cortex_distributor_ingester_query_failures_total
Proposal: remove the metrics (the same information can be inferred from the generic gRPC requests-by-route metric).
cortex_alertmanager_alerts_insert_limited_total
This and other Alertmanager counters with only the user label are exposed for all tenants, regardless of whether any value has ever been tracked (so even if the counter value is 0). If it were a normal CounterVec they wouldn't be, but in this case they are because they've been remapped by Mimir.
Proposal: improve the remapping logic for counters, to optionally allow not exposing a counter if its value is 0.
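To make the last proposal concrete, here is a minimal sketch of the idea in plain Go. It is not Mimir's actual remapping code: the `sample` type, the `remapCounters` function and the `skipZero` flag are hypothetical names introduced only for illustration, and the tenant names and values are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one exposed series: a metric name, the tenant ("user" label)
// and the counter value. Hypothetical type, not part of Mimir.
type sample struct {
	Metric string
	User   string
	Value  float64
}

// remapCounters is a hypothetical stand-in for the remapping step: it takes
// per-tenant counter values and returns the series that would be exposed.
// When skipZero is true, tenants whose counter is still 0 are not exposed.
func remapCounters(metric string, perTenant map[string]float64, skipZero bool) []sample {
	users := make([]string, 0, len(perTenant))
	for u := range perTenant {
		users = append(users, u)
	}
	sort.Strings(users) // deterministic output order

	out := make([]sample, 0, len(users))
	for _, u := range users {
		v := perTenant[u]
		if skipZero && v == 0 {
			continue // never incremented: do not create a series for it
		}
		out = append(out, sample{Metric: metric, User: u, Value: v})
	}
	return out
}

func main() {
	const name = "cortex_alertmanager_alerts_insert_limited_total"
	// Made-up values: only one of three tenants ever hit the limit.
	perTenant := map[string]float64{"tenant-a": 0, "tenant-b": 3, "tenant-c": 0}

	fmt.Println("series without skipping:", len(remapCounters(name, perTenant, false)))
	fmt.Println("series with skipping:", len(remapCounters(name, perTenant, true)))
	for _, s := range remapCounters(name, perTenant, true) {
		fmt.Printf("%s{user=%q} %g\n", s.Metric, s.User, s.Value)
	}
}
```

The trade-off of skipping zero-valued counters is that a series only appears once it has been incremented, so queries using rate() or increase() may miss the very first increment. Making the behavior optional, as proposed, lets it be applied only to the counters where that is acceptable.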