Ruler: MimirRulerTooManyFailedQueries alert due to user error #7668
Comments
cc: @krajorama We also got MimirRulerTooManyFailedQueries due to a bad rule uploaded by a user.
Reproduced with mimir-distributed 5.2.2 (Mimir 2.11). Update: this repro uses the built-in querier in the ruler, not the remote ruler-querier functionality! I started the chart with metamonitoring enabled to get some metrics and created a recording rule for
I see
I've upgraded to At the same time I see
I'm pretty sure this was actually fixed by me in #7567. However, that PR missed the cut-off for the 2.12 release by a couple of days.
Could not reproduce with the remote ruler on the latest weekly (r284-6db12671). At first I thought I did, but the ruler dashboard actually uses
Tested in v2.12.0-rc.4. Could not reproduce, so I think the remote ruler case is fixed in 2.12, most likely by #7472. Summary: this should be fixed for the remote ruler case in 2.12, and will be fixed for the normal (built-in querier) ruler case in 2.13.
2.12 has been released. We should be good to close this, right?
Describe the bug
We use Mimir and the rules from the mimir-mixin. Recently we onboarded a customer who sends Kubernetes metrics to our Mimir cluster. Due to a configuration error on the customer's Kubernetes cluster, the kubelet metrics were scraped multiple times (multiple ServiceMonitors for the kubelet). The kube-prometheus stack contains the following rules:
These rules will fail with a
many-to-many matching not allowed
error if the kubelet is scraped by multiple jobs. This is obviously a user error, and in the Mimir logs we can observe the corresponding error messages:
As soon as Mimir evaluates these rules, the
MimirRulerTooManyFailedQueries
alert is triggered. However, according to the runbook for this alert, such user errors should not trigger it: (https://grafana.com/docs/mimir/latest/manage/mimir-runbooks/#mimirrulertoomanyfailedqueries)
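To make the failure mode concrete, here is a minimal Python sketch. It is a simplified model, not Mimir's or Prometheus's actual implementation. It shows why a join fails when the "one" side of a match has duplicate series, and how such a failure could be counted separately from operator-facing failures. All names (`ManyToManyMatchError`, `multiply_many_to_one`, `RuleStats`, `evaluate`) and the sample series, including the `kubelet-duplicate` job, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ManyToManyMatchError(Exception):
    """Hypothetical stand-in for a query rejected because of the user's data."""


def signature(labels, on):
    # Keep only the labels that take part in the match.
    return tuple((name, labels.get(name, "")) for name in on)


def multiply_many_to_one(many, one, on):
    """Multiply each series in `many` by its single partner in `one`.

    Both arguments are lists of (labels_dict, value) pairs.
    """
    partners = defaultdict(list)
    for labels, value in one:
        partners[signature(labels, on)].append(value)

    results = []
    for labels, value in many:
        candidates = partners.get(signature(labels, on), [])
        if not candidates:
            continue  # no partner: the series is dropped from the result
        if len(candidates) > 1:
            # The "one" side is not unique for this signature.
            raise ManyToManyMatchError(
                f"many-to-many matching not allowed for {signature(labels, on)}"
            )
        results.append((labels, value * candidates[0]))
    return results


class RuleStats:
    """Hypothetical counters, mirroring the split the issue asks for."""

    def __init__(self):
        self.user_errors = 0
        self.failed_queries = 0


def evaluate(query, stats):
    # Run a zero-argument query function and classify any failure.
    try:
        return query()
    except ManyToManyMatchError:
        stats.user_errors += 1  # caused by the tenant's data or rule
        return None
    except Exception:
        stats.failed_queries += 1  # would feed an operator-facing alert
        return None


# The same kubelet target scraped by two jobs yields two series per instance.
latency = [({"instance": "node-1", "job": "kubelet"}, 0.5)]
node_name = [
    ({"instance": "node-1", "job": "kubelet", "node": "a"}, 1),
    ({"instance": "node-1", "job": "kubelet-duplicate", "node": "a"}, 1),
]

stats = RuleStats()
outcome = evaluate(
    lambda: multiply_many_to_one(latency, node_name, ["instance"]), stats
)
print(outcome, stats.user_errors, stats.failed_queries)
```

In this model the duplicate scrape job is recorded as a user error and leaves `failed_queries` untouched, which is the accounting the runbook describes. Removing the duplicate series from `node_name` lets the join succeed.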
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I would expect that user errors (such as a rule with many-to-many matching) will not increase the
cortex_ruler_queries_failed_total
counter.
Environment
Additional Context
I saw this Cortex issue, which might be relevant