Flaky TestDistributor/caching_unmarshal_data_enabled/series_with_exemplars
#7448
Comments
There's another instance of the same test failing in a different PR, a few minutes apart: https://github.com/grafana/mimir/actions/runs/8017427753/job/21902224686?pr=7446
The test was last changed in August 2023, though. It could be a genuine failure, since it started showing up so often. |
I was just going to add this. I've seen this failing all morning today in 3 separate PRs. |
I'm going to look into this. |
I reviewed again #7393 to see if I introduced a bug in |
I'm trying to understand it via draft PR: #7454 |
I found the cause. It's not a new thing. It's a timing issue in the ingester: the exemplar is silently not ingested because max exemplars is set to 0 in TSDB. I'm working on a fix. |
I think the issue got worse when we merged this PR yesterday, #7440, because it computes the limits using the ingesters ring instead of the lifecycler. The ingesters ring may not be updated yet, so it could see "zero ingesters in the ring" when computing the desired "max local exemplars". If it sees no ingesters in the ring, it computes a value of 0, which for exemplars means "no exemplars allowed". |
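To make the failure mode above concrete, here is a minimal sketch in Go. It is not Mimir's actual code: the function names are hypothetical, and it assumes the local limit is derived by dividing the global limit by the number of ingesters seen in the ring. The guarded variant is only one possible mitigation for illustration, not necessarily what #7424 does.

```go
package main

import "fmt"

// localExemplarLimit is a hypothetical helper (not Mimir's real API).
// It assumes the per-ingester limit is the global limit split evenly
// across the ingesters currently visible in the ring.
func localExemplarLimit(globalLimit, ingestersInRing int) int {
	if ingestersInRing == 0 {
		// A stale ring view with zero ingesters yields 0, which for
		// exemplars means "no exemplars allowed".
		return 0
	}
	return globalLimit / ingestersInRing
}

// guardedLocalExemplarLimit is an illustrative guard (an assumption):
// the ingester doing the computation is itself running, so count at
// least one ingester even when the ring view is empty.
func guardedLocalExemplarLimit(globalLimit, ingestersInRing int) int {
	if ingestersInRing < 1 {
		ingestersInRing = 1
	}
	return globalLimit / ingestersInRing
}

func main() {
	fmt.Println(localExemplarLimit(100, 0))        // stale ring: exemplars silently dropped
	fmt.Println(localExemplarLimit(100, 4))        // ring populated: limit split across 4
	fmt.Println(guardedLocalExemplarLimit(100, 0)) // guard keeps the limit non-zero
}
```

With the unguarded version, whether the exemplar is ingested depends on whether the ring was populated before the limit was computed, which matches the intermittent nature of the test failure.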
Discussed different options on how to fix it with @pstibrany. Going to fix it in #7424 since it's already modifying how we compute the local exemplar limits. |
I pushed a commit to #7424 to fix this issue, which was introduced with #7440. In detail, the problem:
|
Fixed by #7424. |
The test failed on a PR which doesn't work with exemplars or the distributor
https://github.com/grafana/mimir/actions/runs/8016409523/job/21899113333?pr=7173