Limit repetitive error logging in ingesters #5894
Comments
Related: #1900
Around the time when this issue was created, Mimir also gained a log sampling facility (#5584). A bunch of errors are now being downsampled: Lines 543 to 553 in c0038b1
(Although at the moment it only seems to be honored in the ingester gRPC middleware: it covers errors returned from gRPC calls, but not errors sent directly to a logger (#7690).) Related, and added around the same time, was overall rate limiting for emitted logs (#5764). That one comes with metrics.
I could use some help finding types of errors that we still think are causing floods. I'll ping you internally to see if we can locate more.
Supplementing sampled errors with an accurate counter metric would no doubt be valuable. I will give that some thought as I learn more here.
I have surveyed the scene. With the implementation of #5584 and #5764, I have found no logs that are still egregiously abusing our systems. Both of those changes are protecting the system from major flash logging floods. There are a couple of things that we could do as part of this issue:
I think one viable option is to close this ticket and focus on what I suspect is the ultimate solution: #1900. (Because both of those options noted above could be flexibly handled by #1900.)
Mimir ingesters can occasionally become bogged down logging high volumes of repeated errors, such as "out of bounds" errors. This could happen following a temporary outage of some ingesters. Ideally, these repeated errors should be sampled, and perhaps replaced with a metric.
Let's update Mimir to use sampled logging for errors that might put unnecessary pressure on ingesters.