Add write limits for tenants in Thanos Receiver #5404
Comments
FYI: pressed the button early accidentally. I am still writing. |
Ok, I think I am mostly ready with the writing at this moment. |
I am also considering adding a limit on the maximum number of labels per timeseries. |
Well, it is not a good idea to record the number of labels per timeseries: cardinality would be very high. Keeping it out of the plans for now. |
I think we can use a Bloom filter to track per-tenant cardinality. |
@hanjm good idea! Thanks for the suggestion, I will investigate how we could use it. |
Have you seen how Loki implements this as well? Might give some inspiration (not sure if relevant tho). For example limiting on bytes/s might also be useful. Anyhow, sounds really great to have! |
From community discussion:
|
@wiardvanrij thanks for the tip. I checked out their limits and I believe it makes total sense to have a limit on request body size too. I don't want to dive into rate limits (i.e. bytes per second or timeseries per second) in this proposal though, as they require continuously tracking state, which adds complexity. I believe they could be part of a different proposal and possibly take advantage of the outcome of #5415. This proposal is about limiting the size of remote write requests, which requires no state and is much easier to implement. |
The remote write request body size (in bytes) will be exported as a histogram. And I propose these buckets:
Please let me know if you have other suggestions. For the number of samples & timeseries per request, I would also like to make them histograms. The buckets would be configurable, with a default provided. Which buckets do you think we could provide as default? Possibly generate an exponential bucket set using |
FYI, the remote write request body size metric was added as a summary without any quantiles defined yet. |
Hello 👋 Looks like there was no activity on this issue for the last two months. |
I'm on vacation, bot. Don't stale me! |
Everything planned in this issue has already been implemented. Closing it. |
Is your proposal related to a problem?
Tenants can overload Thanos Receivers with remote write requests and bring the whole system down.
Describe the solution you'd like
I would like to put maximum limits on the size of remote write requests, so that a single tenant is less likely to negatively affect others.
These are the proposed "knobs" for limiting remote write endpoint usage (please feel free to propose more and give your opinion):
Hitting either of the first two limits should trigger an HTTP response with status code 413 (Request Entity Too Large), and hitting the last one should trigger a 429 (Too Many Requests). In the future, remote write clients could identify the 413 error and split the data into smaller requests that fit under the limits.
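To illustrate the 413 behaviour, here is a minimal sketch of a stateless request body size limit, written as plain `net/http` middleware. It is not the Thanos implementation: `statusForBodySize`, `limitBody` and `receive` are hypothetical names, and only the body size limit mentioned in the discussion is covered. A limit of zero is assumed to mean "disabled".

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
)

// statusForBodySize returns the status a request of the given size should get.
// A limit <= 0 means limiting is disabled (assumption for this sketch).
func statusForBodySize(size, limit int64) int {
	if limit > 0 && size > limit {
		return http.StatusRequestEntityTooLarge // 413
	}
	return http.StatusOK
}

// limitBody is a hypothetical middleware enforcing a maximum body size.
func limitBody(limit int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if limit <= 0 {
			next.ServeHTTP(w, r)
			return
		}
		// Reject early when the declared Content-Length is already too large.
		// ContentLength is -1 when unknown, which passes this check.
		if statusForBodySize(r.ContentLength, limit) != http.StatusOK {
			http.Error(w, "remote write request body too large", http.StatusRequestEntityTooLarge)
			return
		}
		// Also cap the bytes actually read, in case Content-Length is missing.
		r.Body = http.MaxBytesReader(w, r.Body, limit)
		next.ServeHTTP(w, r)
	})
}

// receive stands in for the real remote write handler.
func receive(w http.ResponseWriter, r *http.Request) {
	if _, err := io.ReadAll(r.Body); err != nil {
		var tooLarge *http.MaxBytesError
		if errors.As(err, &tooLarge) {
			http.Error(w, "remote write request body too large", http.StatusRequestEntityTooLarge)
			return
		}
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	// In a real server the wrapped handler would be registered on a mux.
	handler := limitBody(1<<20, http.HandlerFunc(receive))
	_ = handler
	fmt.Println(statusForBodySize(2<<20, 1<<20))
}
```

Because the check only looks at the current request, it needs no shared state between requests or tenants.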
I would also like to expose the current values and limits of each "knob" as metrics of Thanos Receive. This would allow easy tracking of the limit system.
To ensure backwards compatibility, limiting should be optional and disabled by default.
For the sake of simplicity and iteration, I propose starting with a global value for each knob (all tenants share the same upper bound) and later adding the possibility of configuring different values per tenant.
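A rough sketch of how global defaults with later per-tenant overrides could be modelled is shown below. All type and field names (`WriteLimits`, `LimitsConfig`, `For`) are hypothetical and chosen only for illustration. The zero value is assumed to mean "no limit", so an empty configuration keeps limiting disabled by default.

```go
package main

import "fmt"

// WriteLimits holds hypothetical per-request limits; zero means "no limit".
type WriteLimits struct {
	MaxBodyBytes int64
	MaxSeries    int64
	MaxSamples   int64
}

// Enabled reports whether at least one limit is set.
func (l WriteLimits) Enabled() bool {
	return l.MaxBodyBytes > 0 || l.MaxSeries > 0 || l.MaxSamples > 0
}

// LimitsConfig starts with a single global value and leaves room for
// per-tenant overrides to be added later.
type LimitsConfig struct {
	Global    WriteLimits
	PerTenant map[string]WriteLimits
}

// For returns the tenant's override if one exists, else the global limits.
// Looking up a key in a nil map is safe in Go and simply reports "not found".
func (c LimitsConfig) For(tenant string) WriteLimits {
	if l, ok := c.PerTenant[tenant]; ok {
		return l
	}
	return c.Global
}

func main() {
	cfg := LimitsConfig{
		Global: WriteLimits{MaxBodyBytes: 1 << 20},
		PerTenant: map[string]WriteLimits{
			"tenant-a": {MaxBodyBytes: 4 << 20},
		},
	}
	fmt.Println(cfg.For("tenant-a").MaxBodyBytes, cfg.For("tenant-b").MaxBodyBytes)
	fmt.Println(LimitsConfig{}.For("tenant-a").Enabled())
}
```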
Describe alternatives you've considered
Additional context
TODO