Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

receive: Add per-each-write limit to the allowed number of samples #3963

Closed
wants to merge 9 commits into from

Conversation

spaparaju
Copy link
Contributor

@spaparaju spaparaju commented Mar 24, 2021

Fix for #3964

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  1. Added configurable 'receive.samples-limit-per-each-write' at Thanos Receive component to limit the number of allowed samples per each write API call
  2. Update the pkg/receive/handler to enforce this limit
  3. Added unit tests at pkg/receive/handler_test.go

Verification

Verified with unit tests at pkg/receive/handler_test.go

@spaparaju spaparaju changed the title add configurable limit to number of samples for each write API call at Thanos receive component add per-each-write limit to number of samples that can be accepted Mar 24, 2021
@spaparaju spaparaju changed the title add per-each-write limit to number of samples that can be accepted add per-each-write limit to the allowed number of samples Mar 24, 2021
@kakkoyun kakkoyun requested review from squat, bwplotka and brancz March 29, 2021 12:59
@kakkoyun kakkoyun changed the title add per-each-write limit to the allowed number of samples receive: Add per-each-write limit to the allowed number of samples Mar 29, 2021
CHANGELOG.md Outdated
@@ -15,6 +15,7 @@ We use _breaking :warning:_ to mark changes that are not backward compatible (re
### Added
- [#3903](https://github.com/thanos-io/thanos/pull/3903) Store: Returning custom grpc code when reaching series/chunk limits.
- [3919](https://github.com/thanos-io/thanos/pull/3919) Allow to disable automatically setting CORS headers using `--web.disable-cors` flag in each component that exposes an API.
- [3963](https://github.com/thanos-io/thanos/pull/3963) add per-each-write limit to the allowed number of samples.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [3963](https://github.com/thanos-io/thanos/pull/3963) add per-each-write limit to the allowed number of samples.
- [3963](https://github.com/thanos-io/thanos/pull/3963) Receive: Add per-each-write limit to the allowed number of samples. (Default: 5000)

Signed-off-by: spaparaju <[email protected]>
@spaparaju spaparaju force-pushed the limit-series-count branch from 15219a3 to 3ab5cc6 Compare March 29, 2021 14:37
@brancz
Copy link
Member

brancz commented Apr 15, 2021

The default seems rather low. Remote write will soon be based on commits by Prometheus, which means that entire scrapes will be sent within one remote write request. Especially in Kubernetes land it's quite common to have 100k series per scrape (eg. apiserver and kube-state-metrics). I wonder, is the amount of samples of a single request really the concern? Wouldn't limiting the number of active series be the better protection mechanism? Or samples per second over a window?

Regardless of the strategy I would also expect this to be a per-tenant configuration. (although I guess starting with a default would be ok)

Signed-off-by: spaparaj <[email protected]>
@squat
Copy link
Member

squat commented Jun 23, 2021

@spaparaju any thoughts on @brancz's idea? To make this per-tenant, should we have the limit be specified in the hashring configurations?

@spaparaju
Copy link
Contributor Author

spaparaju commented Jun 29, 2021

The default seems rather low. Remote write will soon be based on commits by Prometheus, which means that entire scrapes will be sent within one remote write request. Especially in Kubernetes land it's quite common to have 100k series per scrape (eg. apiserver and kube-state-metrics). I wonder, is the amount of samples of a single request really the concern?
Yes, default seems low to me (after thinking. a bit more on the comment). Maybe 50K seems reasonable default.
This idea is to get a counterpart at the service / storage side to match the current Prometheus remoteWrite 'max_samples_per_send' and protection mechanism at an individual remoteWrite call (any one remoteWrite call can not bring a service down). This I guess helps to publish the contract on the service side to backoff the samples beyond limit per each send.
Wouldn't limiting the number of active series be the better protection mechanism? Or samples per second over a window?
Yes, tracking active time series, samples per time window are quite useful protection measures across multiple remoteWrite calls.

Regardless of the strategy I would also expect this to be a per-tenant configuration. (although I guess starting with a default would be ok)

Signed-off-by: SriKrishna Paparaju <[email protected]>
Signed-off-by: SriKrishna Paparaju <[email protected]>
@stale
Copy link

stale bot commented Oct 30, 2021

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Oct 30, 2021
@stale stale bot closed this Jan 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants