Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compactor: adds downsample duration histogram #4552

Merged
merged 7 commits into from
Aug 23, 2021

Conversation

vanugrah
Copy link
Contributor

@vanugrah vanugrah commented Aug 11, 2021

Signed-off-by: Anugrah Vijay [email protected]

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • Registers and implements downsampleDuration histogram.
  • New var for downsample duration to avoid recomputing time.Since across logs/metric.

Verification

  • Local validation of compactor with local minIO bucket

@vanugrah
Copy link
Contributor Author

vanugrah commented Aug 11, 2021

Based on @yeya24 's feedback on the previous PR I will be:

  • Adding the downsample resolution as label to the histogram.
  • Increasing the starting bucket size.

@vanugrah
Copy link
Contributor Author

With respect to the histogram buckets, we could get rid of the initial few buckets and make it 64, 128, 256, 512, 1024, 2048, 4096, 8192 or we could define more user friendly intervals such as 1m, 5m, 15m, 30m, 1h, 2h, 4h. WDYT?

cmd/thanos/downsample.go Outdated Show resolved Hide resolved
@yeya24
Copy link
Contributor

yeya24 commented Aug 11, 2021

Anything you still want to update? @vanugrah If not please mark this pr ready for review. 😄

@vanugrah
Copy link
Contributor Author

Bucket ranges? Currently it mimics the compactors geometric series: 64, 128, 256, 512, 1024, 2048, 4096, 8192.
But changing that to recognizable intervals may be more user friendly: 1m, 5m, 15m, 30m, 60m, 120m, 240m.

@yeya24
Copy link
Contributor

yeya24 commented Aug 11, 2021

Bucket ranges? Currently it mimics the compactors geometric series: 64, 128, 256, 512, 1024, 2048, 4096, 8192.
But changing that to recognizable intervals may be more user friendly: 1m, 5m, 15m, 30m, 60m, 120m, 240m.

Latter sounds better.

@vanugrah vanugrah marked this pull request as ready for review August 11, 2021 23:58
yeya24
yeya24 previously approved these changes Aug 12, 2021
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the contribution.

@vanugrah
Copy link
Contributor Author

😄 Thanks for the speedy feedback!

@vanugrah
Copy link
Contributor Author

So for flakey e2e tests do we just rerun?

@yeya24
Copy link
Contributor

yeya24 commented Aug 12, 2021

So for flakey e2e tests do we just rerun?

I restarted it and it is passing now

@vanugrah
Copy link
Contributor Author

Wooot 🎉 Thanks for your help @yeya24 . More PR's soon to come.

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With --wait doesn't this mean that we will have infinitely different groups? 🤔 Perhaps a histogram over all durations would be more suitable i.e. drop the group label?

@vanugrah
Copy link
Contributor Author

vanugrah commented Aug 12, 2021

Hello! 👋

Based on the DefaultGroupKey it looks like the two parts of the label are the resolution and a hash of the meta labels. So if I'm not mistaken the cardinality would increase by num_buckets x 3 (1 for each resolution) for each unique set of meta labels, i.e. each unique source of blocks. So theoretically yes there is no upper bound on groups given that it correlates with things uploading blocks. But in practice, I haven't heard of more than a few thousand block sources for a single bucket, even on the larger side. Though, this is definitely just a heuristic.

Another point to consider is that the other downsample metrics already use the group key as a label: https://github.com/thanos-io/thanos/blob/main/cmd/thanos/downsample.go#L46-L53

And we already initialize the metric for all groups:
https://github.com/thanos-io/thanos/blob/main/cmd/thanos/downsample.go#L117-L119

So we're not necessarily introducing anything new to risk unbounded cardinality and we'd be able to align labels with the existing metrics.

Still wrapping my head around the compactor so let me know if I've misunderstood anything 😄

@vanugrah
Copy link
Contributor Author

@GiedriusS Any thoughts?

@vanugrah
Copy link
Contributor Author

@yeya24 Your review was marked stale because I had to resolve a merge conflict on the Changelog. Care to re-review?

Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's merge it.

@yeya24 yeya24 enabled auto-merge (squash) August 23, 2021 21:15
@yeya24 yeya24 disabled auto-merge August 23, 2021 21:15
@yeya24 yeya24 merged commit ce1c4fe into thanos-io:main Aug 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants