-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support max source resolution for instant queries #1431
Support max source resolution for instant queries #1431
Conversation
dc7a3c3
to
765ac90
Compare
This looks right to me, but it'd be good if @bwplotka and or @GiedriusS have a look as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea to add that parameter there is good but not sure about the flag. Perhaps we could re-use --query.auto-downsampling
here and automatically select it depending on the range vectors?
a430f7e
to
4765b2a
Compare
Hi @GiedriusS, that might be a significantly more comprehensive and powerful solution. There is just one concern that comes to mind. Wouldn't the querier need to be aware of the retention configured for the different compaction levels? How else would it have translated the
Configuring a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe we should change the default --query.instant.default.max_source_resolution=0s
to 1h
.
As it has a bit buggy behavior if you configure retention like in the example.
Thoughts?
That's what we set it to in our setup now and would probably leave it at. I just wasn't sure whether to include it in the Pull-Request. |
Indeed, it would be a much bigger change but not sure that introducing a new flag is the best solution. It's always easiest to do that but then after some time you end with a thing which has hundreds of knobs which is bad. I think my original suggestion would work out because actually we do downsampling at fixed intervals. The options that you've pasted only control the retention part. I have outlined this in my other issue what I have started about how Thanos Store should always prefer higher resolution data. I still would very much like @bwplotka to comment on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋 Thanks for this all!
I am fine with unblocking you @0robustus1 on this, if we make this flag hidden for now. Once that is done LGTM 👍 (: Essentially for long term direction we need bit more design/discussion. IMO there might be some work on PromQL to improve this flow (more below).
Overall, I agree with @GiedriusS and I would vote for query instant asking for range vector using exactly the same logic as the range query. I would really reuse --query.auto-downsampling
and same flow.
There is just one concern that comes to mind. Wouldn't the querier need to be aware of the retention configured for the different compaction levels? How else would it have translated the 30d
Technically it would be nice to control all by the step
you provide to the query... which you cannot on the instant query. I think there were discussions on enabling step
for range vector selector [ .. ]
and that would be helpful. I would really love to follow on that with Prometheus team since at some point we want to propose downsampling to Prometheus itself.
Anyway, from Thanos perspective IMO querier should not be aware of anything on storage layer, especially retention - all it knows about is StoreAPI
The options that you've pasted only control the retention part. I have outlined this in my other issue what I have started about how Thanos Store should always prefer higher resolution data.
This is an interesting one. Especially for range vectors and functions, it might be really faster and equally accurate (: in most cases. (: But I don't think for all cases. I think still having granular control for each range vector selector would be the solution here, but let's discuss.
8e2c093
to
1fc56e5
Compare
I adjusted the code to the remaining review comments. The flag is now hidden. |
1fc56e5
to
5e69042
Compare
This adds support for the ?max_source_resolution query param for instant queries. Instant queries had the issue that when there was a subquery requesting wide data ranges e.g. sum_over_time(metric_name[30d])) and the retention of raw data was for example only 7d, the query would (silently) only take data from the last 7 days into account (ignoring the downsampled 5m and 1h timeseries that were available with longer retention). We default to 1h max_source_resolution for now as this will cover the highest/broadest resolution available and should include all retention. It is configurable per query (although no known clients send it). Signed-off-by: Tim Reddehase <[email protected]>
The flag only affects instant queries. Signed-off-by: Tim Reddehase <[email protected]>
Signed-off-by: Tim Reddehase <[email protected]>
Instead of defining an additional function we do the adjustment of maxSourceResolution directly in query_range. Signed-off-by: Tim Reddehase <[email protected]>
Signed-off-by: Tim Reddehase <[email protected]>
Signed-off-by: Tim Reddehase <[email protected]>
Signed-off-by: Tim Reddehase <[email protected]>
Signed-off-by: Tim Reddehase <[email protected]>
5e69042
to
2433524
Compare
Actually, I was super wrong.. All things I mentioned with resolution per range works right now: https://prometheus.io/blog/2019/01/28/subquery-support/#subqueries I might think we need to change slightly implementation then for querier and use per selection |
Failed quick experiment here so merging this "hidden" workaround for now. Thanks for doing this! Would love to have your input on further work in this area @0robustus1 |
thanos query --query.instant.default.max_source_resolution
flag to set a default value for maxSourceResolution on instant queries (as are for example used by thethanos rule
component)/api/v1/query
Reasoning
We have configured our compactor with different retentions for different resolution levels:
This causes a situation where the result of a
thanos rule
recording rule wasonly taking the last 7-8 days of data into account, because instant queries were
always using a maxSourceResolution of 0 and therefore only using raw resolution
data.
In our case the query looked like this and was executed on a thanos ruler
because our prometheuses only have a relatively short retention of a few days:
The
thanos rule
component runs an instant query in order to retrieve a singledatapoint to set for the recording rule metric. Our query requires that data
from at least 30 days is taken into account. But by default it would only select
raw data and therefore only provide data from the last 7 days.
Changes
This allows for
/api/v1/query
to accept the?max_source_resolution
query parameter(the documentation did not clearly state that it was only accepted for
/api/v1/query_range
prior to this).We also added a command line argument to set a default should the query parameter not be set.
We also propose the following (for later PRs):
max_source_resolution
for "Console"queries (which are executing instant queries).
max_source_resolution
on a per-recording-rule basis.Verification
max_source_resolution
parameter set and compared it with the last result we got from a range query.
rule was now producing the expected value.