query: add initial time-based query pushdown #4712
Conversation
Love this work! Can you add another E2E test case for the querier and store gateway, with and without query pushdown? Is the returned result the same for range queries?

We are filtering stores using the given time range in the API. But when the PromQL engine performs the Select call, will it use the same time range given from the API? https://github.com/prometheus/prometheus/blob/f103acd5135b8bbe885b17a73dafc7bbb586319c/promql/engine.go#L762 If we can extract this logic from promql, then maybe it is possible to push down instant queries as well.
Add time-based pushdown mechanism. It works by directly querying a Thanos Store node if it is the only one partially or fully matching the time range of a query. I believe that this will be the case most of the time because typically Sidecar/Ruler/Receive cover a few days of time range and horizontally balanced Thanos Store instances cover the historical data from remote object storage. With query-frontend in front splitting the queries into smaller parts, higher time ranges mean that most of the queries can simply be pushed down. For example, with Sidecars covering 2 days of data, a 7d query, and a split interval of 1d, 5/7 * 100 = ~71% of the processing can be pushed down to the leaf nodes.

Small ad-hoc tests which touch a lot of timeseries show a 2x reduction in query duration.

Hide this under an "experimental feature" flag because this intentionally doesn't have everything implemented yet, to keep the change smaller:

* `lazySeriesSet` for concurrent `Select()`s in Thanos Store
* Some missing Prometheus engine flags such as the look-back delta
* Support the selection of StoreAPI nodes in the UI (use only them, not all endpoints)
* No chunks/series limiters

Also, this API could potentially be implemented in other components. If the approach looks good then we can merge this and implement these in follow-up PRs.

The main parts of this change are in:

* New QueryAPI service
* New `pkg/pushdown/`
* API request handling code

Signed-off-by: Giedrius Statkevičius <[email protected]>
That's true, but I'd like to start simple and cover only one case at the beginning, as in this PR. Maybe you could add this idea to #305?
@@ -146,6 +147,7 @@ func (es *grpcEndpointSpec) fillExpectedAPIs(componentType component.Component,
			MinTime: mintime,
			MaxTime: maxTime,
		},
		Query: &infopb.QueryInfo{},
Note that without `--endpoint`, it means that we assume that all Stores implement QueryAPI :s
OK, I think the main question we have to ask ourselves is: do we need another API?

The thing is that StoreAPI is well equipped for the work - the pipeline is ready for doing global eval, local eval, or mixed. Let's go through different executions for the query `count(metric1{target="ext1"})`. This makes the PromQL engine in Querier select `metric1{target="ext1"}` (with the `count` function in hints). Then the Querier proxy code (the StoreAPI querier implementation) fans out to all matching stores. Let's take the simple case: there is only ONE matching store, and no duplicated metrics. We can then:

1. Ask this matching store with a non-pushdown StoreAPI call. This will return all selected `metric1` series from the remote store. We pass that as it is to the PromQL engine in StoreSet. Then PromQL counts them up and returns the result.
2. But at the same time, we could do a special StoreAPI.Series call with e.g. a `pushdown=true` field that tells the underlying store to give us the count of all `metric1` series. It would then return a single series for `count(metric1{target="ext1"})`. As long as we know there are no duplicates across fanouts, we can put that as a result into the PromQL engine. The PromQL engine will take that as the only series, count it (but it's only one, so fine) and return it in the response.

We can enhance 2 and even pass the step so that only the necessary samples that would be used by PromQL are returned.

I know it might be complex, so maybe a second method on the service makes sense, but I would really, really like to avoid another API - too much complexity IMO 🤔
option (gogoproto.goproto_unrecognized_all) = false;
option (gogoproto.goproto_sizecache_all) = false;

service Query {
OK, I think the main question we have to ask ourselves is: do we need another API? Let me explain in the main comment.
If we just exposed the QueryAPI through gRPC, this might work, but only for super limited cases where only one Store is matched. We block (or need a separate path) for cases where we have more than one matching store.
The PromQL engine only gets a series set as the result and it will try to apply the functions and aggregators again.
What else do you want? Both are just series with some samples, nothing else.
I understand you and remember this idea from your design document, but are these two things exclusive to each other? I really think that we can have both. Just like most databases do optimizations at many layers, I think we should do both (and much more). The optimization in this PR is dead simple and quite trivial; the only complex part IMO is implementing the PromQL engine. To be honest, there is probably no clear-cut answer, but this pull request would already solve many problems in at least two big deployments of Thanos that I know of. Implementing pushdown has been on our backlog for the longest time but it has never been properly implemented; perhaps that is because the idea in its entirety is quite complex and we must divide it into smaller parts, as in this PR? Also, I understand that we should strive to implement solutions as generic as possible to ultimately reduce complexity so that the code would be easier to understand and maintain, but on the other hand IMHO optimizations sometimes come in smaller pieces, and two extra function calls in

@yeya24 what's your opinion on this topic?
They can coexist and I think it is okay to have a separate API for pushdown for now, as it is under an experimental feature flag. In the short term, it is hard to hack the PromQL engine to achieve pushdown directly. What we need to do is to ensure result consistency between pushdown and normal queries. Continuous benchmarking or tests would help so that we can compare the performance.
Right, thanks for this. I don't want to block innovation here, especially since it's not me who is spending time on this right now. I wish we could sit down and brainstorm this together in the following weeks. I can agree that reusing the same API might be complex work to achieve; with a separate API, we might have better velocity. I am still a bit opposed to adding an entirely new API that has all those limitations at the start. We already have a complex codebase, and experiments like this will not help with maintaining the existing work we are doing. I wonder if it's worth maintaining a feature branch for this work together?
Right, I understand the final results are just some samples. But the PromQL engine will apply the PromQL functions and aggregators again, even if some series are pre-aggregated.
Could we split this PR then into smaller ones? I could start with a pull request that adds the query engine to Thanos Store with tests. Then, we could add this time-based pushdown logic in future PRs. Finally, we could work on adding that hooking logic into the PromQL engine. Maybe that would be easier?
That is totally fine, no? We need to do that in order to aggregate pre-aggregated results (:
Anyway, I think we are still not sure if we need a different API or not. That is what blocks us from developing on main IMO. I would keep it on the feature branch still. We can try different approaches and compare. I would move this PR as it is into one Thanos pushdown branch (not your fork's PR), then I can try reusing Series and we can experiment. You see, I would love to avoid an extra API because of complexities like #4780. We would need to implement everything twice and it's not trivial code to fan out all. A single pipeline would be nice, if possible.
It's not fine to do that with functions. For example, if you do
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.