
query: add initial time-based query pushdown #4712

Closed
Wants to merge 1 commit.

Conversation

@GiedriusS (Member) commented Sep 29, 2021

Add a time-based pushdown mechanism. It works by directly querying a
Thanos Store node if it is the only one partially or fully matching the
time range of a query. I believe this will be the case most of the
time, because typically Sidecar/Ruler/Receive cover only a few days of
data while horizontally balanced Thanos Store instances cover the
historical data from remote object storage. With query-frontend in
front splitting queries into smaller parts, most subqueries of a
long-range query can simply be pushed down. For example, with Sidecars
covering 2 days of data, a 7d query, and a split interval of 1d,
5/7 * 100 ≈ 71% of the processing can be pushed down to the leaf nodes.
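As an illustration (not code from this PR), the arithmetic above can be sketched like so; `pushdown_fraction` is a hypothetical helper that assumes fixed-size splits and that only the most recent days are served by Sidecar/Ruler/Receive:

```python
# Back-of-the-envelope estimate of how much of a split range query can be
# pushed down to leaf Store nodes.
def pushdown_fraction(query_days: int, sidecar_days: int, split_days: int = 1) -> float:
    splits = query_days // split_days
    # Subqueries older than the Sidecar-covered window hit only the Store.
    pushable = max(0, (query_days - sidecar_days) // split_days)
    return pushable / splits

# The example from the description: 7d query, Sidecars covering 2d, 1d splits.
print(round(pushdown_fraction(7, 2) * 100))  # 71
```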

Small ad-hoc tests that touch a lot of time series show a 2x reduction
in query duration.

This is hidden under an "experimental feature" flag because, to keep the
change smaller, some things are intentionally left unimplemented:

  • `lazySeriesSet` for concurrent `Select()`s in Thanos Store
  • Some Prometheus engine flags are missing (e.g. the look-back delta)
  • Selecting StoreAPI nodes in the UI (use only them, not all
    endpoints)
  • No chunks/series limiters

Also, this API could potentially be implemented in other components.

If the approach looks good then we can merge this and implement these in follow-up PRs.

The main parts of this change are:

  • New QueryAPI service
  • New pkg/pushdown/
  • API request handling code

@GiedriusS GiedriusS force-pushed the add_timebased_pushdown branch 4 times, most recently from bc82df0 to 6f546a3 Compare September 29, 2021 13:36
@yeya24 (Contributor) left a comment

Love this work! Can you add another E2E test case for query plus store gateway, with and without query pushdown? Is the returned result the same for range queries?

@yeya24 (Contributor) commented Sep 30, 2021

We are filtering stores using the given time range in the API. But when the PromQL engine performs the Select call, will it use the same time range given to the API?

https://github.com/prometheus/prometheus/blob/f103acd5135b8bbe885b17a73dafc7bbb586319c/promql/engine.go#L762
From this function, IIUC, the time range will be adjusted based on the PromQL query itself. WDYT?

If we can extract this logic from promql, then maybe it is possible to pushdown instant queries as well.
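The adjustment being referenced can be modeled roughly as follows. This is a simplified sketch, not the actual Prometheus code: instant selectors need samples up to the lookback delta before the evaluation time, range selectors need their own range, and offsets shift the window further back:

```python
# Rough model of how the engine widens the start of the Select() range.
# All parameters are in milliseconds; names are illustrative only.
def widened_select_start(query_start_ms, lookback_delta_ms,
                         range_selector_ms=0, offset_ms=0):
    # A range selector overrides the lookback delta for its own window.
    look_back = range_selector_ms if range_selector_ms > 0 else lookback_delta_ms
    return query_start_ms - look_back - offset_ms

# An instant selector with the default 5m lookback delta:
print(widened_select_start(1_000_000, 300_000))                   # 700000
# A 10m range selector with a 1m offset needs even older samples:
print(widened_select_start(1_000_000, 300_000, 600_000, 60_000))  # 340000
```

This is why filtering stores by the API-level time range alone can under-select: the engine may ask for samples before the query's nominal start.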

@GiedriusS GiedriusS force-pushed the add_timebased_pushdown branch 3 times, most recently from 0a10904 to 323d331 Compare September 30, 2021 08:59
@GiedriusS GiedriusS force-pushed the add_timebased_pushdown branch from 323d331 to 5cb14f9 Compare September 30, 2021 09:02
@GiedriusS (Member, Author)

We are filtering stores using the given time range in the API. But when the PromQL engine performs the Select call, will it use the same time range given to the API?

https://github.com/prometheus/prometheus/blob/f103acd5135b8bbe885b17a73dafc7bbb586319c/promql/engine.go#L762 From this function, IIUC, the time range will be adjusted based on the PromQL query itself. WDYT?

If we can extract this logic from promql, then maybe it is possible to pushdown instant queries as well.

That's true, but I'd like to start simple and cover only one case at the beginning, as in this PR. Maybe you could add this idea to #305?

@GiedriusS GiedriusS marked this pull request as ready for review September 30, 2021 09:03
@@ -146,6 +147,7 @@ func (es *grpcEndpointSpec) fillExpectedAPIs(componentType component.Component,
MinTime: mintime,
MaxTime: maxTime,
},
Query: &infopb.QueryInfo{},
@GiedriusS (Member, Author) commented on the diff:
Note that without `--endpoint` this means we assume that all Stores implement QueryAPI :s

@bwplotka (Member) left a comment

OK, I think the main question we have to ask ourselves is: do we need another API?

The thing is that StoreAPI is well equipped for the work - the pipeline is ready for doing global eval, local eval, or a mix. Let's go through the different executions for the query `count(metric1{target=ext1})`. It makes the PromQL engine in Querier select `metric1{target=ext1}` (with the count function in hints). Then the Querier proxy code (the StoreAPI querier implementation) fans out to all matching stores. Let's take the simple case: there is only ONE matching store and no duplicated metrics. We can then:

  1. Ask the matching store with a normal, non-pushdown StoreAPI call. This returns all selected `metric1` series from the remote store. We pass that as-is to the PromQL engine in StoreSet. Then PromQL counts them up and returns the result.

  2. But at the same time, we could make a special StoreAPI.Series call with e.g. a pushdown=true field that tells the underlying store to give us the count of all `metric1` series. It would then return a single series for `count(metric1{target=ext1})`. As long as we know there are no duplicates across fanouts, we can pass that as a result to the PromQL engine. The engine will take it and count it (but it's only one series, so that's fine) and return the response.

We can enhance 2 and even pass the step so that only the samples PromQL would actually use are returned.

I know it might be complex, so maybe a second method on the service makes sense, but I would really, really like to avoid another API - too much complexity IMO 🤔
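The single-matching-store condition discussed here (and in the PR description) can be sketched as a toy model; the helper names are hypothetical, not the actual Querier proxy code:

```python
# stores: list of (name, min_t, max_t) tuples describing the advertised
# time range of each StoreAPI endpoint.
def matching_stores(stores, q_min, q_max):
    # A store matches if its time range overlaps the query range at all.
    return [s for s in stores if s[1] <= q_max and s[2] >= q_min]

def pushdown_eligible(stores, q_min, q_max):
    # Pushdown is only safe here when exactly one store can serve the
    # range, so no cross-store deduplication or merging is needed.
    return len(matching_stores(stores, q_min, q_max)) == 1

stores = [("sidecar", 800, 1000), ("store-gw", 0, 850)]
print(pushdown_eligible(stores, 0, 700))    # only store-gw matches -> True
print(pushdown_eligible(stores, 600, 900))  # both match -> False
```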

option (gogoproto.goproto_unrecognized_all) = false;
option (gogoproto.goproto_sizecache_all) = false;

service Query {
@bwplotka (Member) commented on the diff:
OK, I think the main question we have to ask ourselves is: do we need another API? Let me explain in the main comment.

@bwplotka (Member)

If we just exposed QueryAPI through gRPC, this might work, but only for very limited cases where only one Store is matched. We block (or need a separate path for) cases where we have count, sum, etc. and multiple stores. I think this is a bit of a missed opportunity - and we take on huge complexity if we end up with QueryAPI, StoreAPI, StorePushdownAPI... etc.

@yeya24 (Contributor) commented Sep 30, 2021

But at the same time, we could make a special StoreAPI.Series call with e.g. a pushdown=true field that tells the underlying store to give us the count of all `metric1` series. It would then return a single series for `count(metric1{target=ext1})`...

The PromQL engine only gets a series set as the result, and it will try to apply count again. How do we bypass this and tell it that the result was already calculated? I feel that without another API besides Select we cannot achieve query pushdown.

@bwplotka (Member) commented Sep 30, 2021

What more do you want than a series result as input to PromQL? (: The global PromQL engine can take raw series, but also pre-calculated series from another PromQL engine, in the same way, no? (:

Both are just series with some samples, nothing else.

@GiedriusS (Member, Author)

OK, I think the main question we have to ask ourselves is: do we need another API?

...
I know it might be complex, so maybe a second method on the service makes sense, but I would really, really like to avoid another API - too much complexity IMO 🤔

I understand you and remember this idea from your design document, but are these two things exclusive? I really think we can have both. Just like most databases optimize at many layers, I think we should do both (and much more). The optimization in this PR is dead simple and quite trivial - the only complex part IMO is implementing the PromQL engine.

To be honest, there is probably no clear-cut answer. But this pull request would already solve many problems in at least two big Thanos deployments that I know of. Pushdown has been on our backlog for the longest time but has never been properly implemented - perhaps because the idea in its entirety is quite complex, and we must divide it into smaller parts, as in this PR?

Also, I understand that we should strive for solutions as generic as possible to ultimately reduce complexity, so that the code is easier to understand and maintain. On the other hand, IMHO optimizations sometimes come in smaller pieces, and two extra function calls in queryRange() don't seem like a huge overhead at all.

@yeya24 what's your opinion on this topic?

@yeya24 (Contributor) commented Oct 1, 2021

Also, I understand that we should strive for solutions as generic as possible to ultimately reduce complexity, so that the code is easier to understand and maintain. On the other hand, IMHO optimizations sometimes come in smaller pieces, and two extra function calls in queryRange() don't seem like a huge overhead at all.
@yeya24 what's your opinion on this topic?

They can coexist, and I think it is okay to have a separate API for pushdown for now, as it is behind an experimental feature flag. In the short term it is hard to hack the PromQL engine to achieve pushdown directly.

What we need to do is ensure result consistency between pushdown and normal queries. Continuous benchmarking or tests would help so that we can compare the performance.

@bwplotka (Member) commented Oct 7, 2021

Right, thanks for this. I don't want to block innovation here, especially since it's not me spending time on this right now. I wish we could sit down and brainstorm this together in the following weeks. I can agree that reusing the same API might be complex to achieve. With a separate API, we might have better velocity.

I am still a bit opposed to adding an entirely new API that has all those limitations from the start. We already have a complex codebase, and experiments like this will not help with maintaining the existing work. I wonder if it's worth maintaining a shared feature branch for this work together?

@yeya24 (Contributor) commented Nov 14, 2021

Both are just series with some samples, nothing else.

Right, I understand the final results are just some samples. But the PromQL engine will apply the PromQL functions and aggregators again, even if some series are pre-aggregated. I think that's why we need a new API: to bypass the engine's series post-processing. If it were possible to have a hook on the engine side, that would be great.

@GiedriusS (Member, Author) commented Nov 15, 2021

Could we split this PR into smaller ones, then? I could start with a pull request that adds the query engine to Thanos Store, with tests. Then we could add this time-based pushdown logic in future PRs. Finally, we could work on adding that hooking logic to the PromQL engine. Maybe that would be easier?

@bwplotka (Member) commented Nov 15, 2021

But the PromQL engine will apply the PromQL functions and aggregators again, even if some series are pre-aggregated.

That is totally fine, no? We need to do that in order to aggregate pre-aggregated results (:

@bwplotka (Member)

Anyway, I think we are still not sure whether we need a different API or not. That is what blocks us from developing on main, IMO. I would still keep it on a feature branch. We can try different approaches and compare. I would move this PR as-is into one Thanos pushdown branch (not your fork's PR); then I can try reusing Series and we can experiment. You see, I would love to avoid an extra API because of complexities like #4780.

We would need to implement everything twice, and the fan-out code is not trivial. A single pipeline would be nice, if possible.

@GiedriusS (Member, Author)

But the PromQL engine will apply the PromQL functions and aggregators again, even if some series are pre-aggregated.

That is totally fine, no? We need to do that in order to aggregate pre-aggregated results (:

It's not fine to do that with functions. For example, if you compute sum_over_time with step < range, and then apply sum_over_time on top, the result will be double the true sum_over_time. :P
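To make the double-counting concrete, here is a minimal numeric simulation (plain Python, not Thanos or Prometheus code) of sum_over_time with overlapping windows, i.e. step < range: computing it once at the leaf and then again in an unaware outer engine inflates the result:

```python
def sum_over_time(samples, range_s, step_s, start, end):
    # samples: {timestamp: value}. Evaluate at start, start+step, ..., end;
    # each evaluation sums the samples in the window (t - range_s, t].
    out = {}
    t = start
    while t <= end:
        out[t] = sum(v for ts, v in samples.items() if t - range_s < ts <= t)
        t += step_s
    return out

raw = {t: 1.0 for t in range(1, 7)}  # one sample per unit of time, value 1

# Leaf node pre-computes sum_over_time with step (1) < range (2) ...
leaf = sum_over_time(raw, range_s=2, step_s=1, start=2, end=6)
# ... and an unaware outer engine applies sum_over_time on top of it.
outer = sum_over_time(leaf, range_s=2, step_s=1, start=2, end=6)

print(leaf[6])   # 2.0 -> correct: two raw samples fall in the window
print(outer[6])  # 4.0 -> double-counted
```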

stale bot commented Mar 2, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Mar 2, 2022
@GiedriusS GiedriusS mentioned this pull request Mar 23, 2022
@stale stale bot closed this Apr 16, 2022