
query: add initial time-based query pushdown #4712

Closed
Wants to merge 1 commit.

Conversation

@GiedriusS (Member) commented Sep 29, 2021

Add a time-based pushdown mechanism. It works by directly querying a
Thanos Store node if it is the only one partially or fully matching the
time range of a query. I believe this will be the case most of the
time, because typically Sidecar/Ruler/Receive cover only a few days of
data while horizontally balanced Thanos Store instances cover the
historical data from remote object storage. With query-frontend in
front splitting queries into smaller parts, most subqueries of a
long-range query can simply be pushed down. For example, with Sidecars
covering 2 days of data, a 7d query, and a split interval of 1d,
5/7 * 100 ≈ 71% of the processing can be pushed down to the leaf nodes.
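As an illustration (not code from this PR), the arithmetic above can be sketched like so; `pushdown_fraction` is a hypothetical helper that assumes fixed-size splits and that only the most recent days are served by Sidecar/Ruler/Receive:

```python
# Back-of-the-envelope estimate of how much of a split range query can be
# pushed down to leaf Store nodes.
def pushdown_fraction(query_days: int, sidecar_days: int, split_days: int = 1) -> float:
    splits = query_days // split_days
    # Subqueries older than the Sidecar-covered window hit only the Store.
    pushable = max(0, (query_days - sidecar_days) // split_days)
    return pushable / splits

# The example from the description: 7d query, Sidecars covering 2d, 1d splits.
print(round(pushdown_fraction(7, 2) * 100))  # 71
```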

Small ad-hoc tests that touch a lot of time series show a 2x reduction
in query duration.

This is hidden under an "experimental feature" flag because, to keep the
change smaller, some things are intentionally left unimplemented:

  • `lazySeriesSet` for concurrent `Select()`s in Thanos Store
  • Some Prometheus engine flags are missing (e.g. the look-back delta)
  • Selecting StoreAPI nodes in the UI (use only them, not all
    endpoints)
  • No chunks/series limiters

Also, this API could potentially be implemented in other components.

If the approach looks good then we can merge this and implement these in follow-up PRs.

The main parts of this change are:

  • New QueryAPI service
  • New pkg/pushdown/
  • API request handling code

@GiedriusS GiedriusS force-pushed the add_timebased_pushdown branch 4 times, most recently from bc82df0 to 6f546a3 Compare September 29, 2021 13:36
@yeya24 (Contributor) left a comment

Love this work! Can you add another E2E test case for query plus store gateway, with and without query pushdown? Is the returned result the same for range queries?

@yeya24 (Contributor) commented Sep 30, 2021

We are filtering stores using the given time range in the API. But when the PromQL engine performs the Select call, will it use the same time range given to the API?

https://github.com/prometheus/prometheus/blob/f103acd5135b8bbe885b17a73dafc7bbb586319c/promql/engine.go#L762
From this function, IIUC, the time range will be adjusted based on the PromQL query itself. WDYT?

If we can extract this logic from promql, then maybe it is possible to pushdown instant queries as well.
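The adjustment being referenced can be modeled roughly as follows. This is a simplified sketch, not the actual Prometheus code: instant selectors need samples up to the lookback delta before the evaluation time, range selectors need their own range, and offsets shift the window further back:

```python
# Rough model of how the engine widens the start of the Select() range.
# All parameters are in milliseconds; names are illustrative only.
def widened_select_start(query_start_ms, lookback_delta_ms,
                         range_selector_ms=0, offset_ms=0):
    # A range selector overrides the lookback delta for its own window.
    look_back = range_selector_ms if range_selector_ms > 0 else lookback_delta_ms
    return query_start_ms - look_back - offset_ms

# An instant selector with the default 5m lookback delta:
print(widened_select_start(1_000_000, 300_000))                   # 700000
# A 10m range selector with a 1m offset needs even older samples:
print(widened_select_start(1_000_000, 300_000, 600_000, 60_000))  # 340000
```

This is why filtering stores by the API-level time range alone can under-select: the engine may ask for samples before the query's nominal start.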

@GiedriusS GiedriusS force-pushed the add_timebased_pushdown branch 3 times, most recently from 0a10904 to 323d331 Compare September 30, 2021 08:59
@GiedriusS GiedriusS force-pushed the add_timebased_pushdown branch from 323d331 to 5cb14f9 Compare September 30, 2021 09:02
@GiedriusS (Member, Author)

We are filtering stores using the given time range in the API. But when the PromQL engine performs the Select call, will it use the same time range given to the API?

https://github.com/prometheus/prometheus/blob/f103acd5135b8bbe885b17a73dafc7bbb586319c/promql/engine.go#L762 From this function, IIUC, the time range will be adjusted based on the PromQL query itself. WDYT?

If we can extract this logic from promql, then maybe it is possible to pushdown instant queries as well.

That's true, but I'd like to start simple and cover only one case at the beginning, as in this PR. Maybe you could add this idea to #305?

@GiedriusS GiedriusS marked this pull request as ready for review September 30, 2021 09:03
@@ -146,6 +147,7 @@ func (es *grpcEndpointSpec) fillExpectedAPIs(componentType component.Component,
MinTime: mintime,
MaxTime: maxTime,
},
Query: &infopb.QueryInfo{},
@GiedriusS (Member, Author) commented on the diff:
Note that without `--endpoint` this means we assume that all Stores implement QueryAPI :s

@bwplotka (Member) left a comment

OK, I think the main question we have to ask ourselves is: do we need another API?

The thing is that StoreAPI is well equipped for the work - the pipeline is ready for doing global eval, local eval, or a mix. Let's go through the different executions for the query `count(metric1{target=ext1})`. It makes the PromQL engine in Querier select `metric1{target=ext1}` (with the count function in hints). Then the Querier proxy code (the StoreAPI querier implementation) fans out to all matching stores. Let's take the simple case: there is only ONE matching store and no duplicated metrics. We can then:

  1. Ask the matching store with a normal, non-pushdown StoreAPI call. This returns all selected `metric1` series from the remote store. We pass that as-is to the PromQL engine in StoreSet. Then PromQL counts them up and returns the result.

  2. But at the same time, we could make a special StoreAPI.Series call with e.g. a pushdown=true field that tells the underlying store to give us the count of all `metric1` series. It would then return a single series for `count(metric1{target=ext1})`. As long as we know there are no duplicates across fanouts, we can pass that as a result to the PromQL engine. The engine will take it and count it (but it's only one series, so that's fine) and return the response.

We can enhance 2 and even pass the step so that only the samples PromQL would actually use are returned.

I know it might be complex, so maybe a second method on the service makes sense, but I would really, really like to avoid another API - too much complexity IMO 🤔
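The single-matching-store condition discussed here (and in the PR description) can be sketched as a toy model; the helper names are hypothetical, not the actual Querier proxy code:

```python
# stores: list of (name, min_t, max_t) tuples describing the advertised
# time range of each StoreAPI endpoint.
def matching_stores(stores, q_min, q_max):
    # A store matches if its time range overlaps the query range at all.
    return [s for s in stores if s[1] <= q_max and s[2] >= q_min]

def pushdown_eligible(stores, q_min, q_max):
    # Pushdown is only safe here when exactly one store can serve the
    # range, so no cross-store deduplication or merging is needed.
    return len(matching_stores(stores, q_min, q_max)) == 1

stores = [("sidecar", 800, 1000), ("store-gw", 0, 850)]
print(pushdown_eligible(stores, 0, 700))    # only store-gw matches -> True
print(pushdown_eligible(stores, 600, 900))  # both match -> False
```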

option (gogoproto.goproto_unrecognized_all) = false;
option (gogoproto.goproto_sizecache_all) = false;

service Query {
@bwplotka (Member) commented on the diff:
OK, I think the main question we have to ask ourselves is: do we need another API? Let me explain in the main comment.

@bwplotka (Member)

If we just exposed QueryAPI through gRPC, this might work, but only for very limited cases where only one Store is matched. We block (or need a separate path for) cases where we have count, sum, etc. and multiple stores. I think this is a bit of a missed opportunity - and we take on huge complexity if we end up with QueryAPI, StoreAPI, StorePushdownAPI... etc.

@yeya24 (Contributor) commented Sep 30, 2021

But at the same time, we could make a special StoreAPI.Series call with e.g. a pushdown=true field that tells the underlying store to give us the count of all `metric1` series. It would then return a single series for `count(metric1{target=ext1})`...

The PromQL engine only gets a series set as the result, and it will try to apply count again. How do we bypass this and tell it that the result was already calculated? I feel that without another API besides Select we cannot achieve query pushdown.

@bwplotka (Member) commented Sep 30, 2021

What more do you want than a series result as input to PromQL? (: The global PromQL engine can take raw series, but also pre-calculated series from another PromQL engine, in the same way, no? (:

Both are just series with some samples, nothing else.

@GiedriusS (Member, Author)

OK, I think the main question we have to ask ourselves is: do we need another API?

...
I know it might be complex, so maybe a second method on the service makes sense, but I would really, really like to avoid another API - too much complexity IMO 🤔

I understand you and remember this idea from your design document, but are these two things exclusive? I really think we can have both. Just like most databases optimize at many layers, I think we should do both (and much more). The optimization in this PR is dead simple and quite trivial - the only complex part IMO is implementing the PromQL engine.

To be honest, there is probably no clear-cut answer. But this pull request would already solve many problems in at least two big Thanos deployments that I know of. Pushdown has been on our backlog for the longest time but has never been properly implemented - perhaps because the idea in its entirety is quite complex, and we must divide it into smaller parts, as in this PR?

Also, I understand that we should strive for solutions as generic as possible to ultimately reduce complexity, so that the code is easier to understand and maintain. On the other hand, IMHO optimizations sometimes come in smaller pieces, and two extra function calls in queryRange() don't seem like a huge overhead at all.

@yeya24 what's your opinion on this topic?

@yeya24 (Contributor) commented Oct 1, 2021

Also, I understand that we should strive for solutions as generic as possible to ultimately reduce complexity, so that the code is easier to understand and maintain. On the other hand, IMHO optimizations sometimes come in smaller pieces, and two extra function calls in queryRange() don't seem like a huge overhead at all.
@yeya24 what's your opinion on this topic?

They can coexist, and I think it is okay to have a separate API for pushdown for now, as it is behind an experimental feature flag. In the short term it is hard to hack the PromQL engine to achieve pushdown directly.

What we need to do is ensure result consistency between pushdown and normal queries. Continuous benchmarking or tests would help so that we can compare the performance.

@bwplotka (Member) commented Oct 7, 2021

Right, thanks for this. I don't want to block innovation here, especially since it's not me spending time on this right now. I wish we could sit down and brainstorm this together in the following weeks. I can agree that reusing the same API might be complex to achieve. With a separate API, we might have better velocity.

I am still a bit opposed to adding an entirely new API that has all those limitations from the start. We already have a complex codebase, and experiments like this will not help with maintaining the existing work. I wonder if it's worth maintaining a shared feature branch for this work together?

@yeya24 (Contributor) commented Nov 14, 2021

Both are just series with some samples, nothing else.

Right, I understand the final results are just some samples. But the PromQL engine will apply the PromQL functions and aggregators again, even if some series are pre-aggregated. I think that's why we need a new API: to bypass the engine's series post-processing. If it were possible to have a hook on the engine side, that would be great.

@GiedriusS (Member, Author) commented Nov 15, 2021

Could we split this PR into smaller ones, then? I could start with a pull request that adds the query engine to Thanos Store, with tests. Then we could add this time-based pushdown logic in future PRs. Finally, we could work on adding that hooking logic to the PromQL engine. Maybe that would be easier?

@bwplotka (Member) commented Nov 15, 2021

But the PromQL engine will apply the PromQL functions and aggregators again, even if some series are pre-aggregated.

That is totally fine, no? We need to do that in order to aggregate pre-aggregated results (:

@bwplotka (Member)

Anyway, I think we are still not sure whether we need a different API or not. That is what blocks us from developing on main, IMO. I would still keep it on a feature branch. We can try different approaches and compare. I would move this PR as-is into one Thanos pushdown branch (not your fork's PR); then I can try reusing Series and we can experiment. You see, I would love to avoid an extra API because of complexities like #4780.

We would need to implement everything twice, and the fan-out code is not trivial. A single pipeline would be nice, if possible.

@GiedriusS (Member, Author)

But the PromQL engine will apply the PromQL functions and aggregators again, even if some series are pre-aggregated.

That is totally fine, no? We need to do that in order to aggregate pre-aggregated results (:

It's not fine to do that with functions. For example, if you compute sum_over_time with step < range, and then apply sum_over_time on top, the result will be double the true sum_over_time. :P
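To make the double-counting concrete, here is a minimal numeric simulation (plain Python, not Thanos or Prometheus code) of sum_over_time with overlapping windows, i.e. step < range: computing it once at the leaf and then again in an unaware outer engine inflates the result:

```python
def sum_over_time(samples, range_s, step_s, start, end):
    # samples: {timestamp: value}. Evaluate at start, start+step, ..., end;
    # each evaluation sums the samples in the window (t - range_s, t].
    out = {}
    t = start
    while t <= end:
        out[t] = sum(v for ts, v in samples.items() if t - range_s < ts <= t)
        t += step_s
    return out

raw = {t: 1.0 for t in range(1, 7)}  # one sample per unit of time, value 1

# Leaf node pre-computes sum_over_time with step (1) < range (2) ...
leaf = sum_over_time(raw, range_s=2, step_s=1, start=2, end=6)
# ... and an unaware outer engine applies sum_over_time on top of it.
outer = sum_over_time(leaf, range_s=2, step_s=1, start=2, end=6)

print(leaf[6])   # 2.0 -> correct: two raw samples fall in the window
print(outer[6])  # 4.0 -> double-counted
```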

stale bot commented Mar 2, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Mar 2, 2022
@GiedriusS GiedriusS mentioned this pull request Mar 23, 2022
@stale stale bot closed this Apr 16, 2022