New metric for measuring query duration proposal #4932

moadz · 2021-12-08T11:50:24Z

Signed-off-by: Moad Zardab [email protected]
Related to #4895

I added CHANGELOG entry for this change.
Change is not relevant to the end user.

Signed-off-by: mzardab <[email protected]>

yeya24 · 2021-12-11T22:06:32Z

That's a really nice improvement on query latency observability, especially with the new query pushdown feature 👍

bwplotka

Great proposal, but as I mentioned offline, it would be awesome to take this latency number right when we return the PromQL query response to the user.

This might have to change if we try to push query eval to Query Frontend, but there is no clear proposal for now, so let's go for this!

docs/proposals-accepted/202108-more-granular-query-performance-metrics.md

moadz · 2021-12-23T15:11:56Z

docs/proposals-accepted/202108-more-granular-query-performance-metrics.md

+* Amend the `seriesServer` to keep track of all `SeriesStats` for each series pushed to it
+* Amend the static `qapi.queryableCreate` to take a `SeriesStatsReporter` func parameter that will exfiltrate the seriesStats from the Thanos Proxy StoreAPI
+* Add new runtime flags that will allow us to specify a) Query time quantiles b) Series size quantiles c) Sample size quantiles for our partitioned histogram
+* Start a query duration timer as soon as the handler is hit
+* Create a new partitioned vector histogram called `thanos_query_duration_seconds` in the `queryRange` API handler 
+* Propagate all exfiltrated `SeriesStats` to aforementioned metric
+* Record observations against the `thanos_query_duration_seconds` histogram after bucketing samples_le/series_le buckets 
+


@bwplotka amended to include the entire query path (with stats exfiltration)
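
A rough sketch of the bucketing step listed in the diff above, for illustration only. It uses plain Go and the standard library rather than the Prometheus client, and it is not the proposal's actual code: the `SeriesStats` fields, `PartitionedHistogram`, `bucketLabel`, and all bound values are hypothetical. Counts here are kept per bucket (not cumulative), which is a simplification.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// SeriesStats is a hypothetical stand-in for the stats exfiltrated from the
// proxy StoreAPI; the real struct may carry different fields.
type SeriesStats struct {
	Series  int
	Samples int
}

// bucketLabel returns the smallest upper bound that is >= n, formatted as a
// label value, or "+Inf" when n exceeds every configured bound.
func bucketLabel(n int, bounds []float64) string {
	i := sort.SearchFloat64s(bounds, float64(n))
	if i == len(bounds) {
		return "+Inf"
	}
	return fmt.Sprintf("%g", bounds[i])
}

// partition identifies one (samples_le, series_le) slice of the histogram.
type partition struct {
	SamplesLe string
	SeriesLe  string
}

// PartitionedHistogram counts query durations per partition.
// It is not safe for concurrent use; a real metric would need to be.
type PartitionedHistogram struct {
	durationBounds []float64 // seconds, sorted ascending
	seriesBounds   []float64 // sorted ascending
	sampleBounds   []float64 // sorted ascending
	counts         map[partition][]uint64
}

// NewPartitionedHistogram builds a histogram from the three sets of bounds
// (assumed here to come from runtime flags).
func NewPartitionedHistogram(duration, series, samples []float64) *PartitionedHistogram {
	return &PartitionedHistogram{
		durationBounds: duration,
		seriesBounds:   series,
		sampleBounds:   samples,
		counts:         make(map[partition][]uint64),
	}
}

// Observe records one query duration (in seconds) in the partition selected
// by the query's stats, and returns that partition.
func (h *PartitionedHistogram) Observe(seconds float64, s SeriesStats) partition {
	p := partition{
		SamplesLe: bucketLabel(s.Samples, h.sampleBounds),
		SeriesLe:  bucketLabel(s.Series, h.seriesBounds),
	}
	if h.counts[p] == nil {
		// One slot per duration bound plus a trailing +Inf slot.
		h.counts[p] = make([]uint64, len(h.durationBounds)+1)
	}
	h.counts[p][sort.SearchFloat64s(h.durationBounds, seconds)]++
	return p
}

func main() {
	h := NewPartitionedHistogram(
		[]float64{0.1, 1, 10},     // query time bounds in seconds
		[]float64{10, 100, 1000},  // series size bounds
		[]float64{100, 1000, 1e6}, // sample size bounds
	)
	start := time.Now() // timer started as soon as the handler is hit
	stats := SeriesStats{Series: 42, Samples: 5000}
	p := h.Observe(time.Since(start).Seconds(), stats)
	fmt.Printf("samples_le=%s series_le=%s counts=%v\n", p.SamplesLe, p.SeriesLe, h.counts[p])
}
```

In Thanos itself this would presumably be backed by a Prometheus histogram vector with `samples_le` and `series_le` as labels; the map above only stands in for that.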

bwplotka · 2021-12-23T15:47:46Z

docs/proposals-accepted/202108-more-granular-query-performance-metrics.md

+* Do not want to create separate histograms for each individual store query, so they will need to be aggregated at the `Series` request level so that our observations include all
+* Do not want to block series receive on a `seriesStats` merging mutex for each incoming response, so maintaining a central `seriesStats` reference and passing it into each of the proxied store requests is out of the question
+
+### Why can't we capture the query shape & latency spanning the entire query path? 


bwplotka · 2021-12-23T15:48:03Z

docs/proposals-accepted/202108-more-granular-query-performance-metrics.md

+
+**tl;dr:** Longer term to capture the entire query path by amending the Prometheus Querier API to return some stats alongside the query, and creating this generic metric inside the Prometheus PromQL engine. Short term, pass a func parameter to the Queryable constructor for the proxy StoreAPI querier that will exfiltrate the `SeriesStats`, circumventing PromQL engine.  
+
+### Measuring Thanos Query Latency with respect to query fanout 


Suggested change

### Measuring Thanos Query Latency with respect to query fanout

### Measuring Thanos Query Latency with respect to query fanout
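
For readers following the thread, here is a minimal sketch of the short-term approach from the tl;dr (a func parameter that exfiltrates `SeriesStats`), combined with the constraint of not blocking series receive on a shared merging mutex. This is an assumption about how it could be wired, not the actual Thanos code: `storeFetch`, `fanOut`, and `Merge` are hypothetical names, and only `SeriesStats` and `SeriesStatsReporter` are taken from the proposal text.

```go
package main

import (
	"fmt"
	"sync"
)

// SeriesStats holds hypothetical per-request counters.
type SeriesStats struct {
	Series  int
	Samples int
}

// Merge adds another set of stats into s.
func (s *SeriesStats) Merge(o SeriesStats) {
	s.Series += o.Series
	s.Samples += o.Samples
}

// SeriesStatsReporter receives the aggregated stats once a Series request
// has completed.
type SeriesStatsReporter func(SeriesStats)

// storeFetch is a hypothetical stand-in for one proxied store request that
// returns the stats it collected locally.
type storeFetch func() SeriesStats

// fanOut runs every store request concurrently. Each goroutine writes only
// to its own slot, so no mutex is taken while responses arrive; the slots
// are merged once, after all requests finish, and then reported.
func fanOut(stores []storeFetch, report SeriesStatsReporter) {
	perStore := make([]SeriesStats, len(stores))
	var wg sync.WaitGroup
	for i, fetch := range stores {
		wg.Add(1)
		go func(i int, fetch storeFetch) {
			defer wg.Done()
			perStore[i] = fetch()
		}(i, fetch)
	}
	wg.Wait()

	var total SeriesStats
	for _, s := range perStore {
		total.Merge(s)
	}
	report(total)
}

func main() {
	stores := []storeFetch{
		func() SeriesStats { return SeriesStats{Series: 3, Samples: 300} },
		func() SeriesStats { return SeriesStats{Series: 5, Samples: 900} },
	}
	var got SeriesStats
	fanOut(stores, func(s SeriesStats) { got = s })
	fmt.Printf("series=%d samples=%d\n", got.Series, got.Samples) // prints "series=8 samples=1200"
}
```

The reporter is called exactly once per request here, which matches the stated goal of aggregating at the `Series` request level rather than creating a histogram per store query.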

bwplotka

Great work! LGTM.

Next time no need for so many details, but I think it's an amazing exercise for a start 👍🏽

bwplotka · 2021-12-23T16:27:28Z

Looks perfect. Only two flakes on CI, so merging, unless there are objections from others 👍🏽

clyang82 · 2021-12-24T03:00:14Z

The documentation check did not pass, so subsequent PRs will fail. @moadz, could you help fix the documentation check issue? Thanks.

moadz · 2022-01-04T10:44:14Z

@clyang82 apologies, just saw that this was merged. Thanks @hanjm for rectifying in b828d00

moadz added 3 commits December 8, 2021 11:41

Adding 202108-more-granular-query-performance-metrics proposal

946eae6

Signed-off-by: mzardab <[email protected]>

Adding 202108-more-granular-query-performance-metrics proposal

bc0dd57

Signed-off-by: mzardab <[email protected]>

corrections

264dccb

Signed-off-by: mzardab <[email protected]>

moadz mentioned this pull request Dec 8, 2021

More granular query performance metrics for Thanos store #4895

Open

corrections

37b47f0

Signed-off-by: mzardab <[email protected]>

bwplotka reviewed Dec 14, 2021

Ammending proposal to include why we can't capture the entire query path

ba8dda4

moadz changed the title ~~New metric for measuring series fanout duration proposal~~ New metric for measuring proxy StoreAPI select duration proposal Dec 22, 2021

pull-request-size bot added the size/L label Dec 22, 2021

Linting

7757854

moadz commented Dec 22, 2021

Amend doc to include SeriesStats reporter

4bd8138

moadz changed the title ~~New metric for measuring proxy StoreAPI select duration proposal~~ New metric for measuring query duration proposal Dec 23, 2021

moadz commented Dec 23, 2021

bwplotka reviewed Dec 23, 2021

Renaming section

92240d3

bwplotka approved these changes Dec 23, 2021

bwplotka enabled auto-merge (squash) December 23, 2021 15:51

bwplotka disabled auto-merge December 23, 2021 16:27

bwplotka merged commit cb44778 into thanos-io:main Dec 23, 2021

moadz mentioned this pull request Jan 31, 2022

Proposal: report sample and series statistics to help analyze and monitor query complexity prometheus/prometheus#10181

Closed
