-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New metric for measuring query duration proposal #4932
Conversation
Signed-off-by: mzardab <[email protected]>
Signed-off-by: mzardab <[email protected]>
Signed-off-by: mzardab <[email protected]>
Signed-off-by: mzardab <[email protected]>
That's a really nice improvement on query latency observability, especially with the new query pushdown feature 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great proposal, but as I mentioned offline, it would be awesome to take this latency number right when we return query PromQL response to the user.
This might have to be changed if we try to push query eval to Query Frontent, but no clear proposal for now so let's go for this!
docs/proposals-accepted/202108-more-granular-query-performance-metrics.md
Show resolved
Hide resolved
docs/proposals-accepted/202108-more-granular-query-performance-metrics.md
Show resolved
Hide resolved
docs/proposals-accepted/202108-more-granular-query-performance-metrics.md
Show resolved
Hide resolved
docs/proposals-accepted/202108-more-granular-query-performance-metrics.md
Show resolved
Hide resolved
docs/proposals-accepted/202108-more-granular-query-performance-metrics.md
Outdated
Show resolved
Hide resolved
* Amend the `seriesServer` to keep track of all `SeriesStats` for each series pushed to it | ||
* Amend the static `qapi.queryableCreate` to take a `SeriesStatsReporter` func parameter that will exfiltrate the seriesStats from the Thanos Proxy StoreAPI | ||
* Add new runtime flags that will allow us to specify a) Query time quantiles b) Series size quantiles c) Sample size quantiles for our partitioned histogram | ||
* Start a query duration timer as soon as the handler is hit | ||
* Create a new partitioned vector histogram called `thanos_query_duration_seconds` in the `queryRange` API handler | ||
* Propagate all exfiltrated `SeriesStats` to aforementioned metric | ||
* Record observations against the `thanos_query_duration_seconds` histogram after bucketing samples_le/series_le buckets | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bwplotka amended to include the entire query path (with stats exfiltration)
* Do not want to create separate histograms for each individual store query, so they will need to be aggregated at the `Series` request level so that our observations include all | ||
* Do not want to block series receive on a `seriesStats` merging mutex for each incoming response, so maintaining a central `seriesStats` reference and passing it into each of the proxied store requests is out of the question | ||
|
||
### Why can't we capture the query shape & latency spanning the entire query path? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can ( :
|
||
**tl;dr:** Longer term to capture the entire query path by amending the Prometheus Querier API to return some stats alongside the query, and creating this generic metric inside the Prometheus PromQL engine. Short term, pass a func parameter to the Queryable constructor for the proxy StoreAPI querier that will exfiltrate the `SeriesStats`, circumventing PromQL engine. | ||
|
||
### Measuring Thanos Query Latency with respect to query fanout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
### Measuring Thanos Query Latency with respect to query fanout | |
### Measuring Thanos Query Latency with respect to query fanout | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! LGTM.
Next time no need for so many details, but I think it's an amazing exercise for a start 👍🏽
Looks perfect. Only two flakes on CI, so merging, unless there are objections from others 👍🏽 |
Documentation check was not passed so that the following PRs will be failed. @moadz Could you help to fix the Documentation check issue? Thanks. |
Signed-off-by: Moad Zardab [email protected]
Related to #4895