-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve metric query performance #95776
Comments
Pinging @elastic/es-analytics-geo (Team:Analytics) |
Have you considered to move a way from the default (unset)
This means we can probably have a heuristic which refreshes an index every There might be some exceptions to the rule (like there Kubernetes Events metricset where you'd want much higher refresh intervals for troubleshooting purposes), but for many other observability usecases such a heuristic might work well. What do you think? |
… rewrite without a SearchExecutionContext. With this change, both query builders can rewrite without using a search context, because QueryRewriteContext often has all the mapping and other index metadata available. The `TermQueryBuilder` can with this resolve to a `MatchAllQueryBuilder` with needing a `SearchExecutionContext`, which during the can_match phase means that no searcher needs to be acquired and therefor avoiding making a shard search active / potentially refresh. The `AbstractQueryBuilder#doRewrite(...)` method is altered to by default attempt a coordination rewrite, then fall back to attempt a search rewrite, then finally fall back to do an index metadata aware rewrite. This was forgotten as part of elastic#96161 and is needed to complete elastic#95776.
…t a SearchExecutionContext. (#96905) With this change, both query builders can rewrite without using a search context, because QueryRewriteContext often has all the mapping and other index metadata available. The TermQueryBuilder can with this change resolve to a MatchNoneQueryBuilder without needing a SearchExecutionContext, which during the can_match phase means that no searcher needs to be acquired and therefor avoid making a shard search active and doing a potentially refresh. The AbstractQueryBuilder#doRewrite(...) method is altered to by default attempt a coordination rewrite, then fall back to attempt a search rewrite, then finally fall back to do an index metadata aware rewrite. This is inline with what was discussed here: #96161 (comment) This change was forgotten as part of #96161 and is needed to complete #95776.
I expermented with this via #101619, but it didn't yield the performance gains I was hoping for. In some cases the performance was worse. |
There are a number of performance issues that have been found in production cluster for metric solutions that need to be addressed in order to have competitive query latency in the metric space. This is part of the tsdb effort as it aim is to make Elasticsearch better at storing and querying metric data. Tasks mentioned here are improvements that significantly reduce query time of many metric query workloads or specific ones.
Our current observations indicate that the poor performance is caused by the default refresh behaviour. Shards by default go search-idle after 30 seconds of search inactivity. When a shard is queries that is search idle then a refresh is performed as part of the search and then search execution continues. This adds a significant amount of latency to the query time. Especially because the refresh isn't triggered, but awaits until the scheduled refresh kicks in (which means often for 1 second nothing happens).
Additionally we observed that any search with a
percentile
aggregation is slow. Under the hood thepercentile
aggregation uses avl t-digest to compute the percentiles. This shows up as significant hotspot when profiling.percentile
aggregation by switching to the merging based t-digest implementation. The current avl based implementation performs slowly in production with metric data set of any reasonable size. This work consists out of forking the t-digest library (Fork tdigest library #95903)) and then change the implementation to merging t-digest (Feature/replace avl digest with merging digest #35182).cardinality
aggregation performance on low cardinality fields (Add support for dynamic pruning to cardinality aggregations on low-cardinality keyword fields. #92060).map
orglobal_ordinals
should be used.The text was updated successfully, but these errors were encountered: