
Add fetch phase to search profile #1764

Open
andrross opened this issue Dec 17, 2021 · 3 comments
Labels
distributed framework enhancement Enhancement or improvement to existing feature or request

Comments

@andrross
Member

Is your feature request related to a problem? Please describe.
A significant performance regression exists for some use cases in the fetch phase between ES 7.9 and OS 1.0. The root cause was a change to the Lucene codec and it has been mitigated by a change within Lucene that is present in OS 1.2. See issue #1647 for more details. I was able to profile the JVM using Java Flight Recorder and the decompression during the fetch phase stood out as an obvious change, but this would have been much easier if the search profile results had contained timing metrics on the fetch phase.

Describe the solution you'd like
The "profile" section of the query response should contain information about the fetch phase.
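A minimal sketch of what fetch-phase profiling could record, written in Python for brevity rather than OpenSearch's Java. The class `FetchProfileBreakdown`, the function `fetch_hits`, and the step names `load_stored_fields` and `build_hit` are hypothetical and not part of any OpenSearch API. The sleep stands in for real per-document work such as reading and decompressing stored fields. The idea is to accumulate elapsed nanoseconds and an invocation count per named step, loosely following the time-plus-count style of the existing query-phase breakdown.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class FetchProfileBreakdown:
    """Hypothetical accumulator for fetch-phase timings (not an OpenSearch API)."""

    def __init__(self):
        self._nanos = defaultdict(int)   # total elapsed nanoseconds per step
        self._counts = defaultdict(int)  # number of times each step ran

    @contextmanager
    def timing(self, step):
        # Time one invocation of a named fetch step.
        start = time.perf_counter_ns()
        try:
            yield
        finally:
            self._nanos[step] += time.perf_counter_ns() - start
            self._counts[step] += 1

    def to_breakdown(self):
        # Flatten into "<step>" / "<step>_count" keys, like a profile "breakdown" map.
        out = {}
        for step in sorted(self._nanos):
            out[step] = self._nanos[step]
            out[step + "_count"] = self._counts[step]
        return out


def fetch_hits(doc_ids, profile):
    """Toy fetch loop; the sleep stands in for real per-document work."""
    hits = []
    for doc_id in doc_ids:
        with profile.timing("load_stored_fields"):
            time.sleep(0.001)  # stand-in for reading/decompressing stored fields
            source = {"id": doc_id}
        with profile.timing("build_hit"):
            hits.append(source)
    return hits


profile = FetchProfileBreakdown()
fetch_hits([1, 2, 3], profile)
print(profile.to_breakdown())
```

Keeping a count next to each total helps distinguish many cheap calls from a few expensive ones, which is the kind of signal that would have made the decompression cost in #1647 easier to spot.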

Describe alternatives you've considered
Profiling the JVM can give a lot of insight into where time is being spent, but it is a rather complicated process and requires a lot of knowledge of the Java development ecosystem.

@rishabhmaurya
Contributor

rishabhmaurya commented May 30, 2024

Some other ways to improve the profile output in general:

  1. Add a per-function breakdown for FunctionScoreQuery. Currently there is no breakdown of what happens inside individual functions. Because a function comprises a regular query plus score-manipulation logic, it can sometimes take significantly longer, and when there are multiple such functions we have no way to identify which one is the bottleneck.
  2. It would be nice to also get a segment-level breakdown in the profile output, if that's feasible.
  3. Integrate resource tracking into the profile output when it is enabled ([Query Insights] Capture query-level resource usage metrics #12399). We need to think more about how to integrate it: the profile output breaks timing down by individual query clause, all the way to the lowest-level Lucene query. Since query resource tracking doesn't provide information at that granularity, it might not be possible to carry it down the tree to Lucene queries, but we can definitely add it to the shard-level breakdown. cc @ansjcy @getsaurabh02
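To illustrate the per-function breakdown in item 1, here is a minimal Python sketch that wraps each scoring function in a timer and then reports the slowest one. `ProfiledFunction`, `score_all`, `slowest`, and the two toy functions are hypothetical names, not OpenSearch or Lucene APIs. Summing the function outputs is just one possible way of combining them, and the sleep stands in for a costly query or script.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProfiledFunction:
    """Hypothetical wrapper that times every call to one scoring function."""
    name: str
    fn: Callable[[dict], float]
    nanos: int = 0
    calls: int = 0

    def __call__(self, doc):
        start = time.perf_counter_ns()
        try:
            return self.fn(doc)
        finally:
            self.nanos += time.perf_counter_ns() - start
            self.calls += 1


def score_all(docs, functions: List[ProfiledFunction]) -> List[float]:
    # Combine the functions by summing their outputs per document.
    return [sum(f(doc) for f in functions) for doc in docs]


def slowest(functions: List[ProfiledFunction]) -> ProfiledFunction:
    # The function with the largest accumulated time is the likely bottleneck.
    return max(functions, key=lambda f: f.nanos)


def popularity(doc):
    return doc["popularity"] * 0.1


def tag_match(doc):
    time.sleep(0.002)  # stand-in for a costly query or script
    return 1.0 if doc["tag"] == "x" else 0.0


funcs = [ProfiledFunction("popularity", popularity), ProfiledFunction("tag_match", tag_match)]
docs = [{"popularity": 10, "tag": "x"}, {"popularity": 5, "tag": "y"}]
scores = score_all(docs, funcs)
print(scores)
print({f.name: (f.nanos, f.calls) for f in funcs})
print("slowest:", slowest(funcs).name)
```

Timing at the function boundary keeps the overhead to two clock reads per call, and it attributes cost to each function even when several are combined in one query.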

@ansjcy
Member

ansjcy commented Jun 3, 2024

I agree with @rishabhmaurya's suggestions! As part of the current efforts to capture query-level resource usage metrics and the coordinator-node-level took time, we can get a phase-level breakdown of latency and resource usage, but we could go even deeper and get insights into the resource usage and time consumption of some critical functions.

@rishabhmaurya
Contributor

Additionally, the profile output could capture further shard-level metrics related to computing the field data cache. This could help when there are too many evictions because too little heap is available for field data caches; in these scenarios, users may want to increase the heap size to improve the performance of their search queries. We have no way to flag such queries today, and without flame graphs it is hard to understand what is taking time in these queries.

Projects
Status: Todo
Development

No branches or pull requests

4 participants