-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance regression in MatchedQueriesPhase #120130
Labels
>bug
:Search Relevance/Search
Catch all for Search Relevance
Team:Search Relevance
Meta label for the Search Relevance team in Elasticsearch
Comments
jimczi
added
:Search Relevance/Search
Catch all for Search Relevance
and removed
needs:triage
Requires assignment of a team area label
labels
Jan 14, 2025
elasticsearchmachine
added
the
Team:Search Relevance
Meta label for the Search Relevance team in Elasticsearch
label
Jan 14, 2025
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
jimczi
added a commit
to jimczi/elasticsearch
that referenced
this issue
Jan 14, 2025
This PR resolves a regression introduced in elastic#94564 by ensuring that the approximation is used when advancing matched query clauses. Utilizing the two-phase iterator to validate matches guarantees that we do not attempt to find the next document fulfilling the two-phase criteria beyond the current document. This fix prevents scenarios where matching a document in the second phase significantly increases query complexity, especially in cases involving restrictive second-pass filters. Closes elastic#120130
Thanks for the detailed explanation @Kimahriman , I agree that this is a performance regression. |
jimczi
added a commit
to jimczi/elasticsearch
that referenced
this issue
Jan 14, 2025
This PR resolves a regression introduced in elastic#94564 by ensuring that the approximation is used when advancing matched query clauses. Utilizing the two-phase iterator to validate matches guarantees that we do not attempt to find the next document fulfilling the two-phase criteria beyond the current document. This fix prevents scenarios where matching a document in the second phase significantly increases query complexity, especially in cases involving restrictive second-pass filters. Closes elastic#120130
jimczi
added a commit
to jimczi/elasticsearch
that referenced
this issue
Jan 14, 2025
This PR resolves a regression introduced in elastic#94564 by ensuring that the approximation is used when advancing matched query clauses. Utilizing the two-phase iterator to validate matches guarantees that we do not attempt to find the next document fulfilling the two-phase criteria beyond the current document. This fix prevents scenarios where matching a document in the second phase significantly increases query complexity, especially in cases involving restrictive second-pass filters. Closes elastic#120130
jimczi
added a commit
to jimczi/elasticsearch
that referenced
this issue
Jan 14, 2025
This PR resolves a regression introduced in elastic#94564 by ensuring that the approximation is used when advancing matched query clauses. Utilizing the two-phase iterator to validate matches guarantees that we do not attempt to find the next document fulfilling the two-phase criteria beyond the current document. This fix prevents scenarios where matching a document in the second phase significantly increases query complexity, especially in cases involving restrictive second-pass filters. Closes elastic#120130
elasticsearchmachine
pushed a commit
that referenced
this issue
Jan 14, 2025
This PR resolves a regression introduced in #94564 by ensuring that the approximation is used when advancing matched query clauses. Utilizing the two-phase iterator to validate matches guarantees that we do not attempt to find the next document fulfilling the two-phase criteria beyond the current document. This fix prevents scenarios where matching a document in the second phase significantly increases query complexity, especially in cases involving restrictive second-pass filters. Closes #120130
elasticsearchmachine
pushed a commit
that referenced
this issue
Jan 14, 2025
This PR resolves a regression introduced in #94564 by ensuring that the approximation is used when advancing matched query clauses. Utilizing the two-phase iterator to validate matches guarantees that we do not attempt to find the next document fulfilling the two-phase criteria beyond the current document. This fix prevents scenarios where matching a document in the second phase significantly increases query complexity, especially in cases involving restrictive second-pass filters. Closes #120130
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
>bug
:Search Relevance/Search
Catch all for Search Relevance
Team:Search Relevance
Meta label for the Search Relevance team in Elasticsearch
Elasticsearch Version
8.17.0
Installed Plugins
No response
Java Version
bundled
OS Version
Public docker image
Problem Description
We started noticing some search tasks hanging around for several hours after we upgraded from 8.6.1 to 8.15.1. They were marked as cancelled but wouldn't get cleaned up for a long time. When inspecting the queries, we noticed they were very heavy geospatial queries, with hundreds to thousands of polygon queries on a
geo_point
field. We assumed this was just complex queries, but couldn't really explain why the cancelled tasks hung around for so long. We recently upgraded to 8.17.0 and still continued to see this behavior, and started noticing more and more of this happening. We finally started digging in more and checked the hot threads on the nodes these tasks were running and saw this part of a stack trace every time this was happening:This led us to do some testing and confirmed that the issue is fully around the fact that we are doing named queries on each of these polygons so we can find out which polygon the point was in. When removing the named queries, this issue goes away.
Looking at changes to the
MatchedQueriesPhase
, I believe this may be caused by #94564.Before this MR, matches were found using
Bits matchingBits = Lucene.asSequentialAccessBits(readerContext.reader().maxDoc(), ss);
, which under the hood operates roughly like:Here,
advance
is always run on the inner iterator in the case of thetwoPhaseIterator
, never on thetwoPhaseIterator
itself.The new code simply does:
in this case,
ScorerSupplier ss
is aIndexOrDocValuesQuery
, which is wrapping anindex
query,LatLonPointQuery
, and adocValues
query,LatLonDocValuesQuery
. When the scorer returned by the scorer supplier ends up being the doc values query instead of the index query due to the cost,scorer.iterator
is aTwoPhaseIteratorAsDocIdSetIterator
. Looking at that implementation:Which looks like after advancing
approximation
(which in this case isLatLonDocValuesQuery
) totarget
, it iterates over every record inapproximation
until it finds the next match, meaning the polygon query will run on every record in the shard (or segment?) even if it's not part of the results being fetched.The original implementation didn't have this issue because it effectively was doing
twoPhaseIterator.approximation.advance(target)
directly instead.Steps to Reproduce
Very hard to reproduce, we had to do hundreds of small polygons that matched very few records to finally recreate it on our production data. I can modify the
MatchedQueriesIT
tests to show that theTwoPhaseIterator
is used when the estimated cost of the index query is greater than 8, which means it estimates at least 8 points match the query on the shard (or segment?).Logs (if relevant)
No response
The text was updated successfully, but these errors were encountered: