Fix runtime exceptions in hybrid query for case when sub-query scorer return TwoPhase iterator that is incompatible with DISI iterator #624
Signed-off-by: Martin Gaievski <[email protected]>
46775d3 to d48c06c
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #624      +/-   ##
============================================
- Coverage     84.44%   82.70%    -1.74%
- Complexity      604      650      +46
============================================
  Files            48       51       +3
  Lines          1826     2053     +227
  Branches        276      329      +53
============================================
+ Hits           1542     1698     +156
- Misses          161      212      +51
- Partials        123      143      +20
6cd6b77 to dd27905
Signed-off-by: Martin Gaievski <[email protected]>
dd27905 to fe113a6
@martin-gaievski Could you provide additional context on why rewrite needs to be changed?
Are we actually using ReqOptSumScorer in this PR? I didn't see it. It seems we modeled the new scorer on it.
I suspect that this is very similar to opensearch-project/OpenSearch#8155. I'm going to take a closer look at I suspect that Anyway, I will take a look at that after I review this PR.
It's needed indirectly: rewrite is not completed for some queries, and we get scorers for all sub-queries via scorerSupplier, which leads to some scorers being incorrectly set to
8e5527b to 8243511
Signed-off-by: Martin Gaievski <[email protected]>
8243511 to 67df195
Signed-off-by: Martin Gaievski <[email protected]>
 * In most cases it will be wrapped in MultiCollectorManager.
 */
@RequiredArgsConstructor
public abstract class HybridCollectorManager implements CollectorManager<Collector, ReduceableSearchResult> {
Is this related to the TwoPhaseIterator fix? Or is it more about allowing hybrid queries to run when concurrent search is enabled?
It's actually both, but @martin-gaievski can add more here.
Correct, it's both. It's needed for the two-phase iterator because of the query search context: with today's implementation we create a new collector specific to the hybrid query, and that effectively drops some logic from QueryPhase.executeInternal related to the construction of that context.
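For readers less familiar with the collector-manager pattern discussed in this thread, the following is a minimal sketch of its general shape: one collector is created per search slice, and the manager merges the per-slice results in a reduce step. It is a Lucene-free illustration, not the code from this PR. The names CollectorManagerSketch, SliceCollector, and Manager are hypothetical, and keeping a plain list of scores is an assumption made only to keep the example small.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

/** Hypothetical, Lucene-free model of the collector-manager pattern; not the plugin's code. */
final class CollectorManagerSketch {

    /** Hypothetical per-slice collector: records the scores seen in one slice. */
    static final class SliceCollector {
        final List<Float> scores = new ArrayList<>();

        void collect(float score) {
            scores.add(score);
        }
    }

    /** Mirrors the newCollector/reduce shape of a collector manager. */
    static final class Manager {
        private final int topN;

        Manager(int topN) {
            this.topN = topN;
        }

        /** Called once per slice, so concurrent slices never share mutable state. */
        SliceCollector newCollector() {
            return new SliceCollector();
        }

        /** Merges all per-slice results and keeps the topN highest scores. */
        List<Float> reduce(Collection<SliceCollector> collectors) {
            List<Float> merged = new ArrayList<>();
            for (SliceCollector collector : collectors) {
                merged.addAll(collector.scores);
            }
            merged.sort(Comparator.reverseOrder());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        }
    }

    public static void main(String[] args) {
        Manager manager = new Manager(2);
        SliceCollector first = manager.newCollector();
        first.collect(0.2f);
        first.collect(0.9f);
        SliceCollector second = manager.newCollector();
        second.collect(0.5f);
        System.out.println(manager.reduce(List.of(first, second))); // prints [0.9, 0.5]
    }
}
```

Because each slice gets its own collector, the same manager can serve both the sequential and the concurrent search path; only the number of collectors passed to reduce differs.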
Signed-off-by: Martin Gaievski <[email protected]>
Thanks @martin-gaievski, the part related to HybridCollectorManager looks good to me. Thank you for making the changes!
One comment. Overall code is good.
c9cdcc1 into opensearch-project:main
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-624-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c9cdcc148cd176becfc1456c9f27ab90aa4bfcf5
# Push it to GitHub
git push --set-upstream origin backport/backport-624-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x
Then, create a pull request where the
… return TwoPhase iterator that is incompatible with DISI iterator (opensearch-project#624) * Adding two phase iterator Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit c9cdcc1)
… return TwoPhase iterator that is incompatible with DISI iterator (#624) (#628) * Adding two phase iterator Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit c9cdcc1)
Description
Adds an approximation-based two-phase iterator to the hybrid query. This helps in scenarios where sub-queries are complex and direct iteration over all scorers is expensive and may lead to instability, such as runtime exceptions.
An example of such a query is bool with filter and some complex queries in should, like dis_max. The system uses ReqOptSumScorer, which is based on an approximation iterator for scores and a two-phase iterator for docs. If the underlying query clause is complex enough and the data set is large, avoiding such iterators and using the scorers directly may lead to the runtime exceptions described in #621.
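As a rough illustration of the two-phase idea described above, here is a minimal, Lucene-free sketch: a cheap approximation proposes candidate doc IDs, and a more expensive check confirms each candidate before it is collected. In Lucene the corresponding pieces are Scorer.twoPhaseIterator(), TwoPhaseIterator.approximation(), and TwoPhaseIterator.matches(). Everything else here (TwoPhaseSketch, Approximation, collect, and the even-number predicate) is hypothetical and is not the implementation in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical, Lucene-free model of two-phase iteration; not the plugin's code. */
final class TwoPhaseSketch {
    /** Sentinel returned when the approximation is exhausted. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Phase one: a cheap iterator over a sorted array of candidate doc IDs. */
    static final class Approximation {
        private final int[] candidates;
        private int index = -1;

        Approximation(int[] sortedCandidates) {
            this.candidates = sortedCandidates;
        }

        int nextDoc() {
            index++;
            return index < candidates.length ? candidates[index] : NO_MORE_DOCS;
        }
    }

    /** Phase two: collects only the candidates that the expensive check confirms. */
    static List<Integer> collect(int[] sortedCandidates, IntPredicate matches) {
        Approximation approximation = new Approximation(sortedCandidates);
        List<Integer> hits = new ArrayList<>();
        for (int doc = approximation.nextDoc(); doc != NO_MORE_DOCS; doc = approximation.nextDoc()) {
            // Skipping this confirmation would treat every candidate as a match.
            if (matches.test(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The predicate stands in for an expensive per-document verification.
        System.out.println(collect(new int[] {1, 4, 6, 9}, doc -> doc % 2 == 0)); // prints [4, 6]
    }
}
```

The point of the sketch is that the approximation may return documents that do not actually match, so a consumer that iterates it without calling the confirmation step can end up scoring documents the sub-query never matched.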
Key changes in this PR:
Issues Resolved
#621
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.