Adding Reciprocal Rank Fusion (RRF) in hybrid query #1086

Open: wants to merge 26 commits into base: main

@ryanbogan (Member) commented Jan 9, 2025

Description

Merges the feature branch for Reciprocal Rank Fusion (RRF) into main now that we have App Sec sign-off.
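For readers unfamiliar with the technique, here is a minimal sketch of rank fusion as it is generally described: each document's fused score is the sum of 1 / (k + rank) over every result list it appears in. This is not the code in this PR. The function name `reciprocal_rank_fusion`, the parameter `rank_constant`, its default of 60, and the sample document IDs are all assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, rank_constant=60):
    """Fuse several best-first ranked lists of document IDs.

    rankings: list of lists, each ordered from best to worst hit.
    rank_constant: smoothing constant k (60 is an assumed default here).
    Returns (doc_id, fused_score) pairs sorted by score, highest first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Sort by fused score descending; break ties on doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sub-query results, e.g. one keyword list and one vector list.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.6f}")
```

Because only ranks are used, the sub-query scores never need to be normalized against each other, which is the usual motivation for choosing this kind of fusion over score-based combination.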

Contains changes from the following PRs:

Related Issues

#659

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • [ ] API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Johnsonisaacn and others added 4 commits December 17, 2024 15:49
…874)

* initial commit of RRF

Signed-off-by: Isaac Johnson <[email protected]>

Co-authored-by: Varun Jain <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
…874)

* initial commit of RRF

Signed-off-by: Isaac Johnson <[email protected]>

Co-authored-by: Varun Jain <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
* Initial unit test implementation

Signed-off-by: Ryan Bogan <[email protected]>

---------
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
* Integrate explainability for hybrid query into RRF processor

Signed-off-by: Martin Gaievski <[email protected]>
@yuye-aws
Member

@ryanbogan Can you resolve the conflicting files?

@yuye-aws
Member

@martin-gaievski I remember there was a PR towards the feature branch. Do you know how to compare these two PRs?

@martin-gaievski
Member

> @martin-gaievski I remember there was a PR towards the feature branch. Do you know how to compare these two PRs?

While a feature is in development, we merge every PR into the feature branch. Once we're code complete and other things like App Sec are done, we merge everything from the feature branch into main. This PR is exactly that; @ryanbogan has listed all the PRs previously merged into the feature branch in the description.

bzhangam and others added 8 commits January 10, 2025 11:34
* add impl

Signed-off-by: zhichao-aws <[email protected]>

* add UT

Signed-off-by: zhichao-aws <[email protected]>

* rename pruneType; UT

Signed-off-by: zhichao-aws <[email protected]>

* changelog

Signed-off-by: zhichao-aws <[email protected]>

* ut

Signed-off-by: zhichao-aws <[email protected]>

* add it

Signed-off-by: zhichao-aws <[email protected]>

* change on 2-phase

Signed-off-by: zhichao-aws <[email protected]>

* UT

Signed-off-by: zhichao-aws <[email protected]>

* it

Signed-off-by: zhichao-aws <[email protected]>

* rename

Signed-off-by: zhichao-aws <[email protected]>

* enhance: more detailed error message

Signed-off-by: zhichao-aws <[email protected]>

* refactor to prune and split

Signed-off-by: zhichao-aws <[email protected]>

* changelog

Signed-off-by: zhichao-aws <[email protected]>

* fix UT cov

Signed-off-by: zhichao-aws <[email protected]>

* address review comments

Signed-off-by: zhichao-aws <[email protected]>

* enlarge score diff range

Signed-off-by: zhichao-aws <[email protected]>

* address comments: check lowScores non null instead of flag

Signed-off-by: zhichao-aws <[email protected]>

---------

Signed-off-by: zhichao-aws <[email protected]>
* Allow empty string for field in field map

Signed-off-by: Yizhe Liu <[email protected]>

* Allow empty string when validation

Signed-off-by: Yizhe Liu <[email protected]>

* Add to change log

Signed-off-by: Yizhe Liu <[email protected]>

* Update CHANGELOG to: Support empty string for fields in text embedding processor

Signed-off-by: Yizhe Liu <[email protected]>

---------

Signed-off-by: Yizhe Liu <[email protected]>
…nested objects (#1040)

* Fix bug where ingestion failed for input document containing list of nested objects

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments to use better method name/implementation

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments: modify the test case to have doc with various fields

Signed-off-by: Yizhe Liu <[email protected]>

---------

Signed-off-by: Yizhe Liu <[email protected]>
…es (#1043)

* Fixed mismatch between document source and score fields when sorting is enabled in hybrid query

Signed-off-by: Martin Gaievski <[email protected]>
junqiu-lei and others added 13 commits January 10, 2025 11:34
* add support for builder constructor in neural query builder

Signed-off-by: will-hwang <[email protected]>

* create custom builder class to enforce valid neural query builder instantiation

Signed-off-by: will-hwang <[email protected]>

* refactor code to remove duplicate

Signed-off-by: will-hwang <[email protected]>

* include new constructor in qa packages

Signed-off-by: will-hwang <[email protected]>

* refactor code to remove unnecessary code

Signed-off-by: will-hwang <[email protected]>

* fix bug in neural query builder instantiation

Signed-off-by: will-hwang <[email protected]>

---------

Signed-off-by: will-hwang <[email protected]>
* add hybrid search with rescore IT

Signed-off-by: will-hwang <[email protected]>

* remove rescore in hybrid search IT

Signed-off-by: will-hwang <[email protected]>

* remove previous version checks in build file

Signed-off-by: will-hwang <[email protected]>

* removing version checks only in rolling upgrade tests

Signed-off-by: will-hwang <[email protected]>

* remove newly added tests in restart test

Signed-off-by: will-hwang <[email protected]>

* Revert "remove newly added tests in restart test"

This reverts commit 0987831.

Signed-off-by: will-hwang <[email protected]>

---------

Signed-off-by: will-hwang <[email protected]>
…t has dot in field name (#1062)

* Fix bug where document embedding fails to be generated due to document has dot in field name

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments

Signed-off-by: Yizhe Liu <[email protected]>

---------

Signed-off-by: Yizhe Liu <[email protected]>
* Add reindex integration tests

Signed-off-by: Andy Qin <“[email protected]”>
* Fix github CI by adding eclipse dependency in formatting.gradle

Signed-off-by: Varun Jain <[email protected]>

* Add changelog

Signed-off-by: Varun Jain <[email protected]>

---------

Signed-off-by: Varun Jain <[email protected]>
…874)

* initial commit of RRF

Signed-off-by: Isaac Johnson <[email protected]>

Co-authored-by: Varun Jain <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
* Initial unit test implementation

Signed-off-by: Ryan Bogan <[email protected]>

---------
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
* Integrate explainability for hybrid query into RRF processor

Signed-off-by: Martin Gaievski <[email protected]>