Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HNSW changes to support Faiss byte vector #1823

Merged

Conversation

naveentatikonda
Copy link
Member

@naveentatikonda naveentatikonda commented Jul 12, 2024

Description

Add HNSW changes to support Faiss byte vector which behind the scenes uses Faiss SQ8_direct_signed scalar quantizer.

Issues Resolved

#1659

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@naveentatikonda naveentatikonda self-assigned this Jul 12, 2024
@naveentatikonda naveentatikonda added Features Introduces a new unit of functionality that satisfies a requirement v2.16.0 labels Jul 12, 2024
@naveentatikonda naveentatikonda force-pushed the faiss-byte-vector-hnsw branch 3 times, most recently from 5d532b0 to 374b81c Compare July 12, 2024 22:42
@naveentatikonda naveentatikonda force-pushed the faiss-byte-vector-hnsw branch from 4391fa2 to 66ed657 Compare July 13, 2024 16:50
@naveentatikonda naveentatikonda force-pushed the faiss-byte-vector-hnsw branch 4 times, most recently from 68db524 to 1b344be Compare July 17, 2024 03:40
@naveentatikonda naveentatikonda force-pushed the faiss-byte-vector-hnsw branch 2 times, most recently from 0e173b1 to 66ec044 Compare July 29, 2024 17:44
@naveentatikonda naveentatikonda changed the base branch from feature/binary-format to main July 29, 2024 17:45
@naveentatikonda naveentatikonda force-pushed the faiss-byte-vector-hnsw branch 2 times, most recently from 5e0bea8 to fa9af8d Compare August 4, 2024 18:52
@naveentatikonda naveentatikonda force-pushed the faiss-byte-vector-hnsw branch 3 times, most recently from b500228 to 1533a91 Compare August 21, 2024 19:52
Copy link
Collaborator

@shatejas shatejas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from the integration stand point

jni/src/commons.cpp Show resolved Hide resolved
jni/src/faiss_index_service.cpp Show resolved Hide resolved
jni/src/faiss_index_service.cpp Outdated Show resolved Hide resolved
// If VectorDataType is Byte using Faiss engine then manipulate Index Description to use "SQ8_direct_signed" scalar quantizer
// For example, Index Description "HNSW16,Flat" will be updated as "HNSW16,SQ8_direct_signed"
String indexDescription = methodAsMapBuilder.indexDescription;
if (indexDescription != null && indexDescription.isEmpty() == false) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use StringUtils.isEmpty() from apache commons.

Also what will happen in the case indexDescription is null or empty. I think we should throw an exception rather than just silently not doing anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up - as part of the mode/compression support, Im going to adjust the default logic to provide kinda of a default provider as opposed to just a default. So we wont need this additional logic once thats changed, because we will be able to supply default quantizer based on data type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use StringUtils for now. But, I'm not throwing an exception for the else case as Jack mentioned that we will be refactoring it.

append
);
}
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return the vector address? Just thinking in append scenario

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we don't need to check if list is empty or not because we are calling this if list size is equal to transfer limit or during flush (where we are verifying if list is not empty)

jni/include/commons.h Show resolved Hide resolved
jni/include/commons.h Show resolved Hide resolved
jni/src/faiss_index_service.cpp Show resolved Hide resolved
jni/src/faiss_index_service.cpp Show resolved Hide resolved
jni/src/faiss_index_service.cpp Outdated Show resolved Hide resolved
@naveentatikonda naveentatikonda force-pushed the faiss-byte-vector-hnsw branch 2 times, most recently from 08eca5d to ec1570d Compare August 22, 2024 23:08
Signed-off-by: Naveen Tatikonda <[email protected]>
Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@naveentatikonda naveentatikonda merged commit 59c312b into opensearch-project:main Aug 23, 2024
32 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 23, 2024
* Add HNSW changes to support Faiss byte vector

Signed-off-by: Naveen Tatikonda <[email protected]>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <[email protected]>

---------

Signed-off-by: Naveen Tatikonda <[email protected]>
(cherry picked from commit 59c312b)
naveentatikonda added a commit that referenced this pull request Aug 26, 2024
* Add HNSW changes to support Faiss byte vector

Signed-off-by: Naveen Tatikonda <[email protected]>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <[email protected]>

---------

Signed-off-by: Naveen Tatikonda <[email protected]>
(cherry picked from commit 59c312b)

Co-authored-by: Naveen Tatikonda <[email protected]>
akashsha1 pushed a commit to akashsha1/k-NN that referenced this pull request Sep 16, 2024
* Add HNSW changes to support Faiss byte vector

Signed-off-by: Naveen Tatikonda <[email protected]>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <[email protected]>

---------

Signed-off-by: Naveen Tatikonda <[email protected]>
Signed-off-by: Akash Shankaran <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Features Introduces a new unit of functionality that satisfies a requirement v2.17.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants