-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support of multi vector in jni #1364
Add support of multi vector in jni #1364
Conversation
e88ce8a
to
f743e66
Compare
@heemin32 Can you give detailed summary on what is being changed in PR description? |
jni/src/knn_extension/faiss/MultiVectorResultCollectorFactory.cpp
Outdated
Show resolved
Hide resolved
d6b2116
to
7a00cab
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## feature/multi-vector #1364 +/- ##
=======================================================
Coverage 85.10% 85.10%
Complexity 1251 1251
=======================================================
Files 162 162
Lines 5101 5101
Branches 477 477
=======================================================
Hits 4341 4341
Misses 554 554
Partials 206 206 ☔ View full report in Codecov by Sentry. |
jni/include/knn_extension/faiss/MultiVectorResultCollectorFactory.h
Outdated
Show resolved
Hide resolved
uint64_t word = this->bitmap[i] >> (index & 63); // Equivalent of bitmap[i] >> (index % 64) | ||
|
||
if (word != 0) { | ||
return index + __builtin_ctzll(word); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment around usage of this? What does word represent? It looks like its the 64-bit int that is like the "bit page" shifted over offset bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at the comment in Heap.h file and let me know if you still need some comment here to understand the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure which comment in Heap.h you are referring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean
* Here a block is 64 bit. However, for simplicity let's assume its size is 8 bits.
* Then, if have an array of 3, 7, and 10, it will be represented in bitmap as follow.
* [0] [1]
* bitmap: 10001000 00000100
*
* for next_set_bit call with 4
* 1. it looks for bitmap[0]
* 2. bitmap[0] >> 4
* 3. count trailing zero of the result from step 2 which is 3
* 4. return 4(current index) + 3(result from step 3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats not in Heap.h, but makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. BitSet.h :)
Signed-off-by: Heemin Kim <[email protected]>
uint64_t word = this->bitmap[i] >> (index & 63); // Equivalent of bitmap[i] >> (index % 64) | ||
|
||
if (word != 0) { | ||
return index + __builtin_ctzll(word); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure which comment in Heap.h you are referring
df6d1fa
into
opensearch-project:feature/multi-vector
Signed-off-by: Heemin Kim <[email protected]>
Signed-off-by: Heemin Kim <[email protected]>
Signed-off-by: Heemin Kim <[email protected]>
* Add patch to support multi vector in faiss (#1358) Signed-off-by: Heemin Kim <[email protected]> * Initialize id_map as null (#1363) Signed-off-by: Heemin Kim <[email protected]> * Add support of multi vector in jni (#1364) Signed-off-by: Heemin Kim <[email protected]> * Multi vector support for Faiss HNSW (#1371) Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields. Signed-off-by: Heemin Kim <[email protected]> * Add data generation script for nested field (#1388) Signed-off-by: Heemin Kim <[email protected]> * Add perf test for nested field (#1394) Signed-off-by: Heemin Kim <[email protected]> --------- Signed-off-by: Heemin Kim <[email protected]>
Signed-off-by: Heemin Kim <[email protected]>
* Add patch to support multi vector in faiss (opensearch-project#1358) Signed-off-by: Heemin Kim <[email protected]> * Initialize id_map as null (opensearch-project#1363) Signed-off-by: Heemin Kim <[email protected]> * Add support of multi vector in jni (opensearch-project#1364) Signed-off-by: Heemin Kim <[email protected]> * Multi vector support for Faiss HNSW (opensearch-project#1371) Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields. Signed-off-by: Heemin Kim <[email protected]> * Add data generation script for nested field (opensearch-project#1388) Signed-off-by: Heemin Kim <[email protected]> * Add perf test for nested field (opensearch-project#1394) Signed-off-by: Heemin Kim <[email protected]> --------- Signed-off-by: Heemin Kim <[email protected]> (cherry picked from commit 709b448)
* Add patch to support multi vector in faiss (opensearch-project#1358) Signed-off-by: Heemin Kim <[email protected]> * Initialize id_map as null (opensearch-project#1363) Signed-off-by: Heemin Kim <[email protected]> * Add support of multi vector in jni (opensearch-project#1364) Signed-off-by: Heemin Kim <[email protected]> * Multi vector support for Faiss HNSW (opensearch-project#1371) Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields. Signed-off-by: Heemin Kim <[email protected]> * Add data generation script for nested field (opensearch-project#1388) Signed-off-by: Heemin Kim <[email protected]> * Add perf test for nested field (opensearch-project#1394) Signed-off-by: Heemin Kim <[email protected]> --------- Signed-off-by: Heemin Kim <[email protected]> (cherry picked from commit 709b448)
* Add patch to support multi vector in faiss (#1358) Signed-off-by: Heemin Kim <[email protected]> * Initialize id_map as null (#1363) Signed-off-by: Heemin Kim <[email protected]> * Add support of multi vector in jni (#1364) Signed-off-by: Heemin Kim <[email protected]> * Multi vector support for Faiss HNSW (#1371) Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields. Signed-off-by: Heemin Kim <[email protected]> * Add data generation script for nested field (#1388) Signed-off-by: Heemin Kim <[email protected]> * Add perf test for nested field (#1394) Signed-off-by: Heemin Kim <[email protected]> --------- Signed-off-by: Heemin Kim <[email protected]> (cherry picked from commit 709b448)
Description
With this PR, QueryIndex and QueryIndex_WithFilter JNI method can take a list of parentId to dedupe search result per parent. To do that, custom result collector is implemented which will be passed as HNSW search parameter to faiss library.
Issues Resolved
N/A
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.