Integrates FAISS iterative builds with NativeEngines990KnnVectorsFormat #1950

Merged

Conversation

shatejas
Collaborator

@shatejas shatejas commented Aug 12, 2024

Description

  • The commit merges the feature/iterative-index-build branch
  • It integrates with KNNVectorValues and NativeEngines990KnnVectorsFormat
  • There are changes to reuse the off-heap vector buffer during iterative vector transfer
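
As a rough illustration of the buffer-reuse idea in the last bullet, here is a minimal sketch. It is not the plugin's code: the class BufferReuseSketch, the BatchSink interface, and the transfer method are hypothetical names, and BatchSink only stands in for whatever native call inserts a batch into the index. The sketch assumes every vector has exactly `dimension` floats. It allocates one direct (off-heap) buffer and refills it for each batch instead of allocating a new buffer per transfer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.ArrayList;
import java.util.List;

public class BufferReuseSketch {

    /** Hypothetical stand-in for the native call that inserts one batch into the index. */
    interface BatchSink {
        void insert(FloatBuffer batch, int vectorCount);
    }

    /**
     * Streams vectors to the sink in fixed-size batches, reusing a single
     * direct (off-heap) buffer for every batch. Returns the number of batches sent.
     */
    static int transfer(List<float[]> vectors, int dimension, int vectorsPerBatch, BatchSink sink) {
        // Allocated once, sized for one full batch; reused for all batches.
        FloatBuffer buffer = ByteBuffer
            .allocateDirect(vectorsPerBatch * dimension * Float.BYTES)
            .order(ByteOrder.nativeOrder())
            .asFloatBuffer();

        int batches = 0;
        int inBatch = 0;
        for (float[] vector : vectors) {
            buffer.put(vector);
            inBatch++;
            if (inBatch == vectorsPerBatch) {
                buffer.flip();                 // switch from writing to reading
                sink.insert(buffer, inBatch);
                buffer.clear();                // reset position and limit; memory is kept
                inBatch = 0;
                batches++;
            }
        }
        if (inBatch > 0) {                     // flush the final partial batch
            buffer.flip();
            sink.insert(buffer, inBatch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<float[]> vectors = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            vectors.add(new float[] { i, i + 0.5f });
        }
        int[] inserted = { 0 };
        int batches = transfer(vectors, 2, 2, (batch, count) -> inserted[0] += count);
        System.out.println(batches + " batches, " + inserted[0] + " vectors");
    }
}
```

With five two-dimensional vectors and a batch size of two, the sketch sends three batches through the same buffer. The trade-off it illustrates is that memory use is bounded by one batch rather than by the whole segment.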

Related Issues

Resolves #1853

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@navneet1v
Collaborator

@shatejas let's fix the CIs first

@shatejas shatejas force-pushed the iterative-index-integration branch from bbc8bb2 to af635b3 Compare August 12, 2024 21:58
Collaborator

@navneet1v navneet1v left a comment

Added initial comments; will be reviewing the code further in the next few hours.

@shatejas shatejas force-pushed the iterative-index-integration branch 5 times, most recently from fa51043 to fe9c992 Compare August 13, 2024 01:43
Contributor

@Vikasht34 Vikasht34 left a comment

I have covered 4 files; will continue with the remaining files tomorrow.

@navneet1v
Collaborator

Some of the comments may be outdated, as the PR was updated in the meantime. Please ignore comments that have already been addressed.

@shatejas shatejas force-pushed the iterative-index-integration branch 2 times, most recently from b317c86 to 2528ea6 Compare August 13, 2024 23:23
@shatejas shatejas force-pushed the iterative-index-integration branch 2 times, most recently from d56d07f to ad2746f Compare August 20, 2024 00:59
@shatejas shatejas force-pushed the iterative-index-integration branch 3 times, most recently from 1536f17 to 452fe05 Compare August 20, 2024 04:06
CHANGELOG.md Outdated
Comment on lines 26 to 28
* Fix graph merge stats size calculation [#1844](https://github.com/opensearch-project/k-NN/pull/1844)
* Integrate Lucene Vector field with native engines to use KNNVectorFormat during segment creation [#1945](https://github.com/opensearch-project/k-NN/pull/1945)
* Disallow a vector field to have an invalid character for a physical file name. [#1936] (https://github.com/opensearch-project/k-NN/pull/1936)
Collaborator

A few of these changelog entries were moved to separate sections. Please check the latest changelog. For example, "Integrate Lucene Vector field with native engines to use KNNVectorFormat during segment creation" is not a bug fix.

@navneet1v
Collaborator

Overall, the code looks good to me. CIs are failing due to some backward-incompatible changes in the main branch of OpenSearch. That needs to be fixed first, and then the CIs need to be run again. Approving this PR on the assumption that the CIs will pass, but let's not merge the PR until the CIs are passing.

navneet1v
navneet1v previously approved these changes Aug 20, 2024
jmazanec15
jmazanec15 previously approved these changes Aug 20, 2024
Changes include reusing the same vector buffer in the JNI layer

Signed-off-by: Tejas Shah <[email protected]>
Member

@ryanbogan ryanbogan left a comment

LGTM! CIs are passing now

@ryanbogan ryanbogan merged commit fd59b9a into opensearch-project:main Aug 20, 2024
34 of 35 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1950-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 fd59b9adf42b07aa2b2058c4badff6dacf8306a8
# Push it to GitHub
git push --set-upstream origin backport/backport-1950-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1950-to-2.x.

shatejas added a commit to shatejas/k-NN that referenced this pull request Aug 20, 2024
…at (opensearch-project#1950)

* Iterative Vector Insertion (opensearch-project#1840)

* Rebased with new version of k-NN

Signed-off-by: Andrew Klepchick <[email protected]>

* Optimized faiss insertion

Signed-off-by: Andrew Klepchick <[email protected]>

* Optimized threadCount logic

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed IDEA files

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed unnecessary cmake file

Signed-off-by: Andrew Klepchick <[email protected]>

* Added comments to new functions

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed createIndex and fixed test cases that use it

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed unused code

Signed-off-by: Andrew Klepchick <[email protected]>

* Explained zero initialization for vector transfer

Signed-off-by: Andrew Klepchick <[email protected]>

* Added locale

Signed-off-by: Andrew Klepchick <[email protected]>

* Spotless Apply

Signed-off-by: Andrew Klepchick <[email protected]>

* Account for zero documents in finished batch

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed where we check for zero docs

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed tip for return

Signed-off-by: Andrew Klepchick <[email protected]>

* Use unique pointers to make sure resources are released on exception

Signed-off-by: Andrew Klepchick <[email protected]>

* Moved createIndex to testUtils

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed memory management so that the underlying index is not deleted after initialized

Signed-off-by: Andrew Klepchick <[email protected]>

* Created new KNNIndexBuilder graph to make index building more modular

Signed-off-by: Andrew Klepchick <[email protected]>

* Streamlined logic in KNNIndexBuilder.

Signed-off-by: Andrew Klepchick <[email protected]>

* Cleaned up unnecessary code in KNN80DocValuesConsumer

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed memory management process

Signed-off-by: Andrew Klepchick <[email protected]>

* Added note about index initialization in faiss_index_service

Signed-off-by: Andrew Klepchick <[email protected]>

* Accounted for case where the exception happens after the indexWriter is released.

Signed-off-by: Andrew Klepchick <[email protected]>

* Delete jni/src/.idea/modules.xml

Signed-off-by: Andrew Klepchick <[email protected]>

* Delete jni/src/.idea/vcs.xml

Signed-off-by: Andrew Klepchick <[email protected]>

* Delete jni/src/.idea/workspace.xml

Signed-off-by: Andrew Klepchick <[email protected]>

* Spotless apply and free iterative index on exception

Signed-off-by: Andrew Klepchick <[email protected]>

* Undid hack for checking first document metrics

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed print statements

Signed-off-by: Andrew Klepchick <[email protected]>

* Free Vector Transfer on batch ingestion

Signed-off-by: Andrew Klepchick <[email protected]>

* Undid free

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed check for transfer ready

Signed-off-by: Andrew Klepchick <[email protected]>

* Don't crash when zero vectors inserted?

Signed-off-by: Andrew Klepchick <[email protected]>

* Reverted to old insertion process?

Signed-off-by: Andrew Klepchick <[email protected]>

* Spotless apply

Signed-off-by: Andrew Klepchick <[email protected]>

* Added back createOutput

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed prior createOutput

Signed-off-by: Andrew Klepchick <[email protected]>

* Test remaking vectorTransfer

Signed-off-by: Andrew Klepchick <[email protected]>

* Test restructuring of insertion

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed case where vector address is immediately discarded

Signed-off-by: Andrew Klepchick <[email protected]>

* Spotless apply

Signed-off-by: Andrew Klepchick <[email protected]>

* Split Index Builder into multiple classes

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed descriptions of functions in faiss_index_service

Signed-off-by: Andrew Klepchick <[email protected]>

* Added back copyright files

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed unused builder names

Signed-off-by: Andrew Klepchick <[email protected]>

* Modified tests to work with new insertion methods

Signed-off-by: Andrew Klepchick <[email protected]>

* Track index insertions

Signed-off-by: Andrew Klepchick <[email protected]>

* Tracked insertions for binary indices

Signed-off-by: Andrew Klepchick <[email protected]>

* Added back insertIds

Signed-off-by: Andrew Klepchick <[email protected]>

* Added check for freeVectorData to see if it works with an already deleted address

Signed-off-by: Andrew Klepchick <[email protected]>

* Cleaned up logs and comments in KNNIndexBuilder

Signed-off-by: Andrew Klepchick <[email protected]>

* Restructured the logic for KNNIndexBuilder

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed package name of KNNIndexBuilder

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed all package names and deleted unnecessary headers

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed for loop

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed createIndex methods for faiss index service

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed package to fit naming conventions

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed name of index builder

Signed-off-by: Andrew Klepchick <[email protected]>

* Spotless apply

Signed-off-by: Andrew Klepchick <[email protected]>

* Added comments to NativeIndexBuilder and restructured

Signed-off-by: Andrew Klepchick <[email protected]>

* Added deletion for memoryAddress

Signed-off-by: Andrew Klepchick <[email protected]>

* Spotless apply

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed naming of classes to Writer and changed package name to fit conventions

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed NativeIndexInfo and NativeVectorInfo to follow builder pattern

Signed-off-by: Andrew Klepchick <[email protected]>

* Added feature to changelog

Signed-off-by: Andrew Klepchick <[email protected]>

* Added class descriptions to each NativeIndexWriter

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed name to getBytesPerVector

Signed-off-by: Andrew Klepchick <[email protected]>

* Added == false instead of ! for readability

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed changelog

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed naming in docvaluesconsumer

Signed-off-by: Andrew Klepchick <[email protected]>

* SpotlessApply

Signed-off-by: Andrew Klepchick <[email protected]>

* Made it so that we don't reuse testValues and removed a foot gun

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed another foot gun in getIndexInfo

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed javadoc

Signed-off-by: Andrew Klepchick <[email protected]>

* Added deletion on exception cases

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed unnecessary delete (NativeIndexWriter will handle deletion of vectors on exception)

Signed-off-by: Andrew Klepchick <[email protected]>

* Added correct logger and getWriter method to NativeIndexWriter

Signed-off-by: Andrew Klepchick <[email protected]>

* Ensured memory safety on JNI layer so that Java doesn't have to wrap everything in a try catch loop.

Signed-off-by: Andrew Klepchick <[email protected]>

* Refactored NativeIndexWriter and added comments to FaissService

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed free in the JNIExport since index will always be freed in writeIndex.

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed getVectorTransfer back to accept VectorDataType

Signed-off-by: Andrew Klepchick <[email protected]>

* Reverted free since not guaranteed to be IDMap.

Signed-off-by: Andrew Klepchick <[email protected]>

* Added all processes in addKNNBinaryField to NativeIndexWriter.createKNNIndex

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed javadoc

Signed-off-by: Andrew Klepchick <[email protected]>

* Applied spotless

Signed-off-by: Andrew Klepchick <[email protected]>

* Added back writeFooter

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed threadCount from writeIndex

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed redundancies in KNN80DocValuesConsumer

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed serializationMode

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed changelog

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed changelog

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed double free test as we don't have to worry about that anymore

Signed-off-by: Andrew Klepchick <[email protected]>

* Accounted for HNSWSQ in index service

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed delete in catch

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed faiss tests to work with writeIndex

Signed-off-by: Andrew Klepchick <[email protected]>

---------

Signed-off-by: Andrew Klepchick <[email protected]>

* Index Initialization Alloc Method (opensearch-project#1933)

* Added methods for allocating memory before inserting vectors to a faiss index

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed logic that gets type of index

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed print statement

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed unnecessary iostream

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed flat index

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed flat index case

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed naming

Signed-off-by: Andrew Klepchick <[email protected]>

* Properly allocate HNSWSQ storage

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed print statements

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed changelog

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed unnecessary lib

Signed-off-by: Andrew Klepchick <[email protected]>

* Made alloc adaptive to different code sizes

Signed-off-by: Andrew Klepchick <[email protected]>

---------

Signed-off-by: Andrew Klepchick <[email protected]>

* Integrates FAISS iterative builds with NativeEngines990KnnVectorsFormat

Changes include reusing the same vector buffer in the JNI layer

Signed-off-by: Tejas Shah <[email protected]>

---------

Signed-off-by: Andrew Klepchick <[email protected]>
Signed-off-by: Tejas Shah <[email protected]>
Co-authored-by: Andrew Klepchick <[email protected]>
(cherry picked from commit fd59b9a)
navneet1v pushed a commit that referenced this pull request Aug 22, 2024
…at (#1950) (#1992)

Signed-off-by: Tejas Shah <[email protected]>
@shatejas shatejas deleted the iterative-index-integration branch August 29, 2024 00:54
akashsha1 pushed a commit to akashsha1/k-NN that referenced this pull request Sep 16, 2024
…at (opensearch-project#1950)


Signed-off-by: Andrew Klepchick <[email protected]>

* Changed name to getBytesPerVector

Signed-off-by: Andrew Klepchick <[email protected]>

* Added == false instead of ! for readability

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed changelog

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed naming in docvaluesconsumer

Signed-off-by: Andrew Klepchick <[email protected]>

* Spotless apply

Signed-off-by: Andrew Klepchick <[email protected]>

* Made it so that we don't reuse testValues and removed a foot gun

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed another foot gun in getIndexInfo

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed javadoc

Signed-off-by: Andrew Klepchick <[email protected]>

* Added deletion on exception cases

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed unnecessary delete (NativeIndexWriter will handle deletion of vectors on exception)

Signed-off-by: Andrew Klepchick <[email protected]>

* Added correct logger and getWriter method to NativeIndexWriter

Signed-off-by: Andrew Klepchick <[email protected]>

* Ensured memory safety in the JNI layer so that Java doesn't have to wrap everything in a try-catch block.

Signed-off-by: Andrew Klepchick <[email protected]>

* Refactored NativeIndexWriter and added comments to FaissService

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed free in the JNIExport since index will always be freed in writeIndex.

Signed-off-by: Andrew Klepchick <[email protected]>

* Changed getVectorTransfer back to accept VectorDataType

Signed-off-by: Andrew Klepchick <[email protected]>

* Reverted free since not guaranteed to be IDMap.

Signed-off-by: Andrew Klepchick <[email protected]>

* Added all processes in addKNNBinaryField to NativeIndexWriter.createKNNIndex

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed javadoc

Signed-off-by: Andrew Klepchick <[email protected]>

* Applied spotless

Signed-off-by: Andrew Klepchick <[email protected]>

* Added back writeFooter

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed threadCount from writeIndex

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed redundancies in KNN80DocValuesConsumer

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed serializationMode

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed changelog

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed changelog

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed double free test as we don't have to worry about that anymore

Signed-off-by: Andrew Klepchick <[email protected]>

* Accounted for HNSWSQ in index service

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed delete in catch

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed faiss tests to work with writeIndex

Signed-off-by: Andrew Klepchick <[email protected]>

---------

Signed-off-by: Andrew Klepchick <[email protected]>

* Index Initialization Alloc Method (opensearch-project#1933)

* Added methods for allocating memory before inserting vectors to a faiss index

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed logic that gets type of index

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed print statement

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed unnecessary iostream

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed flat index

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed flat index case

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed naming

Signed-off-by: Andrew Klepchick <[email protected]>

* Properly allocate HNSWSQ storage

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed print statements

Signed-off-by: Andrew Klepchick <[email protected]>

* Fixed changelog

Signed-off-by: Andrew Klepchick <[email protected]>

* Removed unnecessary lib

Signed-off-by: Andrew Klepchick <[email protected]>

* Made alloc adaptive to different code sizes

Signed-off-by: Andrew Klepchick <[email protected]>

---------

Signed-off-by: Andrew Klepchick <[email protected]>

* Integrates FAISS iterative builds with NativeEngines990KnnVectorsFormat

Changes include reusing the same vector buffer in the JNI layer

Signed-off-by: Tejas Shah <[email protected]>

---------

Signed-off-by: Andrew Klepchick <[email protected]>
Signed-off-by: Tejas Shah <[email protected]>
Co-authored-by: Andrew Klepchick <[email protected]>
Signed-off-by: Akash Shankaran <[email protected]>