Move aKNN limits enforcement into the default Codec's KnnVectorsFormat implementation #12309

Closed
mikemccand opened this issue May 18, 2023 · 1 comment · Fixed by #12436 or #12466

Comments

@mikemccand
Member

Description

[Spinoff from #12306]

There have been many discussions and polls about what to do about the existing (weakly enforced) limit of aKNN vector dimensionality in Lucene.

This issue represents Option 3 in @alessandrobenedetti's recent poll thread.

Since it is this Codec component (currently Lucene95HnswVectorsFormat) that is implementing the HNSW approach for approximate KNN, it makes sense that it should be the one to enforce any limits (dimensionality, max connections, beam width, etc.). In fact, it already seems to enforce some limits -- I see MAXIMUM_BEAM_WIDTH = 3200 and MAXIMUM_MAX_CONN = 512. Once we do this, users can still fork their own Codec to change limits, or implement a different aKNN algorithm, etc., and it will be clear that they are no longer using Lucene's default Codec so index format backwards compatibility is no longer ensured.
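
To make the proposal concrete, here is a minimal sketch of a format that owns all of its limits. It is not Lucene's actual code: `VectorsFormatSketch`, `HnswFormatSketch`, `checkDimension` and the constant names are hypothetical, and the 1024 dimension limit is an assumption chosen for illustration. Only the values 512 and 3200 come from the description above.

```java
/** Hypothetical stand-in for a codec's vectors format; not Lucene's actual API. */
abstract class VectorsFormatSketch {
  /** Each format reports the largest vector dimension it accepts for a field. */
  abstract int getMaxDimensions(String fieldName);

  /** Rejects a field whose dimension exceeds this format's own limit. */
  final void checkDimension(String fieldName, int dimension) {
    int max = getMaxDimensions(fieldName);
    if (dimension <= 0 || dimension > max) {
      throw new IllegalArgumentException(
          "field \"" + fieldName + "\": vector dimension must be in [1, "
              + max + "]; got " + dimension);
    }
  }
}

/** Hypothetical HNSW-style format that enforces every limit itself. */
final class HnswFormatSketch extends VectorsFormatSketch {
  static final int MAX_DIMENSIONS = 1024;     // assumed limit, for illustration only
  static final int MAXIMUM_MAX_CONN = 512;    // value quoted in the issue
  static final int MAXIMUM_BEAM_WIDTH = 3200; // value quoted in the issue

  final int maxConn;
  final int beamWidth;

  HnswFormatSketch(int maxConn, int beamWidth) {
    if (maxConn <= 0 || maxConn > MAXIMUM_MAX_CONN) {
      throw new IllegalArgumentException(
          "maxConn must be in [1, " + MAXIMUM_MAX_CONN + "]; got " + maxConn);
    }
    if (beamWidth <= 0 || beamWidth > MAXIMUM_BEAM_WIDTH) {
      throw new IllegalArgumentException(
          "beamWidth must be in [1, " + MAXIMUM_BEAM_WIDTH + "]; got " + beamWidth);
    }
    this.maxConn = maxConn;
    this.beamWidth = beamWidth;
  }

  @Override
  int getMaxDimensions(String fieldName) {
    return MAX_DIMENSIONS;
  }
}
```

With this shape, the indexing code only calls `checkDimension` on whatever format the Codec supplies and never hard-codes a limit of its own.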

Version and environment details

No response

@bruno-roustant
Contributor

In the same work, or in a separate work, we could create the extension of the HNSW implementation in the codecs package to provide it to users, so they don't have to have their own fork for that. This alternative codec would just support more dimensions to play with, without backwards compatibility constraints.
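
A rough sketch of what such an alternative could look like, assuming the limit is exposed through a single overridable method. All names here (`DimensionLimited`, `DefaultFormatSketch`, `HighDimensionFormatSketch`, `DimensionCheck`) are hypothetical and do not correspond to real Lucene classes; 1024 and 4096 are illustrative values.

```java
/** Hypothetical hook: the only thing a format needs to declare its limit. */
interface DimensionLimited {
  int getMaxDimensions(String fieldName);
}

/** Stand-in for the default format, with a conservative limit (assumed 1024). */
final class DefaultFormatSketch implements DimensionLimited {
  @Override
  public int getMaxDimensions(String fieldName) {
    return 1024;
  }
}

/** Stand-in for an alternative format that trades compatibility guarantees for a higher limit. */
final class HighDimensionFormatSketch implements DimensionLimited {
  private final int maxDimensions;

  HighDimensionFormatSketch(int maxDimensions) {
    if (maxDimensions <= 0) {
      throw new IllegalArgumentException("maxDimensions must be positive; got " + maxDimensions);
    }
    this.maxDimensions = maxDimensions;
  }

  @Override
  public int getMaxDimensions(String fieldName) {
    return maxDimensions;
  }
}

/** Caller-side check that defers entirely to the chosen format. */
final class DimensionCheck {
  static boolean accepts(DimensionLimited format, String fieldName, int dimension) {
    return dimension > 0 && dimension <= format.getMaxDimensions(fieldName);
  }
}
```

For example, `DimensionCheck.accepts(new DefaultFormatSketch(), "v", 2048)` is false, while the same call with `new HighDimensionFormatSketch(4096)` is true, so users who opt into the alternative format get the higher limit without maintaining a fork.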

mayya-sharipova added a commit to mayya-sharipova/lucene that referenced this issue Jul 13, 2023
Move vector max dimension limits enforcement into the default Codec's
KnnVectorsFormat implementation. This allows different implementation
of knn search algorithms define their own limits of a maximum
vector dimensions that they can handle.

Closes apache#12309
mayya-sharipova added a commit that referenced this issue Jul 27, 2023
Move vector max dimension limits enforcement into the default Codec's
KnnVectorsFormat implementation. This allows different implementation
of knn search algorithms define their own limits of a maximum
vector dimensions that they can handle.

Closes #12309
mayya-sharipova added a commit to mayya-sharipova/lucene that referenced this issue Jul 27, 2023
- Backward codecs use 1024 as max dims
- Test classes use the current KnnVectorsFormat#DEFAULT_MAX_DIMENSIONS

Relates to PR#12436
Closes apache#12309
mayya-sharipova added a commit that referenced this issue Jul 28, 2023
- Backward codecs use 1024 as max dims
- Test classes use the current KnnVectorsFormat#DEFAULT_MAX_DIMENSIONS

Relates to PR#12436
Closes #12309
@zhaih zhaih added this to the 9.8.0 milestone Sep 20, 2023
4 participants