
Add new int8 scalar quantization to HNSW codec #12582

Merged — 63 commits from feature/add-scalar-quantization into apache:main, Oct 24, 2023

Conversation

@benwtrent (Member) commented Sep 21, 2023

As with most codec changes, this is an eye-popping number of LoC, and the design isn't finished yet.

In initial benchmarking (using non-normalized Cohere embeddings with max-inner-product similarity, a particularly difficult case for naive quantization), I get 10-20% faster search and ~4x smaller storage used for search (I am keeping the raw vectors around...we can debate if we want to do that).

Recall@10 with 100 fanout = 0.804
Recall@100 with 200 fanout = 0.9.
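
(For context: "fanout" here is a luceneutil knob; my understanding is that the search over-collects roughly k + fanout candidates from the graph and then keeps the top k. A rough sketch of an equivalent query — the index path, field name, and query vector below are made up for illustration:)

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Path;

public class RecallSketch {
  public static void main(String[] args) throws Exception {
    int k = 10, fanout = 100;
    float[] queryVector = new float[768]; // stand-in; real runs use Cohere query embeddings
    try (DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Path.of("index")))) {
      IndexSearcher searcher = new IndexSearcher(reader);
      // Over-collect k + fanout graph candidates, then keep only the top k hits.
      TopDocs topK = searcher.search(new KnnFloatVectorQuery("knn_vector", queryVector, k + fanout), k);
      // recall@10 = |topK ∩ exact nearest neighbors| / k, with exact NN computed by brute force.
      System.out.println("hits: " + topK.scoreDocs.length);
    }
  }
}
```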

I am reaching the point where the design needs to be finalized, and I wanted to reach out for feedback.

Some design discussion points that I am unsure about are:

  • Do we want to have a new "flat" vector codec that HNSW (or other complicated vector indexing methods) can use? The drawback here is that the HNSW codec would then rely on another pluggable component: a "flat" vector index, which just provides mechanisms for reading, writing, and merging vectors in a flat layout.
  • Should "quantization" just be a thing that is provided to vector codecs? The main detractor here is future scalar quantization could easily be added (like int4 or even binary).
  • Should the "quantizer" keep the raw vectors around itself, or rely on some external party to provide them (in this case, I am relying on the HNSW codec)?
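
For intuition, here is a minimal sketch of the int8 quantization step itself, assuming a [min, max] interval derived from the data. The actual quantizer derives this interval from quantiles and also tracks corrective terms for the similarity computation, so treat the class name, interval handling, and constants below as illustrative assumptions, not this PR's API:

```java
final class Int8QuantizationSketch {
  // Map each float dimension onto [0, 127] using a linear scale over [min, max].
  static byte[] quantize(float[] vector, float min, float max) {
    float scale = 127f / (max - min);
    byte[] quantized = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
      float clamped = Math.max(min, Math.min(max, vector[i])); // clip outliers to the interval
      quantized[i] = (byte) Math.round((clamped - min) * scale);
    }
    return quantized;
  }
}
```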

benwtrent and others added 30 commits August 24, 2023 10:08
This PR refactors the HNSW builder and searcher, aiming to create an abstraction for the random access and vector comparisons conducted during graph traversal.

The newly added RandomVectorScorer provides a means to directly compare ordinals, eliminating the need to expose the raw vector primitive type.
This scorer takes charge of vector retrieval and comparison during the graph's construction and search processes.

The primary purpose of this abstraction is to enable the implementation of various strategies.
For example, it opens the door to constructing the graph using the original float vectors while performing searches using their quantized int8 vector counterparts.
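
For readers who haven't followed the refactor, a minimal sketch of the shape of such an ordinal-based scorer (simplified; not the exact signatures that landed):

```java
import java.io.IOException;

// Scores candidates by graph ordinal, hiding whether the backing storage holds
// float32 vectors, quantized int8 vectors, or something else entirely.
interface RandomVectorScorerSketch {
  // Similarity between the query (bound when the scorer is created) and the
  // vector stored at ordinal `node`.
  float score(int node) throws IOException;
}
```

Building the graph with a float32-backed implementation and searching with an int8-backed one is then just a matter of which scorer the traversal is handed.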
@benwtrent (Member Author):
@uschindler I switched it around. Lucene95Codec is back; I moved the previous HNSW format into backward_codecs and switched Lucene95Codec to use the Lucene99Hnsw format.

@mayya-sharipova (Contributor) left a comment:

@benwtrent Amazing work!!! Thanks Ben, Jim and Tom for such a great contribution!
I left some small comments, but otherwise this PR LGTM (I am still fuzzy about all the math in the quantizer, so I trust you and @tveasey that you got it right).

@benwtrent benwtrent merged commit f2bf533 into apache:main Oct 24, 2023
@benwtrent benwtrent deleted the feature/add-scalar-quantization branch October 24, 2023 18:31
benwtrent added a commit that referenced this pull request Oct 24, 2023
Adds new int8 scalar quantization for the HNSW codec. This uses a new Lucene 9.9 format and auto-quantizes floating-point vectors into bytes on flush and merge.
javanna added a commit to elastic/elasticsearch that referenced this pull request Oct 26, 2023
benwtrent added a commit that referenced this pull request Oct 27, 2023
benwtrent added a commit that referenced this pull request Oct 27, 2023
benwtrent added a commit that referenced this pull request Oct 30, 2023
follow up to #12582

For user convenience, I added back the two parameter ctor for the HNSW codec.
@naveentatikonda:

> I get 10-20% faster search and ~4x smaller storage used for search (I am keeping the raw vectors around...we can debate if we want to do that)

@benwtrent Based on your comments, does "4x smaller storage" mean savings in disk space or in memory (RAM)? Can you please confirm? Also, how did you run these benchmarks (which tool did you use), and how did you conclude how much memory is saved? I ask because when I tried to integrate Lucene with my service, I was not able to see the same amount of memory savings during search.

@benwtrent (Member Author):

@naveentatikonda, I used lucene-util.

It's an off-heap memory requirement reduction. Instead of having to load float32 values in memory, it will load byte values. HNSW does many random reads, and that access pattern really needs the data to be in memory to be fast.

It uses about 25% more disk as the raw float32 vectors are kept along with the quantized ones.
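
To make that arithmetic concrete with a made-up dataset size (1M vectors of 768 dims — not numbers from this PR):

```java
// Back-of-envelope sizing for int8 quantization with raw vectors retained.
long numVectors = 1_000_000L;
int dims = 768;

long float32Bytes = numVectors * dims * Float.BYTES; // ~3.07 GB off-heap to search, pre-change
long int8Bytes = numVectors * dims;                  // ~0.77 GB off-heap with int8 (~4x smaller)
long diskBytes = float32Bytes + int8Bytes;           // ~3.84 GB on disk: raw + quantized, ~25% more
```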

@navneet1v (Contributor):

> @naveentatikonda, I used lucene-util.
>
> It's an off-heap memory requirement reduction. Instead of having to load float32 values in memory, it will load byte values. HNSW does many random reads, and that access pattern really needs the data to be in memory to be fast.
>
> It uses about 25% more disk as the raw float32 vectors are kept along with the quantized ones.

Can you share the code and the dataset you used for testing? That would be very helpful.

@benwtrent (Member Author):

https://github.com/mikemccand/luceneutil

It can be a little frustrating at first to set up, but it's nice for quick integration and performance tests.

I used the Python script knnPerfTest.py, and I downloaded a bunch of CohereV2 parquet files from Hugging Face.

CohereV3 is released now, and they have many GB of vectors available for testing on Hugging Face, so you could use those instead.

Love those folks over at Cohere :)

@navneet1v (Contributor):

Thanks @benwtrent, I will look into this.
