
Add new int8 scalar quantization to HNSW codec #12582

Merged — 63 commits from feature/add-scalar-quantization into apache:main, Oct 24, 2023

Conversation

@benwtrent (Member) commented Sep 21, 2023

As with most codec changes, this is an eye-popping number of LoC, and the design isn't finished yet.

In initial benchmarking (using non-normalized Cohere embeddings with max-inner-product similarity, a particularly difficult case for naive quantization), I get 10-20% faster search and ~4x smaller storage used for search (I am keeping the raw vectors around...we can debate if we want to do that).

Recall@10 with 100 fanout = 0.804
Recall@100 with 200 fanout = 0.9.
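
(For context: "fanout" here is a luceneutil knob; my understanding is that the search over-collects roughly k + fanout candidates from the graph and then keeps the top k. A rough sketch of an equivalent query — the index path, field name, and query vector below are made up for illustration:)

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Path;

public class RecallSketch {
  public static void main(String[] args) throws Exception {
    int k = 10, fanout = 100;
    float[] queryVector = new float[768]; // stand-in; real runs use Cohere query embeddings
    try (DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Path.of("index")))) {
      IndexSearcher searcher = new IndexSearcher(reader);
      // Over-collect k + fanout graph candidates, then keep only the top k hits.
      TopDocs topK = searcher.search(new KnnFloatVectorQuery("knn_vector", queryVector, k + fanout), k);
      // recall@10 = |topK ∩ exact nearest neighbors| / k, with exact NN computed by brute force.
      System.out.println("hits: " + topK.scoreDocs.length);
    }
  }
}
```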

I am reaching the point where the design needs to be finalized, and I wanted to reach out for feedback.

Some design discussion points that I am unsure about are:

  • Do we want to have a new "flat" vector codec that HNSW (or other complicated vector indexing methods) can use? The drawback here is that the HNSW codec would then rely on another pluggable component: a "flat" vector index, which just provides mechanisms for reading, writing, and merging vectors in a flat layout.
  • Should "quantization" just be a thing that is provided to vector codecs? The main detractor here is future scalar quantization could easily be added (like int4 or even binary).
  • Should the "quantizer" keep the raw vectors around itself, or rely on some external party to provide them (in this case, I am relying on the HNSW codec)?
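
For intuition, here is a minimal sketch of the int8 quantization step itself, assuming a [min, max] interval derived from the data. The actual quantizer derives this interval from quantiles and also tracks corrective terms for the similarity computation, so treat the class name, interval handling, and constants below as illustrative assumptions, not this PR's API:

```java
final class Int8QuantizationSketch {
  // Map each float dimension onto [0, 127] using a linear scale over [min, max].
  static byte[] quantize(float[] vector, float min, float max) {
    float scale = 127f / (max - min);
    byte[] quantized = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
      float clamped = Math.max(min, Math.min(max, vector[i])); // clip outliers to the interval
      quantized[i] = (byte) Math.round((clamped - min) * scale);
    }
    return quantized;
  }
}
```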

benwtrent and others added 30 commits August 24, 2023 10:08
This PR refactors the HNSW builder and searcher, aiming to create an abstraction for the random access and vector comparisons conducted during graph traversal.

The newly added RandomVectorScorer provides a means to directly compare ordinals, eliminating the need to expose the raw vector primitive type.
This scorer takes charge of vector retrieval and comparison during the graph's construction and search processes.

The primary purpose of this abstraction is to enable the implementation of various strategies.
For example, it opens the door to constructing the graph using the original float vectors while performing searches using their quantized int8 vector counterparts.
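
For readers who haven't followed the refactor, a minimal sketch of the shape of such an ordinal-based scorer (simplified; not the exact signatures that landed):

```java
import java.io.IOException;

// Scores candidates by graph ordinal, hiding whether the backing storage holds
// float32 vectors, quantized int8 vectors, or something else entirely.
interface RandomVectorScorerSketch {
  // Similarity between the query (bound when the scorer is created) and the
  // vector stored at ordinal `node`.
  float score(int node) throws IOException;
}
```

Building the graph with a float32-backed implementation and searching with an int8-backed one is then just a matter of which scorer the traversal is handed.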
@benwtrent (Member Author):
@uschindler I switched it around. Lucene95Codec is back; I moved the previous HNSW format into backward_codecs and switched Lucene95Codec to use the Lucene99Hnsw format.

@mayya-sharipova (Contributor) left a comment:

@benwtrent Amazing work!!! Thanks Ben, Jim and Tom for such a great contribution!
I left some small comments, but otherwise this PR LGTM (I am still fuzzy about all the math in the quantizer, so I trust you and @tveasey that you got it right).

@benwtrent benwtrent merged commit f2bf533 into apache:main Oct 24, 2023
@benwtrent benwtrent deleted the feature/add-scalar-quantization branch October 24, 2023 18:31
benwtrent added a commit that referenced this pull request Oct 24, 2023
Adds new int8 scalar quantization for the HNSW codec. This uses a new Lucene 9.9 format and auto-quantizes floating-point vectors into bytes on flush and merge.
javanna added a commit to elastic/elasticsearch that referenced this pull request Oct 26, 2023
benwtrent added a commit that referenced this pull request Oct 27, 2023
benwtrent added a commit that referenced this pull request Oct 27, 2023
benwtrent added a commit that referenced this pull request Oct 30, 2023
follow up to #12582

For user convenience, I added back the two parameter ctor for the HNSW codec.
@naveentatikonda:

> I get 10-20% faster search and ~4x smaller storage used for search (I am keeping the raw vectors around...we can debate if we want to do that)

@benwtrent Based on your comments, does "4x smaller storage" mean savings in disk space or in memory (RAM)? Can you please confirm? Also, how did you run these benchmarks (which tool did you use), and how did you conclude how much memory is saved? I ask because when I tried to integrate Lucene with my service, I was not able to see the same amount of memory savings during search.

@benwtrent (Member Author):

@naveentatikonda, I used lucene-util.

It's an off-heap memory requirement reduction. Instead of having to load float32 values in memory, it will load byte values. HNSW does many random reads, and that access pattern really needs the data to be in memory to be fast.

It uses about 25% more disk as the raw float32 vectors are kept along with the quantized ones.
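
To make that arithmetic concrete with a made-up dataset size (1M vectors of 768 dims — not numbers from this PR):

```java
// Back-of-envelope sizing for int8 quantization with raw vectors retained.
long numVectors = 1_000_000L;
int dims = 768;

long float32Bytes = numVectors * dims * Float.BYTES; // ~3.07 GB off-heap to search, pre-change
long int8Bytes = numVectors * dims;                  // ~0.77 GB off-heap with int8 (~4x smaller)
long diskBytes = float32Bytes + int8Bytes;           // ~3.84 GB on disk: raw + quantized, ~25% more
```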

@navneet1v (Contributor):

> @naveentatikonda, I used lucene-util.
>
> It's an off-heap memory requirement reduction. Instead of having to load float32 values in memory, it will load byte values. HNSW does many random reads, and that access pattern really needs the data to be in memory to be fast.
>
> It uses about 25% more disk as the raw float32 vectors are kept along with the quantized ones.

Can you share the code and the dataset you used for testing? That would be very helpful.

@benwtrent (Member Author):

https://github.com/mikemccand/luceneutil

It can be a little frustrating at first to set up, but it's nice for quick integration and performance tests.

I used the Python script knnPerfTest.py, and I downloaded a bunch of CohereV2 parquet files from Hugging Face.

CohereV3 is released now, and they have many GB of vectors available for testing on Hugging Face, so you could use those instead.

Love those folks over at Cohere :)

@navneet1v (Contributor):

Thanks @benwtrent, I will look into this.
