
[Enhancement] Optimize the de-serialization of vectors when reading from Doc Values #1050

Closed · navneet1v opened this issue Aug 17, 2023 · 16 comments

navneet1v (Collaborator) commented Aug 17, 2023

Description

In reference to the experiments done in #1049, I found that when we flip to exact search and use the doc values to do the exact search, we spend roughly 60-70% of the time de-serializing vectors from their binary format into float arrays. A few possible solutions came out of that work, and I am capturing them as part of this issue. Experiment details:

Experiment 1 (Baseline: find the latency of distance calculation on datasets of different sizes and dimensions)

Below are the steps that are performed to run this experiment:
  1. For every dimension and every run, generate a dataset with the required number of vectors. We chose vector counts of 1k, 2k, 5k, 8k, 10k, and 15k.
  2. After that, a query vector is generated.
  3. Distance calculation is then done on the dataset for every run and every dimension. The time taken for the distance calculation is recorded, and this is what the tables below show. Lucene distance calculation functions are used to do the distance calculation (code ref); a minimal sketch of the timed loop is shown right after this list.
  4. The experiment then collects all the latencies and formats them into a table.
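For illustration, here is a minimal sketch of the kind of loop being timed, using Lucene's VectorUtil distance functions (the class and method names around it are illustrative, not the actual benchmark code):

```java
import org.apache.lucene.util.VectorUtil;

public final class BaselineDistanceBench {

    // Times the pure distance-calculation work of Experiment 1: score every
    // dataset vector against the query with Lucene's L2 distance.
    static long timeDistanceCalculation(float[][] dataset, float[] query) {
        long start = System.nanoTime();
        float sink = 0f;
        for (float[] vector : dataset) {
            sink += VectorUtil.squareDistance(query, vector);
        }
        // Consume the accumulated value so the JIT cannot dead-code-eliminate the loop.
        if (sink == Float.MIN_VALUE) {
            System.out.println(sink);
        }
        return System.nanoTime() - start;
    }
}
```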
| dimensions | 1000_latency(ms) | 2000_latency(ms) | 5000_latency(ms) | 8000_latency(ms) | 10000_latency(ms) | 15000_latency(ms) |
| --- | --- | --- | --- | --- | --- | --- |
| 64 | 0.0751 | 0.07755 | 0.18681 | 0.33033 | 0.40572 | 0.60473 |
| 128 | 0.11913 | 0.14973 | 0.38251 | 0.64307 | 0.92643 | 1.17756 |
| 256 | 0.21269 | 0.30679 | 0.78219 | 1.26894 | 1.8096 | 2.38399 |
| 512 | 0.38937 | 0.5992 | 1.50937 | 2.5473 | 3.08621 | 4.54336 |
| 768 | 0.60041 | 0.88308 | 2.35232 | 3.64441 | 4.63974 | 7.03523 |
| 968 | 0.5958 | 1.11113 | 2.90708 | 4.61883 | 5.92536 | 8.58447 |
| 1024 | 0.60106 | 1.20861 | 3.12053 | 4.83357 | 6.24293 | 9.18919 |
| 2048 | 1.16614 | 2.45611 | 6.04847 | 9.83145 | 12.46421 | 18.96537 |
| 4096 | 2.39995 | 4.86545 | 11.93488 | 19.87645 | 25.46671 | 40.70765 |


Experiment 2 (De-serializing the vectors and then doing the distance calculation)

Below are the steps that are performed to run this experiment:

  1. For every dimension and every run, generate a dataset with the required number of vectors. We chose vector counts of 1k, 2k, 5k, 8k, 10k, and 15k. The vectors in this dataset are serialized using the same code we use in k-NN (code ref).
  2. After that, a query vector is generated. We don't need to serialize the query, as that is not what happens in exact search.
  3. To do the distance calculation, we first de-serialize the vectors in the dataset using the same code used by k-NN (code ref) and then perform the distance calculation using the Lucene-provided functions (code ref). This is done for each dimension and dataset size. The time taken for this full step, de-serialization plus distance calculation, is what the table below reports; a sketch of the round trip is shown after the next list.
  4. The experiment then collects all the latencies and formats them into a table.

So, the query latency includes:

  1. Time taken to de-serialize the vectors to floats.
  2. Distance calculation between the query vector and the indexed vectors.
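For context, here is a minimal sketch of the serialization round trip being measured, written against a plain ByteBuffer; the actual k-NN serializer is in the code ref above, so this only illustrates the shape of the work:

```java
import java.nio.ByteBuffer;

public final class VectorSerde {

    // Pack a float[] into the byte[] layout stored in BinaryDocValues.
    static byte[] serialize(float[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES);
        for (float v : vector) {
            buffer.putFloat(v);
        }
        return buffer.array();
    }

    // Unpack a stored byte[] back into a float[]. This per-vector allocation
    // and copy is the cost that Experiment 2 isolates.
    static float[] deserialize(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        float[] vector = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = buffer.getFloat();
        }
        return vector;
    }
}
```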
| dimensions | 1000_latency_serialized(ms) | 2000_latency_serialized(ms) | 5000_latency_serialized(ms) | 8000_latency_serialized(ms) | 10000_latency_serialized(ms) | 15000_latency_serialized(ms) |
| --- | --- | --- | --- | --- | --- | --- |
| 64 | 0.5361 | 0.30428 | 0.77805 | 1.2804 | 1.8098 | 2.36014 |
| 128 | 0.43123 | 0.54735 | 1.24873 | 1.9306 | 2.48106 | 3.5845 |
| 256 | 0.58271 | 1.06349 | 2.07025 | 3.69847 | 4.90092 | 8.71385 |
| 512 | 0.79718 | 1.65931 | 4.18262 | 6.67985 | 8.53127 | 15.2144 |
| 768 | 1.15486 | 2.20225 | 5.68341 | 9.35401 | 12.03496 | 23.44284 |
| 968 | 1.4213 | 2.85394 | 7.74272 | 14.27054 | 14.97346 | 33.13775 |
| 1024 | 1.46418 | 3.15568 | 7.50078 | 14.56883 | 17.00156 | 29.74787 |
| 2048 | 2.97711 | 6.85896 | 16.52853 | 24.79757 | 30.17373 | 78.9689 |
| 4096 | 5.90317 | 11.74842 | 31.05969 | 64.13292 | 83.25422 | 122.25665 |


Combining the Results of Experiment 1 and Experiment 2

| dimensions | 1000_latency(ms) | 1000_latency_serialized(ms) | 2000_latency(ms) | 2000_latency_serialized(ms) | 5000_latency(ms) | 5000_latency_serialized(ms) | 8000_latency(ms) | 8000_latency_serialized(ms) | 10000_latency(ms) | 10000_latency_serialized(ms) | 15000_latency(ms) | 15000_latency_serialized(ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 64 | 0.0751 | 0.5361 | 0.07755 | 0.30428 | 0.18681 | 0.77805 | 0.33033 | 1.2804 | 0.40572 | 1.8098 | 0.60473 | 2.36014 |
| 128 | 0.11913 | 0.43123 | 0.14973 | 0.54735 | 0.38251 | 1.24873 | 0.64307 | 1.9306 | 0.92643 | 2.48106 | 1.17756 | 3.5845 |
| 256 | 0.21269 | 0.58271 | 0.30679 | 1.06349 | 0.78219 | 2.07025 | 1.26894 | 3.69847 | 1.8096 | 4.90092 | 2.38399 | 8.71385 |
| 512 | 0.38937 | 0.79718 | 0.5992 | 1.65931 | 1.50937 | 4.18262 | 2.5473 | 6.67985 | 3.08621 | 8.53127 | 4.54336 | 15.2144 |
| 768 | 0.60041 | 1.15486 | 0.88308 | 2.20225 | 2.35232 | 5.68341 | 3.64441 | 9.35401 | 4.63974 | 12.03496 | 7.03523 | 23.44284 |
| 968 | 0.5958 | 1.4213 | 1.11113 | 2.85394 | 2.90708 | 7.74272 | 4.61883 | 14.27054 | 5.92536 | 14.97346 | 8.58447 | 33.13775 |
| 1024 | 0.60106 | 1.46418 | 1.20861 | 3.15568 | 3.12053 | 7.50078 | 4.83357 | 14.56883 | 6.24293 | 17.00156 | 9.18919 | 29.74787 |
| 2048 | 1.16614 | 2.97711 | 2.45611 | 6.85896 | 6.04847 | 16.52853 | 9.83145 | 24.79757 | 12.46421 | 30.17373 | 18.96537 | 78.9689 |
| 4096 | 2.39995 | 5.90317 | 4.86545 | 11.74842 | 11.93488 | 31.05969 | 19.87645 | 64.13292 | 25.46671 | 83.25422 | 40.70765 | 122.25665 |


Time Spent in De-Serialization

The diff values below are the Experiment 2 latency minus the Experiment 1 latency for the same cell, i.e., an estimate of the time spent purely in de-serialization.

| dimensions | 1000_latency_diff(ms) | 2000_latency_diff(ms) | 5000_latency_diff(ms) | 8000_latency_diff(ms) | 10000_latency_diff(ms) | 15000_latency_diff(ms) |
| --- | --- | --- | --- | --- | --- | --- |
| 64 | 0.461 | 0.22673 | 0.59125 | 0.95007 | 1.40408 | 1.75541 |
| 128 | 0.3121 | 0.39762 | 0.86622 | 1.28753 | 1.55463 | 2.40694 |
| 256 | 0.37002 | 0.7567 | 1.28807 | 2.42953 | 3.09132 | 6.32987 |
| 512 | 0.40781 | 1.06011 | 2.67325 | 4.13255 | 5.44505 | 10.67103 |
| 768 | 0.55445 | 1.31917 | 3.33109 | 5.7096 | 7.39523 | 16.40761 |
| 968 | 0.8255 | 1.7428 | 4.83564 | 9.65171 | 9.04811 | 24.55328 |
| 1024 | 0.86312 | 1.94707 | 4.38025 | 9.73526 | 10.75863 | 20.55868 |
| 2048 | 1.81097 | 4.40285 | 10.48007 | 14.96612 | 17.70952 | 60.00352 |
| 4096 | 3.50322 | 6.88298 | 19.12481 | 44.25647 | 57.78751 | 81.549 |


Experiment Observations

  1. De-serialization is the most expensive operation during the search. With 512 dimensions and 5k vectors, about 2.67 ms of the total 4.18 ms (de-serialization plus distance calculation) goes to de-serialization, which is around 64%; for lower dimensions like 128 it is around 69%.

Time Spent in De-Serialization as a Percentage

Each cell below is latency_diff / latency_serialized × 100, i.e., the share of the total exact-search time that goes to de-serialization.

| dimensions | 1000_latency_diff | 2000_latency_diff | 5000_latency_diff | 8000_latency_diff | 10000_latency_diff | 15000_latency_diff |
| --- | --- | --- | --- | --- | --- | --- |
| 64 | 85.99 | 74.51 | 75.99 | 74.20 | 77.58 | 74.38 |
| 128 | 72.38 | 72.64 | 69.37 | 66.69 | 62.66 | 67.15 |
| 256 | 63.50 | 71.15 | 62.22 | 65.69 | 63.08 | 72.64 |
| 512 | 51.16 | 63.89 | 63.91 | 61.87 | 63.82 | 70.14 |
| 768 | 48.01 | 59.90 | 58.61 | 61.04 | 61.45 | 69.99 |
| 968 | 58.08 | 61.07 | 62.45 | 67.63 | 60.43 | 74.09 |
| 1024 | 58.95 | 61.70 | 58.40 | 66.82 | 63.28 | 69.11 |
| 2048 | 60.83 | 64.19 | 63.41 | 60.35 | 58.69 | 75.98 |
| 4096 | 59.34 | 58.59 | 61.57 | 69.01 | 69.41 | 66.70 |

Experiment Conclusion

  1. The above experiments showed us that it is de-serialization that takes up most of the time during exact search.

Possible solutions:

  1. Short term: to improve exact-search time, we should enable a caching layer for this de-serialization so that we can improve the latencies of exact search. Along with this, SIMD instruction sets will help reduce the overall time. (A rough sketch of such a cache follows this list.)
  2. Figure out a better format to store the vectors, e.g., can we use the .vec file that the Lucene engine uses?
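A rough sketch of what the short-term cache in option 1 could look like, assuming a simple per-segment LRU keyed by doc ID (the class and method names are hypothetical, not existing k-NN APIs):

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-segment cache of de-serialized vectors, keyed by doc ID.
// An access-ordered LinkedHashMap gives simple LRU eviction.
public final class DeserializedVectorCache {

    private final int maxEntries;
    private final Map<Integer, float[]> cache;

    public DeserializedVectorCache(int maxEntries) {
        this.maxEntries = maxEntries;
        this.cache = new LinkedHashMap<Integer, float[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, float[]> eldest) {
                return size() > DeserializedVectorCache.this.maxEntries;
            }
        };
    }

    // De-serialize on a miss; repeated exact searches over the same segment hit the cache.
    public float[] getOrCompute(int docId, byte[] binaryValue) {
        float[] cached = cache.get(docId);
        if (cached == null) {
            cached = deserialize(binaryValue);
            cache.put(docId, cached);
        }
        return cached;
    }

    private static float[] deserialize(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        float[] vector = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = buffer.getFloat();
        }
        return vector;
    }
}
```

A real implementation would also need to bound the cache by memory size rather than entry count and invalidate entries when segments are merged away.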

Benefits

This feature will not only speed up filtered search but will also help reduce the latency of the k-NN script score query, where we read these vectors to perform the exact search.

It will also give us some gains during the merge operation of segments, as we read the binary doc values there too.

@jmazanec15 (Member)

Caching could potentially be tricky. I wonder if we should first evaluate Lucene's .vec approach and potentially move toward that instead of implementing caching first. In the long run, I think we should convert our codec to KnnVectorsFormat anyway.
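For context, the .vec path would let exact search iterate vectors through Lucene's vector-values API, with the codec handing back float[] directly instead of raw bytes; a minimal sketch against the Lucene 9.x API (the field name is illustrative):

```java
import java.io.IOException;
import org.apache.lucene.index.FloatVectorValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.VectorUtil;

public final class VecFileExactSearch {

    // Score every vector in a segment without manual byte[] -> float[] decoding.
    static void scoreAll(LeafReader reader, float[] query) throws IOException {
        FloatVectorValues values = reader.getFloatVectorValues("my_vector_field");
        if (values == null) {
            return; // field was not indexed with a KnnVectorsFormat
        }
        for (int doc = values.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = values.nextDoc()) {
            float[] vector = values.vectorValue(); // decoded by the codec from the .vec file
            float distance = VectorUtil.squareDistance(query, vector);
            // ... collect (doc, distance) ...
        }
    }
}
```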

@navneet1v (Collaborator, Author)

@jmazanec15 Yes, caching can be tricky. I added some ideas that came to mind while thinking about how to reduce the latency. We can very well ignore them and move to a better solution.

vamshin (Member) commented Aug 17, 2023

Agreed, we should move to KnnVectorsFormat in the long term and piggyback on the improvements from Lucene. Should we create a dedicated GitHub issue for merging the codec into the vector classes used by Lucene?

@navneet1v (Collaborator, Author)

@vamshin We can do that. But we need to make sure that we are not losing focus on what we want to achieve. Plus, BWC can be a challenge here.

vamshin (Member) commented Aug 18, 2023

@navneet1v I think merging to KnnVectorsFormat is one of the solutions for getting these optimizations. If we think it's the long-term solution, I don't see a problem with having a dedicated GitHub issue. I am fine with tracking it as part of this issue as well.

@navneet1v (Collaborator, Author)

@vamshin I am not saying we shouldn't create a separate issue. I just want to make sure that we are not losing focus on why we want to merge the format.

Also, merging formats is a big change. Given that de-serialization is the long pole, should we consider other alternatives too? And will having the same vector format solve the core problem? What if Lucene also has the same issue? I think there are unknowns here, so I first want to make sure that we have exhausted all the options.

@jmazanec15 (Member)

@navneet1v @bugmakerrrrrr I'd be interested to see whether, with #1573, the de-serialization of vector values is faster or slower than our BinaryDocValues.

The above experiments would, I think, be hard to retry with vector values, but I'm wondering if we could just run with script scoring instead of with faiss and check the latency diff.

navneet1v (Collaborator, Author) commented Mar 28, 2024

> @navneet1v @bugmakerrrrrr I'd be interested to see whether, with #1573, the de-serialization of vector values is faster or slower than our BinaryDocValues.
>
> The above experiments would, I think, be hard to retry with vector values, but I'm wondering if we could just run with script scoring instead of with faiss and check the latency diff.

+1, we can do that. We have added a micro-benchmarks module in k-NN; can we leverage that? @jmazanec15, what are your thoughts? Or should we go with the script score query and do this on a cluster?

@jmazanec15 (Member)

@navneet1v I think micro-benchmarks would be the right way to see the diff. I started trying this comparison here: https://github.com/jmazanec15/k-NN-1/blob/micro-benchmarks/micro-benchmarks/src/main/java/org/opensearch/knn/VectorSerdeBenchmarks.java, but I'm not sure it is quite an apples-to-apples comparison, so I'm not pursuing it any further.

I think we would need to figure out how to properly set up this class to run the test: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/KNNVectorDVLeafFieldData.java#L24-L28. It might get a little tricky initializing the LeafReader and everything, though.
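For reference, a skeleton of what such a micro-benchmark might look like under JMH (the harness used by the k-NN micro-benchmarks module); the setup here is illustrative and sidesteps the LeafReader initialization noted above:

```java
import java.nio.ByteBuffer;
import java.util.Random;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class VectorDeserializationBenchmark {

    private byte[] serialized;

    @Setup
    public void setup() {
        // Deterministically generate one 1024-dimensional vector and store it
        // in the byte[] layout used by BinaryDocValues.
        Random random = new Random(42);
        ByteBuffer buffer = ByteBuffer.allocate(1024 * Float.BYTES);
        for (int i = 0; i < 1024; i++) {
            buffer.putFloat(random.nextFloat());
        }
        serialized = buffer.array();
    }

    @Benchmark
    public float[] binaryDocValuesStyleDecode() {
        // The byte[] -> float[] decode path; JMH consumes the returned array,
        // which prevents dead-code elimination.
        ByteBuffer buffer = ByteBuffer.wrap(serialized);
        float[] vector = new float[serialized.length / Float.BYTES];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = buffer.getFloat();
        }
        return vector;
    }

    // A second @Benchmark reading the same vectors through FloatVectorValues
    // (the KnnVectorsFormat path) would make this apples-to-apples; setting up
    // the LeafReader for that is the tricky part mentioned above.
}
```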

@jmazanec15 (Member)

@navneet1v (Collaborator, Author)

@jmazanec15 (Member)

@navneet1v ah nice that looks good

@jmazanec15 (Member)

@navneet1v Additionally, I think we should see if we can integrate with the IndexFieldDataCache for caching purposes and benchmark this. It seems index field data can possibly be auto-cached: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/fielddata/IndexFieldData.java#L85-L88.

I'll come up with a set of benchmarks for this if you haven't already started.

@navneet1v (Collaborator, Author)

FieldDataCache, in my understanding, is an old Lucene-era concept that was used before doc values came in.

You can do the benchmarks, but as far as I know it is not used now.

I would say let's come up with another cache layer where we can cache the vectors de-serialized from doc values.

@jmazanec15 (Member)

I decided to start by running some end-to-end benchmarks to get an initial sense of the perf difference between the two vector representations. I created a repo for running some of these experiments from Docker: https://github.com/jmazanec15/opensearch-knn-rescore-experiments. The repo is very basic right now, but I ran some preliminary tests, and the initial results suggest that the KnnVectorsFormat may perform better for script scoring:

```
10000 vectors, 1000 queries, k=100, dim=1024, space=l2
nmslib 1: 35.33668613433838 seconds to run queries
nmslib 2: 35.46980571746826 seconds to run queries
faiss 1: 35.574172258377075 seconds to run queries
faiss 2: 35.60338592529297 seconds to run queries
lucene 1: 20.01723551750183 seconds to run queries
lucene 2: 20.044759511947632 seconds to run queries
```

(ref)

Will iterate on these experiments and have some higher fidelity results in the next week or two.

jmazanec15 self-assigned this Apr 16, 2024
@navneet1v (Collaborator, Author)

@jmazanec15 Thanks for sharing the initial results.

Can we capture the flame graphs for these runs?
