feat: vector search with distance range #3326

BubbleCal · 2025-01-02T08:29:13Z

No description provided.

Signed-off-by: BubbleCal <[email protected]>

codecov-commenter · 2025-01-02T10:50:03Z

Codecov Report

Attention: Patch coverage is 89.53975% with 25 lines in your changes missing coverage. Please review.

Project coverage is 78.97%. Comparing base (2092808) to head (7d3ad66).
Report is 8 commits behind head on main.

Files with missing lines	Patch %	Lines
rust/lance-index/src/vector/flat/index.rs	77.77%	8 Missing ⚠️
rust/lance/src/dataset/scanner.rs	84.00%	0 Missing and 8 partials ⚠️
rust/lance/src/index/vector/pq.rs	81.81%	0 Missing and 6 partials ⚠️
java/core/lance-jni/src/utils.rs	0.00%	2 Missing ⚠️
rust/lance/src/index/vector/ivf/v2.rs	99.12%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3326      +/-   ##
==========================================
- Coverage   79.01%   78.97%   -0.04%     
==========================================
  Files         246      246              
  Lines       87628    88095     +467     
  Branches    87628    88095     +467     
==========================================
+ Hits        69238    69572     +334     
- Misses      15523    15631     +108     
- Partials     2867     2892      +25

Flag	Coverage Δ
unittests	`78.97% <89.53%> (-0.04%)`	⬇️

Flags with carried forward coverage won't be shown.


eddyxu · 2025-01-02T16:07:21Z

rust/lance-index/src/vector/flat/index.rs

@@ -88,6 +94,8 @@ impl IvfSubIndex for FlatIndex {
                    dist: OrderedFloat(dist),
                })
                .sorted_unstable()
+                .skip_while(|r| params.lower_bound.map_or(false, |lb| r.dist.0 < lb))


Should we do the skip / take before the sort?

Also, given that in most cases the low/high bounds are not set, should we only run those two steps if the bounds are set? Each of them is an O(n) operation.

Filtering before the sort would evaluate the predicate on all results, so it's O(n) as well.

skip_while and take_while are lazy, so they stop once take has collected k results.
We can optimize this with binary search.
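A minimal sketch of the binary-search idea, not the code that landed in this PR. It assumes the distances are already sorted ascending and contain no NaN, that the lower bound is inclusive (matching the `r.dist.0 < lb` skip in the diff), and that the upper bound is exclusive (an assumption; the diff hunk above does not show it). `range_top_k` is a hypothetical helper name.

```rust
/// Hypothetical helper: from an ascending-sorted slice of distances, return
/// at most `k` values that fall in [lower, upper).
/// Unset bounds (None) leave that side of the range open.
fn range_top_k(sorted: &[f32], lower: Option<f32>, upper: Option<f32>, k: usize) -> &[f32] {
    // Binary search for the first index with distance >= lower (0 if unset).
    let start = lower.map_or(0, |lb| sorted.partition_point(|&d| d < lb));
    // Binary search for the first index with distance >= upper (len if unset).
    let end = upper.map_or(sorted.len(), |ub| sorted.partition_point(|&d| d < ub));
    // Guard against lower > upper, then cap the window at k results.
    let end = end.max(start).min(start.saturating_add(k));
    &sorted[start..end]
}
```

Each bound costs O(log n) on the sorted results instead of a linear scan, and when neither bound is set no search runs at all.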

There are two cases:

Without lower_bound / upper_bound, sorted_unstable is O(n log n), where n = number of rows in each partition.

With lower_bound / upper_bound (filtering first), does the sorted_unstable complexity become O(n_2 * log(n_2)), where n_2 is the number of filtered results?

The n in the time complexity might be significantly different.
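The two orderings being compared can be sketched as follows. This is an illustration only: both function names are hypothetical, plain `f32` distances stand in for the index's result type, the inputs are assumed to contain no NaN, and the upper bound is assumed to be exclusive.

```rust
/// Sort all n distances first, then trim lazily (the shape of the diff above).
fn sort_then_trim(dists: &[f32], lower: Option<f32>, upper: Option<f32>, k: usize) -> Vec<f32> {
    let mut all = dists.to_vec();
    all.sort_unstable_by(|a, b| a.total_cmp(b)); // O(n log n)
    all.into_iter()
        .skip_while(|&d| lower.map_or(false, |lb| d < lb))
        .take_while(|&d| upper.map_or(true, |ub| d < ub))
        .take(k)
        .collect()
}

/// Filter by the bounds first, then sort only the n_2 survivors.
fn filter_then_sort(dists: &[f32], lower: Option<f32>, upper: Option<f32>, k: usize) -> Vec<f32> {
    let mut kept: Vec<f32> = dists
        .iter()
        .copied()
        // O(n): the predicate is evaluated on every distance.
        .filter(|&d| lower.map_or(true, |lb| d >= lb) && upper.map_or(true, |ub| d < ub))
        .collect();
    kept.sort_unstable_by(|a, b| a.total_cmp(b)); // O(n_2 log n_2)
    kept.truncate(k);
    kept
}
```

Both return the same values; they differ in whether the sort runs over n or n_2 elements, which is the difference a benchmark would need to measure.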

Nonetheless, can we just run some benchmarks?

Signed-off-by: BubbleCal <[email protected]>

eddyxu

LGTM. Could you make a follow-up ticket to expose this in lancedb and make sure we have docs?

BubbleCal · 2025-01-03T06:37:03Z

LGTM. Could you make a follow-up ticket to expose this in lancedb and make sure we have docs?

#3331

feat: vector search with distance range

e6346ab

Signed-off-by: BubbleCal <[email protected]>

github-actions bot added enhancement New feature or request java labels Jan 2, 2025

BubbleCal added 2 commits January 2, 2025 17:11

fix

5029eec

Signed-off-by: BubbleCal <[email protected]>

low recall requirement for 4bit

cfbf153

Signed-off-by: BubbleCal <[email protected]>

BubbleCal marked this pull request as ready for review January 2, 2025 10:58

BubbleCal requested review from eddyxu, westonpace and wjones127 January 2, 2025 10:59

eddyxu reviewed Jan 2, 2025


optimize by binary search

e71d61f

Signed-off-by: BubbleCal <[email protected]>

BubbleCal requested a review from eddyxu January 3, 2025 05:21

fix

7d3ad66

Signed-off-by: BubbleCal <[email protected]>

eddyxu approved these changes Jan 3, 2025


BubbleCal mentioned this pull request Jan 3, 2025

feat: vector search with distance threshold #3331

Closed

3 tasks

BubbleCal merged commit 39f12dc into lancedb:main Jan 3, 2025
26 checks passed
