feat: vector search with distance range #3326
Signed-off-by: BubbleCal <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3326 +/- ##
==========================================
- Coverage 79.01% 78.97% -0.04%
==========================================
Files 246 246
Lines 87628 88095 +467
Branches 87628 88095 +467
==========================================
+ Hits 69238 69572 +334
- Misses 15523 15631 +108
- Partials 2867 2892 +25
@@ -88,6 +94,8 @@ impl IvfSubIndex for FlatIndex {
                dist: OrderedFloat(dist),
            })
            .sorted_unstable()
            .skip_while(|r| params.lower_bound.map_or(false, |lb| r.dist.0 < lb))
Should we do skip / take before the sort?
Also, in most cases the lower/upper bounds are not set. Should we run those two steps only when the bounds are set? Each of them is an O(n) operation.
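One way to read the suggestion above, as a minimal sketch rather than the code in this PR: filter by the bounds before sorting, and skip the filter pass entirely when neither bound is set. `Bounds` and `top_k_in_range` are hypothetical names, distances are plain `f32` values instead of the index's result type, and the upper bound is assumed to be exclusive.

```rust
/// Hypothetical stand-in for the optional distance bounds on the query params.
struct Bounds {
    lower: Option<f32>,
    upper: Option<f32>,
}

/// Return the k smallest distances that fall in [lower, upper).
/// The upper bound being exclusive is an assumption of this sketch.
fn top_k_in_range(dists: &[f32], bounds: &Bounds, k: usize) -> Vec<f32> {
    let mut kept: Vec<f32> = match (bounds.lower, bounds.upper) {
        // Common case: no bounds set, so no extra O(n) filter pass.
        (None, None) => dists.to_vec(),
        // At least one bound set: drop out-of-range rows before sorting,
        // so the sort only sees the surviving rows.
        (lower, upper) => dists
            .iter()
            .copied()
            .filter(|&d| lower.map_or(true, |l| d >= l) && upper.map_or(true, |u| d < u))
            .collect(),
    };
    kept.sort_unstable_by(|a, b| a.total_cmp(b));
    kept.truncate(k);
    kept
}
```

For example, with distances `[0.9, 0.1, 0.5, 0.3]`, bounds `[0.2, 0.8)` and `k = 2`, only `0.5` and `0.3` survive the filter, so the sort runs on two rows instead of four.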
Filtering before the sort would evaluate the predicate on all results, so it's O(n) as well.
`skip_while` and `take_while` are lazy, so they stop once `take` has collected k results.
We can optimize this with a binary search.
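The binary-search idea could look roughly like the sketch below. It is not the code in this PR: `range_top_k` is a hypothetical helper, it works on a plain slice of `f32` distances that is already sorted ascending, and it assumes an exclusive upper bound. `partition_point` is the standard-library binary search over a partitioned slice.

```rust
/// Given distances sorted ascending, return at most k of them in [lower, upper).
/// Hypothetical helper; the exclusive upper bound is an assumption of this sketch.
fn range_top_k(sorted: &[f32], lower: Option<f32>, upper: Option<f32>, k: usize) -> &[f32] {
    // First index whose distance is >= lower, found in O(log n)
    // instead of walking the prefix with skip_while.
    let start = lower.map_or(0, |lb| sorted.partition_point(|&d| d < lb));
    // First index whose distance is >= upper.
    let end = upper.map_or(sorted.len(), |ub| sorted.partition_point(|&d| d < ub));
    // Guard against lower > upper, then cap the window at k results.
    let end = end.max(start).min(start.saturating_add(k));
    &sorted[start..end]
}
```

With `sorted = [0.1, 0.3, 0.5, 0.9]`, `lower = Some(0.2)`, `upper = Some(0.8)` and a large `k`, the returned window covers `0.3` and `0.5`. The sort itself is still O(n log n); only the bound lookup becomes logarithmic.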
There are two cases:
- Without `lower_bound` / `upper_bound`, `sorted_unstable` is O(n log n), where n = number of rows in each partition.
- With `lower_bound` / `upper_bound`, if we filter first, the `sorted_unstable` complexity becomes O(n_2 * log(n_2)), where n_2 is the number of filtered results?
The n in the time complexity might be significantly different between the two.
Nonetheless, can we just run some benchmarks?
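A rough way to compare the two orderings locally is sketched below. This is not the project's benchmark suite: every function name here is hypothetical, the data is synthetic, and the wall-clock timing is crude. A real measurement would more likely use a benchmarking harness such as Criterion on actual partitions.

```rust
use std::time::{Duration, Instant};

/// Sort everything, then lazily skip/take on the sorted stream.
fn sort_then_range(dists: &[f32], lb: f32, ub: f32, k: usize) -> Vec<f32> {
    let mut v = dists.to_vec();
    v.sort_unstable_by(|a, b| a.total_cmp(b));
    v.into_iter()
        .skip_while(|&d| d < lb)
        .take_while(|&d| d < ub)
        .take(k)
        .collect()
}

/// Filter to the range first, then sort only the survivors.
fn filter_then_sort(dists: &[f32], lb: f32, ub: f32, k: usize) -> Vec<f32> {
    let mut v: Vec<f32> = dists.iter().copied().filter(|&d| d >= lb && d < ub).collect();
    v.sort_unstable_by(|a, b| a.total_cmp(b));
    v.truncate(k);
    v
}

/// Time `iters` calls of `f` with a wall clock (not statistically rigorous).
fn time_it<F: FnMut() -> Vec<f32>>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        std::hint::black_box(f());
    }
    start.elapsed()
}

/// Deterministic pseudo-random distances in [0, 1), from a simple LCG.
/// Illustration only; real partitions will have a different distribution.
fn synthetic_dists(n: usize) -> Vec<f32> {
    let mut state: u32 = 12345;
    (0..n)
        .map(|_| {
            state = state.wrapping_mul(1664525).wrapping_add(1013904223);
            (state >> 8) as f32 / (1u32 << 24) as f32
        })
        .collect()
}
```

Calling `time_it(100, || sort_then_range(&dists, 0.2, 0.4, 10))` and the same with `filter_then_sort` on `synthetic_dists(100_000)` gives two durations to compare. Both functions should return identical results for the same inputs, which is worth asserting before trusting any timing.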
fixed
Signed-off-by: BubbleCal <[email protected]>
LGTM. Could you make a follow-up ticket to expose this in lancedb and make sure we have docs?
No description provided.