[Rust] DiskANN search #798
Conversation
Force-pushed from c0ce839 to 28463a5.
Force-pushed from 595b90f to f6b4ffe.
Force-pushed from f564ce3 to 2a0b11e.
I still have quite a bit to learn about how our indexes work, but this mostly seems reasonable. Had a few questions.
  pub fn new(vertices: &[V], data: MatrixView, metric_type: MetricType) -> Self {
      Self {
          nodes: vertices
              .iter()
              .map(|v| Node {
                  vertex: v.clone(),
-                 neighbors: Vec::new(),
+                 neighbors: Arc::new(UInt32Array::from(vec![] as Vec<u32>)),
This seems like a nontrivial amount of allocations, right? Maybe this should be wrapped in an `Option` instead?
Yeah, I can do that. It makes `Graph::neighbors()` harder to work with, though.
Is there a way we can make `Graph::neighbors()` return a `&[f32]` slice?
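For illustration, here is a minimal sketch of what an `Option`-wrapped neighbor list could look like, with a slice-returning accessor so a `Graph::neighbors()`-style API stays easy to use. This is a hypothetical sketch, not the PR's code: the `Node` shape follows the diff above, but the `neighbor_ids` accessor and the arrow call used are assumptions.

```rust
use std::sync::Arc;

use arrow_array::UInt32Array; // arrow-array crate; module path may differ by arrow version

// Hypothetical sketch: allocate the neighbor array lazily so that building the
// node list does not allocate one Arc<UInt32Array> per vertex up front.
struct Node<V> {
    vertex: V,
    neighbors: Option<Arc<UInt32Array>>,
}

impl<V> Node<V> {
    fn new(vertex: V) -> Self {
        Self {
            vertex,
            neighbors: None, // no allocation until neighbors are actually set
        }
    }

    // Returning an empty slice when nothing is set keeps callers simple,
    // even though the field itself is an Option.
    fn neighbor_ids(&self) -> &[u32] {
        match &self.neighbors {
            Some(arr) => arr.values(),
            None => &[],
        }
    }
}
```

Note that the question above mentions `&[f32]`, while this sketch returns `&[u32]`, since the neighbor list in the diff is a `UInt32Array` of row ids.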
@@ -136,20 +155,37 @@ impl<V: Vertex> PersistedGraph<V> {
                  return Ok(vertex.clone());
              }
          }
-         let prefetch_size = self.params.prefetch_byte_size / self.vertex_size + 1;
-         let end = std::cmp::min(self.len(), id as usize + prefetch_size);
+         let end = (id + 1) as usize;
Why did we drop using the prefetch size here? Maybe that's worth a comment?
Here I want the most naive implementation as a baseline and to build up from there.
The prefetch only fetched the row ids ahead of time, and that causes more random I/Os later to read the raw vectors back before filling the cache. I don't have tracing data yet to show that such a prefetch helps. I'll work on optimizations next.
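For context, a rough sketch of the two read windows being compared, reusing the names from the diff (`prefetch_byte_size`, `vertex_size`); the free-standing function and its parameter list are simplified and hypothetical, not the actual `PersistedGraph` code.

```rust
use std::ops::Range;

/// Hypothetical helper: which vertex rows to read when looking up `id`.
fn read_window(
    id: u32,
    total_vertices: usize,
    vertex_size: usize,        // serialized size of one vertex, in bytes
    prefetch_byte_size: usize, // prefetch budget, in bytes
    prefetch: bool,
) -> Range<usize> {
    let start = id as usize;
    if prefetch {
        // Old behavior: extend the window so roughly `prefetch_byte_size` worth
        // of rows is read in one request.
        let prefetch_size = prefetch_byte_size / vertex_size + 1;
        start..std::cmp::min(total_vertices, start + prefetch_size)
    } else {
        // Naive baseline in this PR: read only the requested row.
        start..start + 1
    }
}
```

The tradeoff described above is that the old window only covered the row ids, so the raw vectors still had to be fetched with extra random reads; the single-row baseline makes each lookup one predictable read that can be profiled before reintroducing prefetching.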
  /// Open the Vector index on dataset, specified by the `uuid`.
- pub(crate) async fn open_index<'a>(
-     dataset: &'a Dataset,
+ pub(crate) async fn open_index(
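As an aside on this hunk: dropping the explicit `'a` changes nothing semantically when the return value does not borrow from the inputs, because lifetime elision covers the reference parameter. The hunk does not show the new parameter list or return type, so the sketch below is an invented illustration of that general Rust point, not the actual `open_index` signature.

```rust
use std::sync::Arc;

struct Dataset;
struct VectorIndex;

// Explicit lifetime: legal, but `'a` buys nothing when the return type is owned.
async fn open_explicit<'a>(dataset: &'a Dataset) -> Arc<VectorIndex> {
    let _ = dataset;
    Arc::new(VectorIndex)
}

// Elided form: the compiler infers the same lifetimes, so the meaning is
// unchanged and the signature is easier to read.
async fn open_elided(dataset: &Dataset) -> Arc<VectorIndex> {
    let _ = dataset;
    Arc::new(VectorIndex)
}
```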
Should the docs be updated for the new index?
This index is not ready for public usage yet. After this issue, we still need some profiling and optimization work.
This PR just makes it ready for the team to start benchmarking end to end.