Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Rust] DiskANN search #798

Merged
merged 33 commits into from
May 22, 2023
Merged

[Rust] DiskANN search #798

merged 33 commits into from
May 22, 2023

Conversation

eddyxu
Copy link
Contributor

@eddyxu eddyxu commented Apr 23, 2023

No description provided.

@eddyxu eddyxu force-pushed the lei/search_diskann branch 2 times, most recently from c0ce839 to 28463a5 Compare May 14, 2023 20:54
@eddyxu eddyxu self-assigned this May 15, 2023
@eddyxu eddyxu added the vector Vector Search label May 15, 2023
@eddyxu eddyxu force-pushed the lei/search_diskann branch 4 times, most recently from 595b90f to f6b4ffe Compare May 18, 2023 21:08
@eddyxu eddyxu force-pushed the lei/search_diskann branch from f564ce3 to 2a0b11e Compare May 19, 2023 21:01
@eddyxu eddyxu requested a review from changhiskhan May 19, 2023 21:26
@eddyxu eddyxu marked this pull request as ready for review May 19, 2023 21:27
Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have quite a bit to learn still about how our indexes work, but mostly seems reasonable. Had a few questions.

pub fn new(vertices: &[V], data: MatrixView, metric_type: MetricType) -> Self {
Self {
nodes: vertices
.iter()
.map(|v| Node {
vertex: v.clone(),
neighbors: Vec::new(),
neighbors: Arc::new(UInt32Array::from(vec![] as Vec<u32>)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an nontrivial amount of allocations, right? Maybe this should be wrapped in an option instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I can do it. It makes the Graph::neightbour() hard to work tho.

Is there a way that we can make the Graph::neighbors() returns &[f32] slice?

@@ -136,20 +155,37 @@ impl<V: Vertex> PersistedGraph<V> {
return Ok(vertex.clone());
}
}
let prefetch_size = self.params.prefetch_byte_size / self.vertex_size + 1;
let end = std::cmp::min(self.len(), id as usize + prefetch_size);
let end = (id + 1) as usize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we drop using the prefetch size here? Maybe that's worth a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here, want to have a most naive implementation for base line and start from there.

The prefetch it did was only prefetch the row ids, and it causes more random I/Os to read the raw vectors back later before filling into the cache. I have not had the tracing data to prove such prefetch will work yet. Will work on optimization next.

rust/src/index/vector/diskann/builder.rs Outdated Show resolved Hide resolved
Comment on lines 318 to +319
/// Open the Vector index on dataset, specified by the `uuid`.
pub(crate) async fn open_index<'a>(
dataset: &'a Dataset,
pub(crate) async fn open_index(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the docs be updated for the new index?

Copy link
Contributor Author

@eddyxu eddyxu May 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This index is not ready for public usage yet. After this issue, we still need some work to profiling and optimizations.

This PR just make it ready for the team to start do benchmark e2e.

@eddyxu eddyxu merged commit e55a095 into main May 22, 2023
@eddyxu eddyxu deleted the lei/search_diskann branch May 22, 2023 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
vector Vector Search
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants