-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Streaming PQ #689
Streaming PQ #689
Conversation
train opq simple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given how much has changed, could you re-run the sift benchmarks just to make sure the latency / recall profile looks roughly the same still?
@@ -130,6 +131,15 @@ impl MatrixView { | |||
} | |||
} | |||
|
|||
pub fn row(&self, i: usize) -> Option<Float32Array> { | |||
if i >= self.num_rows() { | |||
None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an error?
rust/src/dataset.rs
Outdated
let ivf_params = IvfBuildParams { | ||
num_partitions: vec_params.num_partitions as usize, | ||
metric_type: vec_params.metric_type, | ||
max_iters: 50, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these values from FAISS?
}; | ||
let pq_params = PQBuildParams { | ||
num_sub_vectors: vec_params.num_sub_vectors as usize, | ||
num_bits: 8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have to change much to support configurable num_bits? is it just exposing a new API parameter? or is the underlying index creation hard coded to 8 bits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_nbits
is not configurable via user-facing API yet. 8 bits is just 1 byte. other than 8 bits
, the other configuration needs some work .
mat: &MatrixView, | ||
metric_type: MetricType, | ||
) -> Result<FixedSizeListArray> { | ||
self.train(mat, metric_type, 50).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move hard coded constants into one place so we know what needs to change in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, yea, lemme just make this a parameter passed from user-facing api
let transforms = stream::iter(index_metadata.transforms) | ||
.map(|tf| async move { | ||
Ok::<Arc<dyn Transformer>, Error>(Arc::new( | ||
OptimizedProductQuantizer::load( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i thought this doesn't use OPQ yet?
Running on
|
Train IVF_PQ on sample dataset, and use streaming to compute PQ code without loading full dataset in memory.