The random sampling of IVF methods was reverted (#2144) due to large memory utilization (#2141).
This PR reduces the memory consumption of subsampling to O(n_train), where n_train is the size of the subsampled dataset.
This PR adds the following new APIs:
- random::excess_sampling (TODO: may simply be named sample_without_replacement)
- matrix::sample_rows
- matrix::gather for host input matrix
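To illustrate how subsampling can stay within O(n_train) temporary memory, here is a minimal host-side sketch. It is not the RAFT implementation: the function names `sample_indices_oversampled` and `gather_rows` are hypothetical, and the approach (draw slightly more random indices than needed, deduplicate, truncate) is only an assumption suggested by the name excess_sampling. The 10% oversampling margin is also an arbitrary choice for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not a RAFT API): draw n_samples distinct indices from
// [0, len) while only ever holding O(n_samples) indices in memory.
// Intended for n_samples << len; it gets slow when n_samples approaches len.
std::vector<std::int64_t> sample_indices_oversampled(std::int64_t len,
                                                     std::int64_t n_samples,
                                                     std::uint64_t seed)
{
  if (len <= 0 || n_samples <= 0) return {};
  n_samples = std::min(n_samples, len);  // cannot draw more distinct values than exist

  std::mt19937_64 gen(seed);
  std::uniform_int_distribution<std::int64_t> dist(0, len - 1);

  const std::size_t n_target = static_cast<std::size_t>(n_samples);
  // Assumed margin: ~10% extra candidates to absorb duplicates.
  const std::size_t n_excess = n_target + n_target / 10 + 16;

  std::vector<std::int64_t> idx;
  idx.reserve(n_excess);
  while (true) {
    // Top up with random candidates (with replacement), then drop duplicates.
    while (idx.size() < n_excess) idx.push_back(dist(gen));
    std::sort(idx.begin(), idx.end());
    idx.erase(std::unique(idx.begin(), idx.end()), idx.end());
    if (idx.size() >= n_target) break;
  }
  // Shuffle before truncating so the result is not biased toward small indices.
  std::shuffle(idx.begin(), idx.end(), gen);
  idx.resize(n_target);
  return idx;
}

// Hypothetical helper: copy the selected rows of a row-major host matrix.
std::vector<float> gather_rows(const std::vector<float>& data,
                               std::int64_t dim,
                               const std::vector<std::int64_t>& idx)
{
  assert(dim > 0 && data.size() % static_cast<std::size_t>(dim) == 0);
  std::vector<float> out;
  out.reserve(idx.size() * static_cast<std::size_t>(dim));
  for (std::int64_t row : idx) {
    const auto first = data.begin() + row * dim;
    out.insert(out.end(), first, first + dim);
  }
  return out;
}
```

Calling `sample_indices_oversampled(len, n_train, seed)` and passing the result to `gather_rows` yields the training set. The only temporaries are the index vector (about 1.1 * n_train entries) and the gathered output, neither of which scales with len.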
Authors:
- Tamas Bela Feher (https://github.com/tfeher)
Approvers:
- Artem M. Chirkin (https://github.com/achirkin)
- Ben Frederickson (https://github.com/benfred)
URL: #2155
Describe the bug
The IVF methods use random subsampling to create a training set for k-means clustering.
The random sampling algorithm allocates several temporary buffers:
raft/cpp/include/raft/random/detail/rng_impl.cuh
Lines 293 to 296 in d4ae271
On top of this, sortPairs would also allocate temporary buffers with size O(len).
Here len is the size (number of vectors) in the whole dataset. When we are indexing DEEP-1B, then len=1e9, and the temporary space becomes prohibitive.
Steps/Code to reproduce bug
Run the DEEP-1B test with IVF-PQ, with subsample ratio 10. It will run out of memory (OOM).
Expected behavior
Subsample vectors with minimal memory overhead.
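For a rough sense of scale, the following back-of-the-envelope sketch compares one temporary buffer sized by len against one sized by n_train. The 8-byte element size is an assumption (a 64-bit index per vector), not a value taken from rng_impl.cuh, and "subsample ratio 10" is interpreted here as n_train = len / 10.

```cpp
#include <cstdint>

// Bytes needed for one temporary buffer of n elements, elem_bytes each.
constexpr std::uint64_t buffer_bytes(std::uint64_t n, std::uint64_t elem_bytes)
{
  return n * elem_bytes;
}

constexpr std::uint64_t len        = 1'000'000'000ULL;  // DEEP-1B: 1e9 vectors
constexpr std::uint64_t ratio      = 10;                // assumed meaning of "subsample ratio 10"
constexpr std::uint64_t n_train    = len / ratio;       // 1e8 training vectors
constexpr std::uint64_t elem_bytes = 8;                 // assumption: one 64-bit index per element

// One O(len) buffer: 8e9 bytes (8 GB), and the sampling code allocates several.
static_assert(buffer_bytes(len, elem_bytes) == 8'000'000'000ULL, "O(len) buffer");
// One O(n_train) buffer: 8e8 bytes (0.8 GB).
static_assert(buffer_bytes(n_train, elem_bytes) == 800'000'000ULL, "O(n_train) buffer");
```

Under these assumptions each O(len) buffer costs ten times as much as its O(n_train) counterpart, and that cost is multiplied by the number of temporary buffers in use.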