-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache IVF-PQ and select-warpsort kernel launch parameters to reduce latency #1786
Cache IVF-PQ and select-warpsort kernel launch parameters to reduce latency #1786
Conversation
a653b8b
to
7aefb3b
Compare
@achirkin can you retarget this PR to 24.4? It's too late to get this into 24.02. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR Artem, it looks good to me. I have one question, but that need not block merging this.
inline auto operator()(const search_kernel_key& x) const noexcept -> std::size_t | ||
{ | ||
return (size_t{x.manage_local_topk} << 63) + | ||
size_t{x.topk} * size_t{x.n_probes} * size_t{x.n_queries} + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this hash with parameters ("*") multiplied used? I guess occasional collisions do not matter much for us, but a proper hash_combine functions is not much more complex than this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! But does raft depend on boost already? There seem to be no standard library equivalent for this.
Also, in the current implementation of LRU cache, we don't use hash values at all (we changed the implementation from a hashmap to a vector of key-value tuples).
/merge |
This PR aims at reducing the latency in IVF-PQ and related functions, especially with small work sizes and in the "throughput" benchmark mode.
matrix::select_k
: make sure all temporary allocations use raft's workspace memory resource.