-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feat] Support three new evict strategy(lfu
, epoch_lfu
, epoch_lru
)
#152
Conversation
Documentation previewhttps://nvidia-merlin.github.io/HierarchicalKV/review/pr-152 |
178ec1d
to
931e10f
Compare
/blossom-ci |
include/merlin_hashtable.cuh
Outdated
@@ -431,6 +453,8 @@ class HashTable { | |||
score_type* evicted_scores, // (n) | |||
size_type* d_evicted_counter, // (1) | |||
cudaStream_t stream = 0, | |||
const score_type global_epoch = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Lifann , please help review the change on API, this PR implemented two new strategy which needs global_epoch
as the high 32bit of the score, for more detail please refer to https://github.com/rhdong/HierarchicalKV/tree/rhdong/epoch-lru#evict-strategy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A fixed data format param for limited case will make the API hard to expand in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you, but it seems no better choice. Do you have a better recommendation?
Basically, I hope the caller has full right to control the epoch, and I believe the built-in epoch counter could've clearer API definition. It's a little hard...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a possible choice:
- add a
std::atomic<S> global_epoch;
member for the Hashtable. - and add two member functions to operating it. And call them before calling the main API:
void global_epoch(const S epoch);
S global_epoch() const;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/*
* A prototype to show how to get param inside functor implicitly.
*/
#include "cuda_runtime.h"
#include <stdio.h>
#include <vector>
#include <unistd.h>
using namespace std;
namespace functor {
__device__ int param_;
void set_param(int score) {
int ret = 0;
ret = cudaMemcpyToSymbol(param_, &score, sizeof(int), 0, cudaMemcpyHostToDevice);
printf("symbol op1 ret=%d\n", ret);
}
struct Functor {
public:
__device__ inline int internal_get_param() {
return ::functor::param_;
}
__device__ void operator()(int& key, int& score) {
int inc = internal_get_param();
score = inc;
}
};
}
template <typename ScorePred>
__global__ void gpu_any_kernel(int* keys, int* scores, size_t n) {
int tid = blockDim.x * blockIdx.x + threadIdx.x;
ScorePred pred;
if (tid < n) {
pred(keys[tid], scores[tid]);
}
}
void get_data(vector<int>& data, int** d_data) {
int ret = 0;
cudaMalloc(d_data, data.size() * sizeof(int));
cudaMemcpy(*d_data, data.data(), data.size() * sizeof(int), cudaMemcpyHostToDevice);
}
int main() {
int ret = 0;
cudaSetDevice(0);
cudaStream_t stream;
cudaStreamCreate(&stream);
vector<int> keys = {1,2,3,4,5};
vector<int> scores = {0,0,0,0,0};
int* d_keys = nullptr;
int* d_scores = nullptr;
get_data(keys, &d_keys);
get_data(scores, &d_scores);
functor::set_param(5);
cudaDeviceSynchronize();
// check whether if set_param work.
gpu_any_kernel<functor::Functor><<<5, 1, 0, stream>>>(d_keys, d_scores, keys.size());
ret = cudaMemcpyAsync(scores.data(), d_scores, sizeof(int) * scores.size(), cudaMemcpyDeviceToHost, stream);
printf("copy d2h ret=%d\n", ret);
cudaStreamSynchronize(stream);
for (int i = 0; i < 5; i++) {
printf("i=%d, score=%d\n", i, scores[i]); // It will get 5 on all pos.
}
return 0;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* * A prototype to show how to get param inside functor implicitly. */ #include "cuda_runtime.h" #include <stdio.h> #include <vector> #include <unistd.h> using namespace std; namespace functor { __device__ int param_; void set_param(int score) { int ret = 0; ret = cudaMemcpyToSymbol(param_, &score, sizeof(int), 0, cudaMemcpyHostToDevice); printf("symbol op1 ret=%d\n", ret); } struct Functor { public: __device__ inline int internal_get_param() { return ::functor::param_; } __device__ void operator()(int& key, int& score) { int inc = internal_get_param(); score = inc; } }; } template <typename ScorePred> __global__ void gpu_any_kernel(int* keys, int* scores, size_t n) { int tid = blockDim.x * blockIdx.x + threadIdx.x; ScorePred pred; if (tid < n) { pred(keys[tid], scores[tid]); } } void get_data(vector<int>& data, int** d_data) { int ret = 0; cudaMalloc(d_data, data.size() * sizeof(int)); cudaMemcpy(*d_data, data.data(), data.size() * sizeof(int), cudaMemcpyHostToDevice); } int main() { int ret = 0; cudaSetDevice(0); cudaStream_t stream; cudaStreamCreate(&stream); vector<int> keys = {1,2,3,4,5}; vector<int> scores = {0,0,0,0,0}; int* d_keys = nullptr; int* d_scores = nullptr; get_data(keys, &d_keys); get_data(scores, &d_scores); functor::set_param(5); cudaDeviceSynchronize(); // check whether if set_param work. gpu_any_kernel<functor::Functor><<<5, 1, 0, stream>>>(d_keys, d_scores, keys.size()); ret = cudaMemcpyAsync(scores.data(), d_scores, sizeof(int) * scores.size(), cudaMemcpyDeviceToHost, stream); printf("copy d2h ret=%d\n", ret); cudaStreamSynchronize(stream); for (int i = 0; i < 5; i++) { printf("i=%d, score=%d\n", i, scores[i]); // It will get 5 on all pos. } return 0; }
In the case, it's possible to apply any param implicitly on functor, which can traverse on the key-value-score pairs of inputs in device or global function. And it makes abstract on different scoring strategies possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I merged our thoughts here, please help review again, thanks: afd5813
/blossom-ci |
/blossom-ci |
1 similar comment
/blossom-ci |
/blossom-ci |
/blossom-ci |
6588af9
to
0816cf3
Compare
0816cf3
to
9799a9f
Compare
/blossom-ci |
/blossom-ci |
enum class EvictStrategy { | ||
kLru = 0, ///< LRU mode. | ||
kCustomized = 1 ///< Customized mode. | ||
struct EvictStrategy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since EvictStategy is a template param for HashTable, is it very need to use it as a constexpr static int, instead of EvictStategy::kEpochLru? If it is, maybe use a macro to make the template param meaningful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the structure mainly serves for conveniently exposing the API
to and being easily referred by the end-users via EvictStategy
, and I believe the macro has no name scope and can dirty the user namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have re-designed it, please refer to the latest code.
include/merlin_hashtable.cuh
Outdated
@@ -294,6 +308,9 @@ class HashTable { | |||
* | |||
* @param stream The CUDA stream that is used to execute the operation. | |||
* | |||
* @param global_epoch The global epoch for EpochLRU, EpochLFU, when it's set | |||
* to `DEFAULT_GLOBAL_EPOCH`, insert score in @p scores directly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean "insert score in @scores"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
1c3d063
to
a0b0db8
Compare
/blossom-ci |
/blossom-ci |
afd5813
to
d384e0b
Compare
/blossom-ci |
d384e0b
to
f956e9c
Compare
/blossom-ci |
f956e9c
to
e561e33
Compare
/blossom-ci |
Benchmark Comparasion: On pure HBM mode, thread_local param with EpochLru:
On pure HBM mode, No thread_local param with EpochLru:
The difference between the two settings should be less than 0.5%. |
e561e33
to
59a0fd9
Compare
/blossom-ci |
59a0fd9
to
faf9ac4
Compare
/blossom-ci |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
key_pos, key, score); | ||
ScoreFunctor::update_with_digest( | ||
bucket_keys_ptr, key_pos, scores, kv_idx, score, bucket_capacity, | ||
get_digest<K>(key), (occupy_result != OccupyResult::DUPLICATE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, occupy_result is not updated here, its value is INITIAL.
Anyway, here, occupy_result != OccupyResult:DUPLICATE always true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, occupy_result is not updated here, its value is INITIAL. Anyway, here, occupy_result != OccupyResult:DUPLICATE always true.
@jiashuy, thanks for pointing out. I believe if running here always means the key is missed, this flag can be set to True
directly. Anyway, the current logic looks right.
No description provided.