[Paddle Inference] Support GQA Decoder #58472
Conversation
@@ -4279,8 +4279,12 @@ void MaskedMultiheadAttentionInferMeta(const MetaTensor& x,
                                        MetaTensor* beam_cache_offset_out) {
   int bsz = static_cast<int>(x.dims()[0]);
   auto cache_kv_dims = cache_kv.dims();
-  int num_head = static_cast<int>(cache_kv.dims()[2]);
+  int k_num_head = static_cast<int>(cache_kv.dims()[2]);
+  int v_num_head = k_num_head;
I don't think it's necessary to pull out a separate v_num_head; just use a single kv_num_head.
You also need to check that num_head % kv_num_head == 0.
> I don't think it's necessary to pull out a separate v_num_head; just use a single kv_num_head.
> You also need to check that num_head % kv_num_head == 0.
OK, thanks for the review.
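As a side note, here is a minimal self-contained sketch of the check the reviewer asks for (the helper name and the plain-C++ error handling are illustrative assumptions; the actual InferMeta code would use Paddle's own enforcement macros):

```cpp
#include <stdexcept>
#include <string>

// Illustrative helper: under GQA, each group of query heads shares one
// KV head, so the query head count must be a positive multiple of
// kv_num_head (read from cache_kv.dims()[2] in the diff above).
inline void CheckGqaHeadCounts(int num_head, int kv_num_head) {
  if (kv_num_head <= 0 || num_head % kv_num_head != 0) {
    throw std::invalid_argument(
        "num_head (" + std::to_string(num_head) +
        ") must be a positive multiple of kv_num_head (" +
        std::to_string(kv_num_head) + ") for GQA");
  }
}
```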
@@ -92,6 +92,9 @@ struct Masked_multihead_attention_params {
   int beam_width;
When writing the new KV into the KV cache, there should be a check that the corresponding blockIdx maps to the group idx.
> When writing the new KV into the KV cache, there should be a check that the corresponding blockIdx maps to the group idx.
Thanks for the review.
Here hi = blockIdx.x still denotes the index of the query head; the key head index is obtained via hi / num_head_per_group.
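To illustrate the indexing described in this reply, here is a hedged CUDA sketch (the kernel and parameter names are assumptions for illustration, not the PR's actual kernel, which does much more):

```cuda
#include <cuda_runtime.h>

// Sketch: each block processes one query head hi = blockIdx.x; the KV
// head it reads from the cache is the group index hi / num_head_per_group,
// where num_head_per_group = num_head / kv_num_head.
__global__ void GqaHeadIndexSketch(
    const float* k_cache,  // [kv_num_head, max_seq_len, dim_head]
    float* out,            // [num_head, dim_head]
    int num_head_per_group,
    int max_seq_len,
    int dim_head) {
  const int hi = blockIdx.x;                  // query head index
  const int kv_hi = hi / num_head_per_group;  // shared key/value head index
  const size_t base = static_cast<size_t>(kv_hi) * max_seq_len * dim_head;
  if (threadIdx.x < dim_head) {
    // e.g. read the first cached key vector of the shared KV head
    out[static_cast<size_t>(hi) * dim_head + threadIdx.x] =
        k_cache[base + threadIdx.x];
  }
}
```

With plain MHA, num_head_per_group is 1 and kv_hi == hi, so the GQA path degenerates to the original indexing.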
LGTM
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
✅ This PR's description meets the template requirements!
Support GQA Decoder in masked_multihead_attention.cu
PR types
Others
PR changes
Others
Description
Pcard-71500