[Paddle Inference] Support GQA Decoder #58472
Conversation
@@ -4279,8 +4279,12 @@ void MaskedMultiheadAttentionInferMeta(const MetaTensor& x,
                                        MetaTensor* beam_cache_offset_out) {
   int bsz = static_cast<int>(x.dims()[0]);
   auto cache_kv_dims = cache_kv.dims();
-  int num_head = static_cast<int>(cache_kv.dims()[2]);
+  int k_num_head = static_cast<int>(cache_kv.dims()[2]);
+  int v_num_head = k_num_head;
I don't think it's necessary to pull out a separate v_num_head; just use a single kv_num_head.
You also need to check that num_head % kv_num_head == 0.
> I don't think it's necessary to pull out a separate v_num_head; just use a single kv_num_head.
> You also need to check that num_head % kv_num_head == 0.
OK, thanks for the review.
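As a side note, here is a minimal self-contained sketch of the check the reviewer asks for (the helper name and the plain-C++ error handling are illustrative assumptions; the actual InferMeta code would use Paddle's own enforcement macros):

```cpp
#include <stdexcept>
#include <string>

// Illustrative helper: under GQA, each group of query heads shares one
// KV head, so the query head count must be a positive multiple of
// kv_num_head (read from cache_kv.dims()[2] in the diff above).
inline void CheckGqaHeadCounts(int num_head, int kv_num_head) {
  if (kv_num_head <= 0 || num_head % kv_num_head != 0) {
    throw std::invalid_argument(
        "num_head (" + std::to_string(num_head) +
        ") must be a positive multiple of kv_num_head (" +
        std::to_string(kv_num_head) + ") for GQA");
  }
}
```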
@@ -92,6 +92,9 @@ struct Masked_multihead_attention_params {
   int beam_width;
When writing the new KV into the KV cache, there should be a check that the corresponding blockIdx maps to the group idx.
> When writing the new KV into the KV cache, there should be a check that the corresponding blockIdx maps to the group idx.
Thanks for the review.
Here hi = blockIdx.x still denotes the index of the query head; the key head index is obtained via hi / num_head_per_group.
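To illustrate the indexing described in this reply, here is a hedged CUDA sketch (the kernel and parameter names are assumptions for illustration, not the PR's actual kernel, which does much more):

```cuda
#include <cuda_runtime.h>

// Sketch: each block processes one query head hi = blockIdx.x; the KV
// head it reads from the cache is the group index hi / num_head_per_group,
// where num_head_per_group = num_head / kv_num_head.
__global__ void GqaHeadIndexSketch(
    const float* k_cache,  // [kv_num_head, max_seq_len, dim_head]
    float* out,            // [num_head, dim_head]
    int num_head_per_group,
    int max_seq_len,
    int dim_head) {
  const int hi = blockIdx.x;                  // query head index
  const int kv_hi = hi / num_head_per_group;  // shared key/value head index
  const size_t base = static_cast<size_t>(kv_hi) * max_seq_len * dim_head;
  if (threadIdx.x < dim_head) {
    // e.g. read the first cached key vector of the shared KV head
    out[static_cast<size_t>(hi) * dim_head + threadIdx.x] =
        k_cache[base + threadIdx.x];
  }
}
```

With plain MHA, num_head_per_group is 1 and kv_hi == hi, so the GQA path degenerates to the original indexing.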
LGTM
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
✅ This PR's description meets the template requirements!
Support GQA Decoder in masked_multihead_attention.cu
PR types
Others
PR changes
Others
Description
Pcard-71500