Added FLAGS to replace four params and the value can be adjusted for better speedup #9624
Conversation
Thanks for your contribution!
Codecov Report: all modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##           develop    #9624      +/-   ##
===========================================
- Coverage    52.45%   52.21%     -0.25%
===========================================
  Files          721      723        +2
  Lines       115768   114326     -1442
===========================================
- Hits         60728    59695     -1033
+ Misses       55040    54631      -409

View full report in Codecov by Sentry.
csrc/gpu/helper.h
inline uint32_t get_decoder_block_shape_q() {
  static const char* decoder_block_shape_q_env = std::getenv("FLAGS_flag_dec_block_shape_q");
  static const uint32_t decoder_block_shape_q =
      decoder_block_shape_q_env == nullptr ? 16 : std::stoi(std::string(decoder_block_shape_q_env));
  return decoder_block_shape_q;
}

inline uint32_t get_encoder_block_shape_q() {
  static const char* encoder_block_shape_q_env = std::getenv("FLAGS_flag_block_shape_q");
  static const uint32_t encoder_block_shape_q =
      encoder_block_shape_q_env == nullptr ? 64 : std::stoi(std::string(encoder_block_shape_q_env));
  return encoder_block_shape_q;
}
The flag naming and code placement could be improved: 1. in `FLAGS_flag_`, isn't the lowercase `flag_` redundant? Suggest keeping the flag name consistent with the variable name. 2. Please put the `max_partition_size` flag alongside these as well.
OK.
Thanks, received!
llm/docs/predict/best_practices.md
**Append Attention optimizations**

- `FLAGS_cascade_attention_max_partition_size`: chunk size used to split cache_kv into chunks during Append Attention decoder computation. Defaults to 128 when batch size is 1 and to 512 when batch size is greater than 1. When set explicitly, the value is applied regardless of batch size.
The default should depend on batch size: 128 when batchsize = 1, 512 when batchsize > 1. When set explicitly, batch size is no longer distinguished.
llm/docs/predict/best_practices.md
**Append Attention optimizations**

- `FLAGS_cascade_attention_max_partition_size`: chunk size used to split cache_kv into chunks during Append Attention decoder computation. Defaults to 128 when batch size is 1 and to 512 when batch size is greater than 1. When set explicitly, the value is applied regardless of batch size.
- `FLAGS_dec_block_shape_q`: block size used to tile q during Append Attention decoder computation. Defaults to 16. Just set it to 16.
"Just set it to 16." — if there is no accompanying guidance, please delete this sentence. Same for the entries below.
LGTM
PR types
Others
PR changes
Others
Description
Added FLAGS to replace `max_partition_size`, `encoder_block_shape_q`, and `decoder_block_shape_q` for the append_attention OP, so their values can be adjusted for better speedup.
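The batch-size-dependent default discussed in the review can be sketched as follows (a hedged illustration of the documented policy, not the PR's exact code; `get_max_partition_size` is a name invented here):

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Sketch of the documented default policy for
// FLAGS_cascade_attention_max_partition_size: an explicit environment
// setting wins regardless of batch size; otherwise the default is 128
// for batch size 1 and 512 for larger batches. The real helpers cache
// the env read in a function-local static; this sketch re-reads each
// call for clarity.
inline int get_max_partition_size(int batch_size) {
  const char* env = std::getenv("FLAGS_cascade_attention_max_partition_size");
  if (env != nullptr) return std::stoi(std::string(env));  // explicit setting
  return batch_size == 1 ? 128 : 512;  // batch-size-dependent default
}
```

This keeps the tuned default for the common single-request decoding case while still letting users override the chunk size uniformly when profiling suggests a better value.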