Added FLAGS to replace four params and the value can be adjusted for better speedup #9624
Conversation
Thanks for your contribution!
Codecov Report: all modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##           develop    #9624      +/-   ##
===========================================
- Coverage    52.45%   52.21%     -0.25%
===========================================
  Files          721      723        +2
  Lines       115768   114326     -1442
===========================================
- Hits         60728    59695     -1033
+ Misses       55040    54631      -409

View full report in Codecov by Sentry.
csrc/gpu/helper.h
inline uint32_t get_decoder_block_shape_q() {
  static const char* decoder_block_shape_q_env = std::getenv("FLAGS_flag_dec_block_shape_q");
  static const uint32_t decoder_block_shape_q =
      decoder_block_shape_q_env == nullptr ? 16 : std::stoi(std::string(decoder_block_shape_q_env));
  return decoder_block_shape_q;
}

inline uint32_t get_encoder_block_shape_q() {
  static const char* encoder_block_shape_q_env = std::getenv("FLAGS_flag_block_shape_q");
  static const uint32_t encoder_block_shape_q =
      encoder_block_shape_q_env == nullptr ? 64 : std::stoi(std::string(encoder_block_shape_q_env));
  return encoder_block_shape_q;
}
The flag naming and code placement could be improved: 1. in `FLAGS_flag_`, isn't the lowercase `flag_` redundant? Suggest keeping the flag name consistent with the variable name. 2. Please put the `max_partition_size` flag alongside these as well.
OK.
Thanks, received!
llm/docs/predict/best_practices.md
**Append Attention optimizations**

- `FLAGS_cascade_attention_max_partition_size`: chunk size used to split cache_kv into chunks during Append Attention decoder computation. Defaults to 128 when batch size is 1 and to 512 when batch size is greater than 1. When set explicitly, the value is applied regardless of batch size.
The default should depend on batch size: 128 when batchsize = 1, 512 when batchsize > 1. When set explicitly, batch size is no longer distinguished.
llm/docs/predict/best_practices.md
**Append Attention optimizations**

- `FLAGS_cascade_attention_max_partition_size`: chunk size used to split cache_kv into chunks during Append Attention decoder computation. Defaults to 128 when batch size is 1 and to 512 when batch size is greater than 1. When set explicitly, the value is applied regardless of batch size.
- `FLAGS_dec_block_shape_q`: block size used to tile q during Append Attention decoder computation. Defaults to 16. Just set it to 16.
"Just set it to 16." — if there is no accompanying guidance, please delete this sentence. Same for the entries below.
LGTM
PR types
Others
PR changes
Others
Description
Added FLAGS to replace `max_partition_size`, `encoder_block_shape_q`, and `decoder_block_shape_q` for the append_attention OP, so their values can be adjusted for better speedup.
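The batch-size-dependent default discussed in the review can be sketched as follows (a hedged illustration of the documented policy, not the PR's exact code; `get_max_partition_size` is a name invented here):

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Sketch of the documented default policy for
// FLAGS_cascade_attention_max_partition_size: an explicit environment
// setting wins regardless of batch size; otherwise the default is 128
// for batch size 1 and 512 for larger batches. The real helpers cache
// the env read in a function-local static; this sketch re-reads each
// call for clarity.
inline int get_max_partition_size(int batch_size) {
  const char* env = std::getenv("FLAGS_cascade_attention_max_partition_size");
  if (env != nullptr) return std::stoi(std::string(env));  // explicit setting
  return batch_size == 1 ? 128 : 512;  // batch-size-dependent default
}
```

This keeps the tuned default for the common single-request decoding case while still letting users override the chunk size uniformly when profiling suggests a better value.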