Add environment variable description
zhink committed Jan 3, 2025
1 parent d402f23 commit 64331c7
Showing 2 changed files with 20 additions and 0 deletions.
14 changes: 14 additions & 0 deletions csrc/gpu/append_attn/append_attention_kernel.h
@@ -299,4 +299,18 @@ inline uint32_t get_max_partition_size(int bsz) {
  static const uint32_t max_partition_size =
      max_partition_size_env == nullptr ? 0 : std::stoul(std::string(max_partition_size_env));
  return (max_partition_size != 0 ? max_partition_size : (bsz == 1 ? 128 : 512));
}

inline uint32_t get_decoder_block_shape_q() {
  static const char* decoder_block_shape_q_env = std::getenv("FLAGS_dec_block_shape_q");
  static const uint32_t decoder_block_shape_q =
      decoder_block_shape_q_env == nullptr ? 16 : std::stoi(std::string(decoder_block_shape_q_env));
  return decoder_block_shape_q;
}

inline uint32_t get_encoder_block_shape_q() {
  static const char* encoder_block_shape_q_env = std::getenv("FLAGS_enc_block_shape_q");
  static const uint32_t encoder_block_shape_q =
      encoder_block_shape_q_env == nullptr ? 64 : std::stoi(std::string(encoder_block_shape_q_env));
  return encoder_block_shape_q;
}
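The three getters share one pattern: read an environment variable, cache the parsed value in a function-local `static`, and fall back to a default when the variable is unset. The sketch below reproduces that pattern so its behavior can be checked in isolation. `read_env_u32`, `resolve_partition_size` and `cached_dec_block_shape_q` are hypothetical helper names, not part of the PaddleNLP source, and the sketch assumes a POSIX system for `setenv`/`unsetenv`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <stdlib.h>  // POSIX setenv/unsetenv, used when exercising these helpers
#include <string>

// Hypothetical helper (not in PaddleNLP): parse an environment variable as
// uint32_t, returning `fallback` when the variable is unset.
inline uint32_t read_env_u32(const char* name, uint32_t fallback) {
  const char* value = std::getenv(name);
  return value == nullptr
             ? fallback
             : static_cast<uint32_t>(std::stoul(std::string(value)));
}

// Hypothetical helper mirroring the selection in get_max_partition_size:
// a non-zero explicit value wins; otherwise 128 for bsz == 1, 512 above that.
inline uint32_t resolve_partition_size(uint32_t env_value, int bsz) {
  assert(bsz > 0 && "batch size must be positive");
  return env_value != 0 ? env_value : (bsz == 1 ? 128u : 512u);
}

// Hypothetical getter using the same function-local static caching as the
// diff: the environment is read only on the first call.
inline uint32_t cached_dec_block_shape_q() {
  static const uint32_t value = read_env_u32("FLAGS_dec_block_shape_q", 16);
  return value;
}
```

With `FLAGS_cascade_attention_max_partition_size` unset, `resolve_partition_size(read_env_u32("FLAGS_cascade_attention_max_partition_size", 0), bsz)` yields 128 for `bsz = 1` and 512 for `bsz = 8`; after `setenv(..., "256", 1)` it yields 256 for both. Because of the `static` caching, changing `FLAGS_dec_block_shape_q` after the first call to `cached_dec_block_shape_q()` has no effect, so in practice these flags should be exported before the process starts.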
6 changes: 6 additions & 0 deletions llm/docs/predict/best_practices.md
@@ -22,3 +22,9 @@ PaddleNLP provides a variety of environment variables for optimizing inference performance and resource usage
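- `FLAGS_fraction_of_gpu_memory_to_use`: Fraction of GPU memory to use. Default: 0.9. Setting it to 0.9 is sufficient.

- `FLAGS_gemm_use_half_precision_compute_type`: Whether to use half-precision floating-point computation. Default: 0. Setting it to 0 is sufficient.

**Append Attention optimization**

- `FLAGS_cascade_attention_max_partition_size`: Chunk size used to split cache_kv during Append Attention decoder computation. Default: 128 when the batch size is 1 and 512 when the batch size is greater than 1. When set explicitly to a non-zero value, the same chunk size is used regardless of batch size.
- `FLAGS_dec_block_shape_q`: Block size used to split q during Append Attention decoder computation. Default: 16. Setting it to 16 is sufficient.
- `FLAGS_enc_block_shape_q`: Block size used to split q during Append Attention encoder computation. Default: 64. Setting it to 64 is sufficient.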
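As a rough illustration of what these sizes control, assume each dimension is split into fixed-size tiles with a partial last tile; the number of chunks or blocks is then a ceiling division. `num_tiles` and the sample lengths are hypothetical, and the actual kernels may pad or schedule tiles differently.

```cpp
#include <cstdint>

// Hypothetical helper: number of fixed-size tiles needed to cover `length`
// elements, counting a partial final tile (ceiling division).
constexpr uint32_t num_tiles(uint32_t length, uint32_t tile) {
  return (length + tile - 1) / tile;
}

// Decoder, bsz == 1 default (chunk size 128): a 1000-token cache_kv becomes
// 8 chunks (7 full chunks of 128 plus one of 104).
static_assert(num_tiles(1000, 128) == 8, "");
// Decoder, bsz > 1 default (chunk size 512): the same cache_kv becomes 2 chunks.
static_assert(num_tiles(1000, 512) == 2, "");
// Encoder default FLAGS_enc_block_shape_q = 64: a 300-token prompt -> 5 q blocks.
static_assert(num_tiles(300, 64) == 5, "");
```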
