optimize qwen2 model on Gaudi #233

czhu15 · 2024-09-03T13:50:30Z

Add an extra mark_step() call for each decoder layer to improve Qwen2 performance on Gaudi.
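For readers unfamiliar with the pattern, here is a minimal sketch of calling mark_step() once per decoder layer. It is not the actual diff from this PR: the function name `run_decoder_layers`, its `mark_step` parameter, and the placement of the call after each layer are assumptions made for illustration. In Gaudi's lazy execution mode, mark_step() flushes the operations accumulated so far, so calling it per layer splits one large graph into smaller per-layer graphs. The sketch falls back to a no-op when the Gaudi PyTorch bridge is not installed.

```python
from typing import Any, Callable, Iterable

try:
    # Present only where the Intel Gaudi PyTorch bridge is installed.
    import habana_frameworks.torch.core as htcore

    _default_mark_step: Callable[[], None] = htcore.mark_step
except ImportError:

    def _default_mark_step() -> None:
        """No-op stand-in so the sketch also runs without Gaudi software."""


def run_decoder_layers(
    layers: Iterable[Callable[[Any], Any]],
    hidden_states: Any,
    mark_step: Callable[[], None] = _default_mark_step,
) -> Any:
    """Apply each layer in order, calling mark_step() once per layer.

    Hypothetical helper: `layers` stands in for the model's decoder
    layers, and each layer is assumed to take and return hidden states.
    """
    for layer in layers:
        hidden_states = layer(hidden_states)
        # Assumed placement: flush the lazily accumulated ops after each layer.
        mark_step()
    return hidden_states
```

Passing `mark_step` as a parameter keeps the sketch testable on machines without an HPU; the real model code would call the Gaudi API directly behind a platform check.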

Without this patch, the following command:
VLLM_SKIP_WARMUP=true python benchmark_throughput.py --model /local_dataset_2/pytorch/Qwen2-7B-Instruct/ --device hpu --seed 2024 --backend vllm --input-len 1000 --output-len 500 --num-prompts 200 --dtype bfloat16
reports:
Throughput: 1.34 requests/s, 2015.75 tokens/s

With this patch applied, the same command reports:
Throughput: 2.67 requests/s, 4003.20 tokens/s
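As a quick check on the two throughput figures, the short calculation below computes the speedup they imply. The variable names are illustrative; the numbers are the ones reported in this description.

```python
# Throughput reported without the patch and with it.
baseline_req_s, baseline_tok_s = 1.34, 2015.75
patched_req_s, patched_tok_s = 2.67, 4003.20

request_speedup = patched_req_s / baseline_req_s
token_speedup = patched_tok_s / baseline_tok_s

print(f"requests/s speedup: {request_speedup:.2f}x")  # 1.99x
print(f"tokens/s speedup:   {token_speedup:.2f}x")    # 1.99x
```

Both ratios come out at roughly 2x for this single benchmark configuration.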

extra mark_step() was added on each decode layer

Signed-off-by: Bob Zhu <[email protected]>

szutenberg

The change is similar to what we have in https://github.com/HabanaAI/vllm-fork/blob/habana_main/vllm/model_executor/models/llama.py#L323 -> LGTM

vllm/model_executor/models/qwen2.py

czhu15 · 2024-09-20T03:10:24Z

@szutenberg, I just fixed an import-order issue in the PR that was flagged by the CI system.
It now shows "5 workflows awaiting approval"; I'm not sure whether your approval is needed to run the CI checks again.
Thanks,
Bob

optimize qwen2 model on Gaudi

6525056

extra mark_step() was added on each decode layer

Signed-off-by: Bob Zhu <[email protected]>

szutenberg requested changes Sep 12, 2024

vllm/model_executor/models/qwen2.py (outdated, resolved)

remove is_hip from qwen2

88610ec

Signed-off-by: Bob Zhu <[email protected]>

szutenberg approved these changes Sep 13, 2024

fix the import sequence issue

63acbed

szutenberg merged commit 12d7033 into HabanaAI:habana_main Sep 20, 2024
14 checks passed