
optimize qwen2 model on Gaudi #233

Merged · 3 commits · Sep 20, 2024
Conversation

@czhu15 commented Sep 3, 2024

Add an extra mark_step() after each decode layer to improve performance on Gaudi.

By default, the throughput measured with the command below:
VLLM_SKIP_WARMUP=true python benchmark_throughput.py --model /local_dataset_2/pytorch/Qwen2-7B-Instruct/ --device hpu --seed 2024 --backend vllm --input-len 1000 --output-len 500 --num-prompts 200 --dtype bfloat16
is:
Throughput: 1.34 requests/s, 2015.75 tokens/s

After applying this patch, throughput roughly doubles:
Throughput: 2.67 requests/s, 4003.20 tokens/s

An extra mark_step() was added after each decode layer.

Signed-off-by: Bob Zhu <[email protected]>
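The pattern the patch describes can be sketched as follows. This is a minimal illustration, not the actual diff to vllm/model_executor/models/qwen2.py: `DecoderStack` and its layer callables are hypothetical stand-ins for vLLM's Qwen2 decoder classes, and the import fallback lets the sketch run on machines without Habana software.

```python
# Sketch of the per-decode-layer mark_step() pattern from this PR.
# habana_frameworks is only available on Gaudi systems, so fall back
# to a no-op elsewhere; on HPU, mark_step() flushes the lazily
# accumulated graph so each decode layer is compiled/executed as a
# smaller unit, which is the source of the reported speedup.
try:
    from habana_frameworks.torch.core import mark_step
except ImportError:
    def mark_step():
        pass  # no-op off-Gaudi

class DecoderStack:
    """Toy stand-in for a Qwen2-style decoder layer loop (hypothetical)."""

    def __init__(self, layers):
        self.layers = layers

    def forward(self, hidden):
        for layer in self.layers:
            hidden = layer(hidden)
            # The PR's change: one mark_step() after every decode layer.
            mark_step()
        return hidden
```

On a non-HPU host mark_step() is a no-op, so the loop behaves like a plain sequential forward pass; only on Gaudi does the per-layer graph break change execution.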

@szutenberg szutenberg left a comment

vllm/model_executor/models/qwen2.py (review thread, outdated, resolved)
@czhu15 (author) commented Sep 20, 2024

@szutenberg, I just fixed an import-order issue on the PR that was flagged by the CI system.
It now shows "5 workflows awaiting approval"; I'm not sure whether it needs your approval to trigger another CI run.
Thanks,
Bob

@szutenberg szutenberg merged commit 12d7033 into HabanaAI:habana_main Sep 20, 2024
14 checks passed
2 participants