
[New Model]: QwQ-32B #257

Open
Yikun opened this issue Mar 7, 2025 · 2 comments

Comments


Yikun commented Mar 7, 2025

The model to consider.

https://huggingface.co/Qwen/QwQ-32B

The closest model vllm already supports.

It shares the same architecture as Qwen2: https://huggingface.co/Qwen/QwQ-32B/blob/main/config.json#L3
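Since vLLM selects the model implementation from the `architectures` field of `config.json`, one quick offline check is to inspect that field. The snippet below inlines an excerpt of the linked config rather than downloading it; the values are assumed from the file referenced above:

```python
import json

# Excerpt of https://huggingface.co/Qwen/QwQ-32B/blob/main/config.json, inlined
# so the check runs offline; values assumed from the linked file.
config_excerpt = '{"architectures": ["Qwen2ForCausalLM"], "model_type": "qwen2"}'

config = json.loads(config_excerpt)
# vLLM dispatches on the architectures field, so a match with
# Qwen2ForCausalLM means the existing Qwen2 code path is reused as-is.
print(config["architectures"][0])
```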

It should work well.

What's your difficulty of supporting the model you want?

No response

@Yikun Yikun added the new model label Mar 7, 2025

Yikun commented Mar 7, 2025

Update 2025.03.07: according to feedback from community users, QwQ-32B works well with the vLLM Ascend v0.7.3-dev branch!
Note: more than 70 GB of memory is consumed.
Please refer to https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/multi_npu.html
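The >70 GB figure is plausible from weight math alone, which is why the multi-NPU tutorial and tensor parallelism are needed. A rough, illustrative estimate (not how vLLM computes memory):

```python
# Back-of-the-envelope weight memory for QwQ-32B (illustrative only;
# ignores KV cache, activations, and framework overhead).
params = 32e9          # ~32 billion parameters
bytes_per_param = 2    # bf16 weights

weight_gib = params * bytes_per_param / 1024**3
tp = 4                 # tensor-parallel degree used in the tutorial command
print(f"weights alone: ~{weight_gib:.0f} GiB total, ~{weight_gib / tp:.0f} GiB per NPU with -tp {tp}")
```

The KV cache for long contexts pushes the total well past the weights-only number, hence the >70 GB observation across four devices.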


yuzhup commented Mar 8, 2025

Verified with vllm (v0.7.3) + vllm-ascend (v0.7.3-dev):
1. Clone vllm: git clone -b v0.7.3 https://github.com/vllm-project/vllm.git
2. Clone and install vllm-ascend: git clone -b v0.7.3-dev https://github.com/vllm-project/vllm-ascend.git
3. Install the fused-operator PTA package
4. Start the server
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
python -m vllm.entrypoints.openai.api_server --model /home/data/QwQ-32B --max-model-len 4096 --port 8913 --trust-remote-code -tp 4
5. Accuracy verification commands
export MINDIE_LOG_TO_STDOUT="benchmark:1; client:1"
export SMPL_PARAM='{"temperature":0.0,"top_k":-1,"top_p":1,"repetition_penalty":1}'
benchmark \
--DatasetPath /home/b/ceval \
--DatasetType ceval \
--ModelName qwq \
--ModelPath /home/data/QwQ-32B \
--Tokenizer True \
--TestType vllm_client \
--Concurrency 8 \
--Http http://127.0.0.1:8913 \
--TestAccuracy True \
--MaxOutputLen 128 \
--SamplingParams "$SMPL_PARAM"
6. Performance verification commands
# Edit the benchmark's synthetic-data settings
vim $(pip show mindiebenchmark | grep Location | awk '{print $2}')/mindiebenchmark/config/synthetic_config.json
# Set the input/output token counts and the request count to match the case under test
{
"Input":{
"Method": "uniform",
"Params": {"MinValue": 256, "MaxValue": 256}
},
"Output": {
"Method": "uniform",
"Params": {"MinValue": 256, "MaxValue": 256}
},
"RequestCount": 400
}
# Launch
benchmark \
--DatasetType synthetic \
--ModelName qwq \
--ModelPath /home/data/QwQ-32B \
--Tokenizer True \
--TestType vllm_client \
--Concurrency 16 \
--Http http://127.0.0.1:8913 \
--TestAccuracy False
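For a quick manual sanity check of the same endpoint before running the full benchmark, a request body for vLLM's OpenAI-compatible completions API can be built as below. The prompt is a hypothetical example; the sampling fields mirror the SMPL_PARAM settings from step 5:

```python
import json

# Sketch of a request body for the OpenAI-compatible server started in step 4.
# The prompt is illustrative; sampling fields mirror SMPL_PARAM from step 5.
payload = {
    "model": "/home/data/QwQ-32B",
    "prompt": "What is the capital of France?",  # hypothetical prompt
    "max_tokens": 128,
    "temperature": 0.0,
    "top_p": 1,
    "repetition_penalty": 1,
}

body = json.dumps(payload)
# Could be sent with e.g.:
#   curl http://127.0.0.1:8913/v1/completions -H "Content-Type: application/json" -d "$body"
print(body)
```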
