
[New Model]: QwQ-32B #257

Open
Yikun opened this issue Mar 7, 2025 · 2 comments

Comments


Yikun commented Mar 7, 2025

The model to consider.

https://huggingface.co/Qwen/QwQ-32B

The closest model vllm already supports.

It shares the same architecture as Qwen2: https://huggingface.co/Qwen/QwQ-32B/blob/main/config.json#L3
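Since vLLM selects the model implementation from the `architectures` field of `config.json`, one quick offline check is to inspect that field. The snippet below inlines an excerpt of the linked config rather than downloading it; the values are assumed from the file referenced above:

```python
import json

# Excerpt of https://huggingface.co/Qwen/QwQ-32B/blob/main/config.json, inlined
# so the check runs offline; values assumed from the linked file.
config_excerpt = '{"architectures": ["Qwen2ForCausalLM"], "model_type": "qwen2"}'

config = json.loads(config_excerpt)
# vLLM dispatches on the architectures field, so a match with
# Qwen2ForCausalLM means the existing Qwen2 code path is reused as-is.
print(config["architectures"][0])
```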

It should work well.

What's your difficulty of supporting the model you want?

No response

@Yikun Yikun added the new model label Mar 7, 2025

Yikun commented Mar 7, 2025

Update 2025.03.07: according to feedback from community users, QwQ-32B works well with the vLLM Ascend v0.7.3-dev branch!
Note: more than 70 GB of memory is consumed.
Please refer to https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/tutorials/multi_npu.html
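The >70 GB figure is plausible from weight math alone, which is why the multi-NPU tutorial and tensor parallelism are needed. A rough, illustrative estimate (not how vLLM computes memory):

```python
# Back-of-the-envelope weight memory for QwQ-32B (illustrative only;
# ignores KV cache, activations, and framework overhead).
params = 32e9          # ~32 billion parameters
bytes_per_param = 2    # bf16 weights

weight_gib = params * bytes_per_param / 1024**3
tp = 4                 # tensor-parallel degree used in the tutorial command
print(f"weights alone: ~{weight_gib:.0f} GiB total, ~{weight_gib / tp:.0f} GiB per NPU with -tp {tp}")
```

The KV cache for long contexts pushes the total well past the weights-only number, hence the >70 GB observation across four devices.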


yuzhup commented Mar 8, 2025

Verified with vllm (v0.7.3) + vllm-ascend (v0.7.3-dev):
1. Clone vllm: git clone -b v0.7.3 https://github.com/vllm-project/vllm.git
2. Clone and install vllm-ascend: git clone -b v0.7.3-dev https://github.com/vllm-project/vllm-ascend.git
3. Install the fused-operator PTA package
4. Start the server
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
python -m vllm.entrypoints.openai.api_server --model /home/data/QwQ-32B --max-model-len 4096 --port 8913 --trust-remote-code -tp 4
5. Accuracy verification commands
export MINDIE_LOG_TO_STDOUT="benchmark:1; client:1"
export SMPL_PARAM='{"temperature":0.0,"top_k":-1,"top_p":1,"repetition_penalty":1}'
benchmark \
--DatasetPath /home/b/ceval \
--DatasetType ceval \
--ModelName qwq \
--ModelPath /home/data/QwQ-32B \
--Tokenizer True \
--TestType vllm_client \
--Concurrency 8 \
--Http http://127.0.0.1:8913 \
--TestAccuracy True \
--MaxOutputLen 128 \
--SamplingParams "$SMPL_PARAM"
6. Performance verification commands
# Edit the benchmark's synthetic-data settings
vim $(pip show mindiebenchmark | grep Location | awk '{print $2}')/mindiebenchmark/config/synthetic_config.json
# Set the input/output token counts and the request count to match the case under test
{
"Input":{
"Method": "uniform",
"Params": {"MinValue": 256, "MaxValue": 256}
},
"Output": {
"Method": "uniform",
"Params": {"MinValue": 256, "MaxValue": 256}
},
"RequestCount": 400
}
# Launch
benchmark \
--DatasetType synthetic \
--ModelName qwq \
--ModelPath /home/data/QwQ-32B \
--Tokenizer True \
--TestType vllm_client \
--Concurrency 16 \
--Http http://127.0.0.1:8913 \
--TestAccuracy False
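For a quick manual sanity check of the same endpoint before running the full benchmark, a request body for vLLM's OpenAI-compatible completions API can be built as below. The prompt is a hypothetical example; the sampling fields mirror the SMPL_PARAM settings from step 5:

```python
import json

# Sketch of a request body for the OpenAI-compatible server started in step 4.
# The prompt is illustrative; sampling fields mirror SMPL_PARAM from step 5.
payload = {
    "model": "/home/data/QwQ-32B",
    "prompt": "What is the capital of France?",  # hypothetical prompt
    "max_tokens": 128,
    "temperature": 0.0,
    "top_p": 1,
    "repetition_penalty": 1,
}

body = json.dumps(payload)
# Could be sent with e.g.:
#   curl http://127.0.0.1:8913/v1/completions -H "Content-Type: application/json" -d "$body"
print(body)
```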
