[Doc]: Deployed model returns garbled output #244
Comments
Did you maybe set a dtype that doesn't match the model? In my case the garbled output was !!!!!!!!!!!!!!!!, more than a thousand of them.
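For reference, a minimal sketch of pinning the dtype explicitly when loading the model offline (not from this thread; the model path is a placeholder, and the dtype should match the torch_dtype in the model's config.json, which for Qwen2.5-7B-Instruct is bfloat16):

```python
from vllm import LLM, SamplingParams

# Hypothetical model path; set dtype to match the torch_dtype
# declared in the model's config.json instead of relying on defaults.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", dtype="bfloat16")

outputs = llm.generate(["你好"], SamplingParams(temperature=0.1, max_tokens=64))
print(outputs[0].outputs[0].text)
```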
I also hit lots of !!!! even though I never changed the dtype, and vLLM on GPU works fine. Did you solve this? I just tried the official quay.io/ascend/vllm-ascend:v0.7.1rc1 image and it works, but the image I built myself does not. I want to pin down the cause, so next I'll compare the dependency differences between my image and the official one.
Official image: Self-built image: The above shows the dependency differences between the two images. The self-built image was installed following the official tutorial; the main differences are the Python version and the CANN version. Could it be that the nnal and CANN versions in the self-built image don't match, or that the CANN version is too old?
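One way to capture that dependency list for a diff (a small helper script, not from the thread: run it in each container and diff the two output files):

```python
# dump_env.py: print every installed Python package as name==version,
# so the official and self-built images can be compared with `diff`.
from importlib.metadata import distributions

for dist in sorted(distributions(), key=lambda d: (d.metadata["Name"] or "").lower()):
    print(f"{dist.metadata['Name']}=={dist.version}")
```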
Thanks for the feedback. We haven't tested 8.0.RC2, but I think it might cause accuracy issues. If you suspect an old-CANN issue, you could try replacing the base image to reproduce: Line 18 in 3217f0d
In general, pytorch-npu, CANN, and NNAL can all cause accuracy problems; we recommend using the versions required in https://github.com/vllm-project/vllm-ascend?tab=readme-ov-file#prerequisites .
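A quick sanity check of the Python-side component versions to compare against that prerequisites table (a sketch, not from the thread; CANN and NNAL are system packages, not pip packages, so they are only noted in a comment):

```python
# Print the versions of the components that commonly cause accuracy
# problems; compare the output against the prerequisites table.
# CANN/NNAL versions live in their install-info files under
# /usr/local/Ascend (exact path varies by install), not in pip metadata.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("torch", "torch-npu", "vllm", "vllm_ascend", "transformers"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```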
You can try setting top_p at inference time to avoid the garbled output.
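For example, through the OpenAI-compatible endpoint the server logs below expose (a sketch assuming the openai Python client; the port comes from the logs, and the model name is a placeholder you can read back from /v1/models):

```python
# Query the vLLM OpenAI-compatible server with an explicit top_p.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:11468/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen2.5-7B-Instruct",  # placeholder; use whatever /v1/models reports
    messages=[{"role": "user", "content": "你好"}],
    temperature=0.1,
    top_p=0.8,  # cap nucleus sampling instead of the default 1.0
    max_tokens=64,
)
print(resp.choices[0].message.content)
```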
📚 The doc issue
INFO 03-05 17:42:27 executor_base.py:108] # CPU blocks: 18111, # CPU blocks: 4681
INFO 03-05 17:42:27 executor_base.py:113] Maximum concurrency for 2048 tokens per request: 141.49x
INFO 03-05 17:42:27 llm_engine.py:429] init engine (profile, create kv cache, warmup model) took 1.91 seconds
INFO 03-05 17:42:28 api_server.py:754] Using supplied chat template:
INFO 03-05 17:42:28 api_server.py:754] None
INFO 03-05 17:42:28 launcher.py:19] Available routes are:
INFO 03-05 17:42:28 launcher.py:27] Route: /openapi.json, Methods: GET, HEAD
INFO 03-05 17:42:28 launcher.py:27] Route: /docs, Methods: GET, HEAD
INFO 03-05 17:42:28 launcher.py:27] Route: /docs/oauth2-redirect, Methods: GET, HEAD
INFO 03-05 17:42:28 launcher.py:27] Route: /redoc, Methods: GET, HEAD
INFO 03-05 17:42:28 launcher.py:27] Route: /health, Methods: GET
INFO 03-05 17:42:28 launcher.py:27] Route: /ping, Methods: GET, POST
INFO 03-05 17:42:28 launcher.py:27] Route: /tokenize, Methods: POST
INFO 03-05 17:42:28 launcher.py:27] Route: /detokenize, Methods: POST
INFO 03-05 17:42:28 launcher.py:27] Route: /v1/models, Methods: GET
INFO 03-05 17:42:28 launcher.py:27] Route: /version, Methods: GET
INFO 03-05 17:42:28 launcher.py:27] Route: /v1/chat/completions, Methods: POST
INFO 03-05 17:42:28 launcher.py:27] Route: /v1/completions, Methods: POST
INFO 03-05 17:42:28 launcher.py:27] Route: /v1/embeddings, Methods: POST
INFO 03-05 17:42:28 launcher.py:27] Route: /pooling, Methods: POST
INFO 03-05 17:42:28 launcher.py:27] Route: /score, Methods: POST
INFO 03-05 17:42:28 launcher.py:27] Route: /v1/score, Methods: POST
INFO 03-05 17:42:28 launcher.py:27] Route: /rerank, Methods: POST
INFO 03-05 17:42:28 launcher.py:27] Route: /v1/rerank, Methods: POST
INFO 03-05 17:42:28 launcher.py:27] Route: /v2/rerank, Methods: POST
INFO 03-05 17:42:28 launcher.py:27] Route: /invocations, Methods: POST
INFO: Started server process [13646]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:11468 (Press CTRL+C to quit)
INFO 03-05 17:42:39 chat_utils.py:330] Detected the chat template content format to be 'string'. You can set --chat-template-content-format to override this.
INFO 03-05 17:42:39 logger.py:37] Received request chatcmpl-0247a6a6fc2f4436bcb025805a0f2155: prompt: '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n<|im_start|>user\n你是谁<|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.1, top_p=1.0, top_k=1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=64, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=False, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO 03-05 17:42:39 engine.py:273] Added request chatcmpl-0247a6a6fc2f4436bcb025805a0f2155.
INFO 03-05 17:42:43 metrics.py:453] Avg prompt throughput: 6.2 tokens/s, Avg generation throughput: 11.1 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO: 192.168.230.28:52949 - "POST /v1/chat/completions HTTP/1.1" 200 OK
torch-npu 2.5.1.dev20250218 (also tried the 0226 build; same problem)
vllm 0.7.1+empty
vllm_ascend 0.7.1rc1
transformers 4.48.2
Request result:
The prompt was "你好" (hello), and the response was:
"message": {
"role": "assistant",
"reasoning_content": null,
"content": "0 gumPropagationslideDownfoundland悱rottle både.-轻松 for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for for",
"tool_calls": []
},
The deployed models are qwen2.5-7B and deepseek-r1-distill-14b; both produce garbled output.
Suggest a potential alternative/fix
No response