
On Ascend, inference results are truncated when using the lmdeploy APIClient interface #2969

Open
3 tasks done
winni0 opened this issue Dec 28, 2024 · 5 comments

winni0 commented Dec 28, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

The server is started on Ascend with:

lmdeploy serve api_server /LLaMA-Factory-main/model/Qwen2.5-7B-Instruct \
    --backend pytorch \
    --server-port 8000 \
    --device ascend \
    --session-len 8192

When the results are received through the lmdeploy APIClient interface, the inference output is truncated.

[screenshot: truncated inference output]

Reproduction

The API server launch command is:

lmdeploy serve api_server /LLaMA-Factory-main/model/Qwen2.5-7B-Instruct \
    --backend pytorch \
    --server-port 8000 \
    --device ascend \
    --session-len 8192

The lmdeploy APIClient code is shown below.

[screenshot: lmdeploy APIClient code]
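Since the client code was only attached as a screenshot, here is a minimal sketch of typical APIClient usage against such a server (the server address and prompt are placeholders, not the reporter's exact code):

```python
# Minimal sketch, assuming the api_server above is reachable on port 8000.
from lmdeploy.serve.openai.api_client import APIClient

api_client = APIClient('http://0.0.0.0:8000')  # placeholder address
model_name = api_client.available_models[0]    # name of the served model

# completions_v1 streams results back as a generator of dicts.
for output in api_client.completions_v1(model=model_name,
                                        prompt='Please introduce large language models.'):
    print(output)
```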

Environment

TorchVision: 0.18.1
LMDeploy: 0.6.4+191a7dd
transformers: 4.47.1
gradio: Not Found
fastapi: 0.115.6
pydantic: 2.10.4
triton: Not Found

Error traceback

No errors were reported.
jinminxi104 self-assigned this Dec 29, 2024
jinminxi104 (Collaborator) commented

Please add max_tokens to the completions_v1 request. (Also note that you are running in graph mode, which causes a compilation phase on the first run.)
[screenshot: completions_v1 call with max_tokens]
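For illustration, passing max_tokens explicitly might look like the following sketch (api_client and model_name as in the sketch above; the value 2048 matches the "2k" suggestion later in this thread):

```python
# Sketch: request a larger completion budget so the reply is not cut off
# at the server's default max_tokens.
for output in api_client.completions_v1(model=model_name,
                                        prompt='Please introduce large language models.',
                                        max_tokens=2048):
    print(output)
```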

winni0 (Author) commented Jan 3, 2025

Your answer's output is truncated too.

jinminxi104 (Collaborator) commented

> Your answer's output is truncated too.

My example also sets the truncation value. You can set it larger, e.g. to 2k.

MiningIrving commented Jan 5, 2025

> My example also sets the truncation value. You can set it larger, e.g. to 2k.

May I ask, are Qwen2.5 models with 14B or more parameters supported now?

jinminxi104 (Collaborator) commented

> My example also sets the truncation value. You can set it larger, e.g. to 2k.

> May I ask, are Qwen2.5 models with 14B or more parameters supported now?

Yes, they are supported; please set an appropriate tp (tensor parallelism) value.
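For example, a multi-device launch might look like the sketch below (the model path and --tp value are placeholders; set --tp to the number of Ascend devices you want to shard the model across):

```shell
lmdeploy serve api_server /path/to/Qwen2.5-14B-Instruct \
    --backend pytorch \
    --device ascend \
    --server-port 8000 \
    --session-len 8192 \
    --tp 2
```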
