
BUG: Error when deploying the qwen1.5-chat 72B model with vllm #1122

Closed
YYLCyylc opened this issue Mar 11, 2024 · 5 comments

@YYLCyylc

Describe the bug

A clear and concise description of what the bug is.
Deploying the qwen1.5-chat 72B model with vllm on two 80GB A100s fails, but deploying the qwen-chat 72B model with vllm works fine.

  File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527,
in _call_impl
    return forward_call(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/vllm/model_executor/models/llama.py", li
ne 219, in forward
    hidden_states = self.mlp(hidden_states)
    ^^^^^^^^^^^^^^^^^
  File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518,
in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527,
in _call_impl
    return forward_call(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/vllm/model_executor/models/llama.py", li
ne 78, in forward
    x = self.act_fn(gate_up)
    ^^^^^^^^^^^^^^^^^
  File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518,
in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527,
in _call_impl
    return forward_call(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/vllm/model_executor/layers/activation.py
", line 35, in forward
    out = torch.empty(output_shape, dtype=x.dtype, device=x.device)
    ^^^^^^^^^^^^^^^^^
RuntimeError: [address=172.22.227.26:45081, pid=2664574] CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

To Reproduce

To help us to reproduce this bug, please provide information below:

  1. python=3.11
  2. xinference=0.9.2
  3. vllm=0.3.0
  4. torch=2.1.2
  5. cuda=12.1
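
The exact launch call is not shown in the issue; the sketch below is an editor's assumption of how a 72B Qwen1.5 chat model is typically started through xinference's Python client with the vllm backend on two GPUs (method and parameter names should be checked against the xinference 0.9.x docs):

# Hypothetical launch sketch -- not taken from the issue; parameter names are assumptions.
from xinference.client import Client

client = Client("http://localhost:9997")   # assumes a running xinference endpoint
model_uid = client.launch_model(
    model_name="qwen1.5-chat",             # built-in Qwen1.5 chat model
    model_size_in_billions=72,
    model_format="pytorch",
    n_gpu=2,                               # spread across the two 80GB A100 cards
)
print(model_uid)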

Expected behavior

A clear and concise description of what you expected to happen.

Additional context

Add any other context about the problem here.

@XprobeBot XprobeBot added the gpu label Mar 11, 2024
@XprobeBot XprobeBot added this to the v0.9.3 milestone Mar 11, 2024
@ChengjieLi28
Contributor

vllm-project/vllm#2773
Could you try whether it works when using vllm directly?
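
A minimal way to check that (a sketch, not from the thread) is to load the model with vllm's own Python API and tensor parallelism across the two cards; if this also fails with "invalid device function", the problem is below xinference:

# Sketch: test vllm directly, outside xinference. Assumes the weights are
# available as Qwen/Qwen1.5-72B-Chat on the Hugging Face Hub (or a local path).
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen1.5-72B-Chat", tensor_parallel_size=2)
outputs = llm.generate(["Hello, who are you?"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)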

@YYLCyylc
Author

It is probably a bug in vllm.

@ye7love7

I tried vllm 0.3.0 through 0.3.3, and none of them could start Qwen1.5-72B-Chat.

@yinghaodang

I have deployed Qwen1.5-72B-Chat; two A100s can probably only fit the int4-quantized version. I did not run into this problem... I used the officially provided image and upgraded the NVIDIA container toolkit and the NVIDIA driver to a fairly recent version (maybe 12.3?).
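
For reference, a sketch of loading an int4 (GPTQ) build with vllm; the repo name Qwen/Qwen1.5-72B-Chat-GPTQ-Int4 and the quantization argument are assumptions used to illustrate the suggestion above, not settings confirmed in this thread:

# Sketch: int4 (GPTQ) variant, which fits more comfortably on two A100s.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen1.5-72B-Chat-GPTQ-Int4",  # assumed HF repo name
    quantization="gptq",
    tensor_parallel_size=2,
)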

@XprobeBot XprobeBot modified the milestones: v0.9.3, v0.9.4 Mar 15, 2024
@YYLCyylc
Author

After updating vllm to 0.3.3 and xinference to 0.9.3, the model deploys normally.
