Describe the bug
Deploying the Qwen1.5-Chat 72B model with vLLM on two 80 GB A100s fails, while deploying the Qwen-Chat 72B model with vLLM on the same hardware works fine.
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527,
in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/vllm/model_executor/models/llama.py", li
ne 219, in forward
hidden_states = self.mlp(hidden_states)
^^^^^^^^^^^^^^^^^
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518,
in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527,
in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/vllm/model_executor/models/llama.py", li
ne 78, in forward
x = self.act_fn(gate_up)
^^^^^^^^^^^^^^^^^
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518,
in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527,
in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/vllm/model_executor/layers/activation.py
", line 35, in forward
out = torch.empty(output_shape, dtype=x.dtype, device=x.device)
^^^^^^^^^^^^^^^^^
RuntimeError: [address=172.22.227.26:45081, pid=2664574] CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
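As the error text notes, CUDA errors are reported asynchronously, so the frame blamed above (`torch.empty` in `activation.py`) may not be where the kernel actually failed. A minimal sketch of rerunning with synchronous launches, as the message suggests; the tensor operation below is only illustrative:

```python
import os

# Must be set before torch initializes CUDA, so that every kernel launch
# runs synchronously and errors surface at the real failure site.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

# Illustrative CUDA op: with blocking launches, a failing kernel raises
# here instead of at some later, unrelated API call.
x = torch.empty((1024, 1024), dtype=torch.float16, device="cuda")
```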
To Reproduce
To help us to reproduce this bug, please provide information below:
python=3.11
xinference=0.9.2
vllm=0.3.0
torch=2.1.2
cuda=12.1
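To isolate the failure, a minimal vLLM-only script that exercises the same tensor-parallel code path while bypassing xinference might look like the sketch below. The model identifier and sampling settings are assumptions based on the description above; if this also fails, the problem lies in vllm/torch rather than in xinference:

```python
from vllm import LLM, SamplingParams

# Assumed model id and settings; adjust to match the actual deployment.
llm = LLM(
    model="Qwen/Qwen1.5-72B-Chat",
    tensor_parallel_size=2,  # split across the two 80 GB A100s
    trust_remote_code=True,
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```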
Expected behavior
The Qwen1.5-Chat 72B model loads and serves requests through vLLM, just as the Qwen-Chat 72B model does on the same two A100s.