Describe the bug
Deploying the Qwen1.5-Chat 72B model with vLLM on two 80 GB A100s fails, while deploying the Qwen-Chat 72B model with vLLM on the same hardware works fine.
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527,
in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/vllm/model_executor/models/llama.py", li
ne 219, in forward
hidden_states = self.mlp(hidden_states)
^^^^^^^^^^^^^^^^^
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518,
in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527,
in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/vllm/model_executor/models/llama.py", li
ne 78, in forward
x = self.act_fn(gate_up)
^^^^^^^^^^^^^^^^^
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518,
in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527,
in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/home/jingtianran/anaconda3/envs/xinference_dev/lib/python3.11/site-packages/vllm/model_executor/layers/activation.py
", line 35, in forward
out = torch.empty(output_shape, dtype=x.dtype, device=x.device)
^^^^^^^^^^^^^^^^^
RuntimeError: [address=172.22.227.26:45081, pid=2664574] CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
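As the error text notes, CUDA errors are reported asynchronously, so the frame blamed above (`torch.empty` in `activation.py`) may not be where the kernel actually failed. A minimal sketch of rerunning with synchronous launches, as the message suggests; the tensor operation below is only illustrative:

```python
import os

# Must be set before torch initializes CUDA, so that every kernel launch
# runs synchronously and errors surface at the real failure site.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

# Illustrative CUDA op: with blocking launches, a failing kernel raises
# here instead of at some later, unrelated API call.
x = torch.empty((1024, 1024), dtype=torch.float16, device="cuda")
```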
To Reproduce
To help us to reproduce this bug, please provide information below:
python=3.11
xinference=0.9.2
vllm=0.3.0
torch=2.1.2
cuda=12.1
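To isolate the failure, a minimal vLLM-only script that exercises the same tensor-parallel code path while bypassing xinference might look like the sketch below. The model identifier and sampling settings are assumptions based on the description above; if this also fails, the problem lies in vllm/torch rather than in xinference:

```python
from vllm import LLM, SamplingParams

# Assumed model id and settings; adjust to match the actual deployment.
llm = LLM(
    model="Qwen/Qwen1.5-72B-Chat",
    tensor_parallel_size=2,  # split across the two 80 GB A100s
    trust_remote_code=True,
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```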
Expected behavior
The Qwen1.5-Chat 72B model loads and serves requests through vLLM, just as the Qwen-Chat 72B model does on the same two A100s.