
[BUG] Deploying Qwen2-VL-2B-Instruct with the qwenllm/qwenvl:latest image on 4x Tesla T4 GPUs fails with an error #465

Open

michaelwithu opened this issue Sep 12, 2024 · 0 comments

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

```
ERROR 09-13 03:41:19 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 138 died, exit code: -15
INFO 09-13 03:41:19 multiproc_worker_utils.py:123] Killing local vLLM worker processes
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  ...
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 328, in forward
    x = x + self.mlp(self.norm2(x))
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```
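A "no kernel image is available" error usually means the PyTorch (or vLLM) build inside the image was not compiled with kernels for the GPU's compute capability; a Tesla T4 is sm_75. A minimal sketch for confirming this inside the container, assuming only that the image's PyTorch is importable (`torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are standard torch APIs; nothing here is specific to this image):

```python
# Minimal compatibility check (assumption: run inside the qwenllm/qwenvl
# container, which ships PyTorch). Compares each GPU's compute capability
# against the architectures the installed torch was compiled for.
import torch

for i in range(torch.cuda.device_count()):
    cap = torch.cuda.get_device_capability(i)  # Tesla T4 reports (7, 5)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}, sm_{cap[0]}{cap[1]}")

# Kernel architectures baked into this torch build, e.g. ['sm_80', 'sm_86'].
# If 'sm_75' is absent, no kernel image exists for the T4s.
print("compiled for:", torch.cuda.get_arch_list())
```

If `sm_75` is missing from the reported list, the error above is expected regardless of vLLM settings.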

Expected Behavior

No response

Steps To Reproduce

```bash
docker run --gpus all -it --shm-size=64g --privileged --name qwen2vllm-2 --network="host" -v $(pwd):/app qwenllm/qwenvl:latest
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m vllm.entrypoints.openai.api_server --served-model-name Qwen2-VL-2B-Instruct --model /app/qwen/Qwen2-VL-2B-Instruct --max-model-len 16384 --dtype half --tensor-parallel-size 4
```
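To isolate whether the failure comes from vLLM or from the underlying PyTorch build, a hypothetical smoke test can be run inside the container before launching the API server (assumption: the image's own PyTorch; a plain fp16 matmul triggers the same RuntimeError on an architecture mismatch without involving vLLM at all):

```python
# Hypothetical smoke test: launch one trivial fp16 kernel per visible GPU.
# On an arch mismatch this raises the same "no kernel image is available"
# RuntimeError as the vLLM worker, pinpointing torch rather than vLLM.
import torch

for i in range(torch.cuda.device_count()):
    x = torch.randn(8, 8, device=f"cuda:{i}", dtype=torch.float16)
    y = x @ x                      # first real kernel launch on this device
    torch.cuda.synchronize(i)      # surface async CUDA errors immediately
    print(f"GPU {i}: kernel launch OK (norm {y.float().norm().item():.3f})")
```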

Environment

- OS: Ubuntu 20.04
- Docker image: qwenllm/qwenvl:latest
- NVIDIA driver: 550.54.14

Anything else?

No response
