[Bug]: Unable to serve Qwen2-audio in V1 #12168
Comments
Hmm, I'm able to run this model if I set …

Maybe you have to update your local HF repo, as the HF processor for this model changed recently.

Thanks @DarkLight1337! I was actually using transformers==4.48.0 and the latest vLLM local build when I encountered the above issue. I downgraded to transformers==4.47.1 and the model loaded without any issue. I think this is caused by this HF change introduced in 4.48.0?
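For anyone hitting this before the fix lands, a minimal workaround sketch based on the comment above (the exact pin comes from that comment, not from official guidance):

```bash
# Downgrade transformers to the last release before the Qwen2-Audio processor change
pip install "transformers==4.47.1"
```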
This issue should be fixed in #12187; can you try it out?
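If you want to test the PR before it is merged, one way is to check it out in a local vLLM source clone (a sketch; assumes the GitHub CLI and a working build environment):

```bash
# Check out PR #12187 in a local vllm-project/vllm clone and reinstall from source
gh pr checkout 12187
pip install -e .
```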
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
Serving Qwen2-Audio with the V1 engine fails (I would like to enable prefix caching):
```bash
VLLM_TRACE_FUNCTION=1 NCCL_DEBUG=TRACE VLLM_LOGGING_LEVEL=DEBUG \
VLLM_USE_V1=1 VLLM_ENABLE_V1_MULTIPROCESSING=1 \
vllm serve /xxx/omni/Qwen2-Audio/Qwen2-Audio-7B-Instruct \
  --limit_mm_per_prompt 'audio=5'
```
Traceback:
commit id=87a0c076afafb93dd082ff3876bea08adca56c56
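As a side note on the prefix-caching goal, here is a sketch of the same launch with prefix caching requested explicitly via `--enable-prefix-caching` (a standard vLLM engine flag; the truncated model path is kept from the command above):

```bash
VLLM_USE_V1=1 VLLM_ENABLE_V1_MULTIPROCESSING=1 \
vllm serve /xxx/omni/Qwen2-Audio/Qwen2-Audio-7B-Instruct \
  --limit_mm_per_prompt 'audio=5' \
  --enable-prefix-caching
```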
Before submitting a new issue...