
How to reduce the max token count when using vLLM with the parameter "limit_mm_per_prompt" > 1 when initializing LLM() #734

Open
luosting opened this issue Feb 7, 2025 · 0 comments

Comments


luosting commented Feb 7, 2025

I see that when I use the default settings to start a vLLM model, the console shows:

Computed max_num_seqs (min(256, 128000 // 131072)) to be less than 1. Setting it to the minimum value of 1.

I think some setting has enlarged the token budget reserved for input videos or images compared to Qwen2-VL. How can I reduce it so that I can speed up the deployment of my vLLM model?
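For reference, a minimal sketch of the kind of initialization the question is about. The model name and all numeric values below are placeholders, not the issue author's actual setup, and support for these parameters may vary by vLLM version.

```python
from vllm import LLM

# Hypothetical configuration for illustration only.
llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # placeholder checkpoint
    # Allow more than one image/video per prompt, as in the issue title.
    limit_mm_per_prompt={"image": 4, "video": 2},
    # Cap the context length so the tokens reserved for multimodal inputs
    # fit inside it (the log shows 131072 mm tokens vs. a 128000 context).
    max_model_len=32768,
    # Qwen2/2.5-VL's processor accepts pixel limits that bound how many
    # vision tokens each image or video frame expands into.
    mm_processor_kwargs={"max_pixels": 1280 * 28 * 28},
)
```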
