
How to reduce the max token count when using vLLM with the parameter "limit_mm_per_prompt" > 1 when initializing LLM() #734

Open
luosting opened this issue Feb 7, 2025 · 0 comments

Comments


luosting commented Feb 7, 2025

I see that when I use the default settings to start a vLLM model, the console shows:

Computed max_num_seqs (min(256, 128000 // 131072)) to be less than 1. Setting it to the minimum value of 1.

I think some setting has enlarged the token budget reserved for input videos or images compared to Qwen2-VL. How can I reduce it so that I can speed up the deployment of my vLLM model?
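For reference, a minimal sketch of the kind of initialization the question is about. The model name and all numeric values below are placeholders, not the issue author's actual setup, and support for these parameters may vary by vLLM version.

```python
from vllm import LLM

# Hypothetical configuration for illustration only.
llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # placeholder checkpoint
    # Allow more than one image/video per prompt, as in the issue title.
    limit_mm_per_prompt={"image": 4, "video": 2},
    # Cap the context length so the tokens reserved for multimodal inputs
    # fit inside it (the log shows 131072 mm tokens vs. a 128000 context).
    max_model_len=32768,
    # Qwen2/2.5-VL's processor accepts pixel limits that bound how many
    # vision tokens each image or video frame expands into.
    mm_processor_kwargs={"max_pixels": 1280 * 28 * 28},
)
```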
