When I run vLLM based on the code example in the readme file on an 8-card A100, the following warning occurs: `(VllmWorkerProcess pid=427033) WARNING 02-08 11:44:42 profiling.py:187] The context length (128000) of the model is too short to hold the multi-modal embeddings in the worst case (131072 tokens in total, out of which {'image': 16384, 'video': 114688} are reserved for multi-modal embeddings). This may cause certain multi-modal inputs to fail during inference, even when the input text is short. To avoid this, you should increase max_model_len, reduce max_num_seqs, and/or reduce mm_counts.` However, I couldn't find the configurations for `max_model_len`, `max_num_seqs`, and `mm_counts` in the `config.json` file. How should I adjust these settings to avoid this warning? Thank you very much!
Hi, thanks for your interest in the Qwen model! This warning appears during vLLM's `profile_run`. In the original code, we added +1 to the video's `num_frames` in the `dummy_data` step to avoid an odd number of frames, which resulted in generating more tokens than the context length allows. This has been fixed in the latest vLLM code; check out the details in this PR. The warning won't affect your actual inference, so no worries there. If it bothers you, you can update to the fixed version.
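For completeness: those knobs are vLLM engine arguments rather than entries in the model's `config.json`, so you pass them when constructing the engine (`mm_counts` is controlled via `limit_mm_per_prompt`). A minimal sketch, assuming a Qwen2.5-VL checkpoint and illustrative limit values:

```python
from vllm import LLM

# Example only: model name, tensor_parallel_size, and the limit values below
# are illustrative assumptions; adjust them to your checkpoint and workload.
llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    tensor_parallel_size=8,            # 8-card A100 setup from the report above
    max_model_len=128000,              # context length the engine reserves
    max_num_seqs=4,                    # fewer concurrent sequences shrinks the worst case
    limit_mm_per_prompt={"image": 4, "video": 1},  # caps the per-prompt mm_counts used in profiling
)
```

Lowering `limit_mm_per_prompt` (especially for `video`) reduces the worst-case multi-modal token budget that the profiling step checks against the context length.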