[Bug]: AssertionError when deploying the API server for Qwen2-VL-72B in Docker #9236

Closed · 1 task done
FBR65 opened this issue Oct 10, 2024 · 5 comments
Labels: bug (Something isn't working)

Comments


FBR65 commented Oct 10, 2024

Your current environment

I'm using the latest vLLM Docker image (0.6.2).

Model Input Dumps

No response

🐛 Describe the bug

INFO 10-10 00:56:44 api_server.py:164] Multiprocessing frontend to use ipc:///tmp/6f288ab9-add1-4cfb-a217-af1687e882b5 for IPC Path.
qwen72-1 | INFO 10-10 00:56:44 api_server.py:177] Started engine process with PID 36
qwen72-1 | Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_section'}
qwen72-1 | Traceback (most recent call last):
qwen72-1 |   File "<frozen runpy>", line 198, in _run_module_as_main
qwen72-1 |   File "<frozen runpy>", line 88, in _run_code
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 571, in <module>
qwen72-1 |     uvloop.run(run_server(args))
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
qwen72-1 |     return __asyncio.run(
qwen72-1 |            ^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
qwen72-1 |     return runner.run(main)
qwen72-1 |            ^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
qwen72-1 |     return self._loop.run_until_complete(task)
qwen72-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
qwen72-1 |     return await main
qwen72-1 |            ^^^^^^^^^^
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 538, in run_server
qwen72-1 |     async with build_async_engine_client(args) as engine_client:
qwen72-1 |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
qwen72-1 |     return await anext(self.gen)
qwen72-1 |            ^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 105, in build_async_engine_client
qwen72-1 |     async with build_async_engine_client_from_engine_args(
qwen72-1 |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
qwen72-1 |     return await anext(self.gen)
qwen72-1 |            ^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 182, in build_async_engine_client_from_engine_args
qwen72-1 |     engine_config = engine_args.create_engine_config()
qwen72-1 |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 874, in create_engine_config
qwen72-1 |     model_config = self.create_model_config()
qwen72-1 |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 811, in create_model_config
qwen72-1 |     return ModelConfig(
qwen72-1 |            ^^^^^^^^^^^^
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 207, in __init__
qwen72-1 |     self.max_model_len = _get_and_verify_max_len(
qwen72-1 |                          ^^^^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 1746, in _get_and_verify_max_len
qwen72-1 |     assert "factor" in rope_scaling
qwen72-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^
qwen72-1 | AssertionError
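
For context, the assertion fires while vLLM validates the rope_scaling entry loaded from the model's config. A minimal diagnostic sketch, assuming Transformers v4.45.x is installed and the Hugging Face model ID Qwen/Qwen2-VL-72B-Instruct is reachable (the exact keys printed depend on the Transformers version):

# Hypothetical check: print the rope_scaling dict as Transformers parses it.
# With the broken v4.45 normalization it ends up reporting rope_type='default'
# with no 'factor' key, which is what trips the vLLM assertion above.
python3 -c "from transformers import AutoConfig; \
print(AutoConfig.from_pretrained('Qwen/Qwen2-VL-72B-Instruct').rope_scaling)"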

Before submitting a new issue...

  • Make sure you have already searched for relevant issues and asked the chatbot at the bottom right corner of the documentation page, which can answer many frequently asked questions.
FBR65 added the bug label on Oct 10, 2024
DarkLight1337 (Member) commented

Transformers v4.45 has a bug where Qwen2-VL config cannot be loaded correctly. Please either downgrade to vLLM 0.6.1 to use Transformers v4.44, or install vLLM from source to use a patched version of Qwen2-VL config.
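
For reference, the two suggested workarounds could look roughly like this (a sketch, not an official recipe; the image tag and the source-install command are assumptions based on the standard vLLM distribution channels):

# Option 1 (assumption: the v0.6.1 image still ships Transformers v4.44)
docker pull vllm/vllm-openai:v0.6.1

# Option 2: install vLLM from source to pick up the patched Qwen2-VL config handling
# (this builds the CUDA kernels, so it can take a while)
pip install git+https://github.com/vllm-project/vllm.git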


FBR65 commented Oct 10, 2024

Ah, thanks. I'll downgrade the image and test.


FBR65 commented Oct 10, 2024

Hi,

I downgraded to v0.6.1.post2, then tried v0.6.1, and this brings up another error:

ValueError: The checkpoint you are trying to load has model type qwen2_vl but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

So I'll wait until it's fixed in the next version.

FBR65 closed this as completed Oct 10, 2024
fiona-lxd commented

+1

xsoloking commented

Try this Docker image, soloking/vllm-openai:v0.6.1, or build an image with the Dockerfile below; it works for me.

# Start from the official vLLM OpenAI-compatible server image
FROM docker.io/vllm/vllm-openai:v0.6.1

# Upgrade Transformers to a pinned commit that includes the Qwen2-VL config fix
RUN python3 -m pip install -U git+https://github.com/huggingface/transformers.git@21fac7abba2a37fae86106f87fcf9974fd1e3830

# Rebuild flash-attn against the upgraded environment
RUN pip install -U flash-attn --no-build-isolation
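
A possible way to build and run it (the image tag, GPU layout, and tensor-parallel size below are illustrative assumptions, not part of the original comment):

# Build the patched image from the Dockerfile above
docker build -t vllm-openai-qwen2vl:v0.6.1-patched .

# Serve Qwen2-VL-72B; the official vllm-openai images pass these arguments
# straight through to the OpenAI-compatible API server
docker run --gpus all -p 8000:8000 vllm-openai-qwen2vl:v0.6.1-patched \
    --model Qwen/Qwen2-VL-72B-Instruct --tensor-parallel-size 4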
