[Bug]: AssertionError when deploying the API server for Qwen2-VL-72B in Docker #9236

Closed · 1 task done
FBR65 opened this issue Oct 10, 2024 · 5 comments
Labels: bug (Something isn't working)

Comments


FBR65 commented Oct 10, 2024

Your current environment

I'm using the latest vLLM Docker image (0.6.2).

Model Input Dumps

No response

🐛 Describe the bug

INFO 10-10 00:56:44 api_server.py:164] Multiprocessing frontend to use ipc:///tmp/6f288ab9-add1-4cfb-a217-af1687e882b5 for IPC Path.
qwen72-1 | INFO 10-10 00:56:44 api_server.py:177] Started engine process with PID 36
qwen72-1 | Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_section'}
qwen72-1 | Traceback (most recent call last):
qwen72-1 |   File "<frozen runpy>", line 198, in _run_module_as_main
qwen72-1 |   File "<frozen runpy>", line 88, in _run_code
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 571, in <module>
qwen72-1 |     uvloop.run(run_server(args))
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
qwen72-1 |     return __asyncio.run(
qwen72-1 |            ^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
qwen72-1 |     return runner.run(main)
qwen72-1 |            ^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
qwen72-1 |     return self._loop.run_until_complete(task)
qwen72-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
qwen72-1 |     return await main
qwen72-1 |            ^^^^^^^^^^
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 538, in run_server
qwen72-1 |     async with build_async_engine_client(args) as engine_client:
qwen72-1 |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
qwen72-1 |     return await anext(self.gen)
qwen72-1 |            ^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 105, in build_async_engine_client
qwen72-1 |     async with build_async_engine_client_from_engine_args(
qwen72-1 |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
qwen72-1 |     return await anext(self.gen)
qwen72-1 |            ^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 182, in build_async_engine_client_from_engine_args
qwen72-1 |     engine_config = engine_args.create_engine_config()
qwen72-1 |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 874, in create_engine_config
qwen72-1 |     model_config = self.create_model_config()
qwen72-1 |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 811, in create_model_config
qwen72-1 |     return ModelConfig(
qwen72-1 |            ^^^^^^^^^^^^
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 207, in __init__
qwen72-1 |     self.max_model_len = _get_and_verify_max_len(
qwen72-1 |                          ^^^^^^^^^^^^^^^^^^^^^^^^
qwen72-1 |   File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 1746, in _get_and_verify_max_len
qwen72-1 |     assert "factor" in rope_scaling
qwen72-1 |            ^^^^^^^^^^^^^^^^^^^^^^^^
qwen72-1 | AssertionError
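
For context, the assertion fires while vLLM validates the rope_scaling entry loaded from the model's config. A minimal diagnostic sketch, assuming Transformers v4.45.x is installed and the Hugging Face model ID Qwen/Qwen2-VL-72B-Instruct is reachable (the exact keys printed depend on the Transformers version):

# Hypothetical check: print the rope_scaling dict as Transformers parses it.
# With the broken v4.45 normalization it ends up reporting rope_type='default'
# with no 'factor' key, which is what trips the vLLM assertion above.
python3 -c "from transformers import AutoConfig; \
print(AutoConfig.from_pretrained('Qwen/Qwen2-VL-72B-Instruct').rope_scaling)"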

Before submitting a new issue...

  • Make sure you have already searched for relevant issues and asked the chatbot at the bottom right corner of the documentation page, which can answer many frequently asked questions.
FBR65 added the bug label on Oct 10, 2024
DarkLight1337 (Member) commented

Transformers v4.45 has a bug where Qwen2-VL config cannot be loaded correctly. Please either downgrade to vLLM 0.6.1 to use Transformers v4.44, or install vLLM from source to use a patched version of Qwen2-VL config.
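
For reference, the two suggested workarounds could look roughly like this (a sketch, not an official recipe; the image tag and the source-install command are assumptions based on the standard vLLM distribution channels):

# Option 1 (assumption: the v0.6.1 image still ships Transformers v4.44)
docker pull vllm/vllm-openai:v0.6.1

# Option 2: install vLLM from source to pick up the patched Qwen2-VL config handling
# (this builds the CUDA kernels, so it can take a while)
pip install git+https://github.com/vllm-project/vllm.git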


FBR65 commented Oct 10, 2024

Ah, thanks. I'll downgrade the image and test.


FBR65 commented Oct 10, 2024

Hi,

I downgraded to v0.6.1.post2, then tried v0.6.1, and this brings up another error:

ValueError: The checkpoint you are trying to load has model type qwen2_vl but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

So I'll wait until it's fixed in the next version.

FBR65 closed this as completed Oct 10, 2024
fiona-lxd commented

+1

xsoloking commented

Try this Docker image, soloking/vllm-openai:v0.6.1, or build an image with the Dockerfile below; it works for me.

# Start from the official vLLM OpenAI-compatible server image
FROM docker.io/vllm/vllm-openai:v0.6.1

# Upgrade Transformers to a pinned commit that includes the Qwen2-VL config fix
RUN python3 -m pip install -U git+https://github.com/huggingface/transformers.git@21fac7abba2a37fae86106f87fcf9974fd1e3830

# Rebuild flash-attn against the upgraded environment
RUN pip install -U flash-attn --no-build-isolation
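
A possible way to build and run it (the image tag, GPU layout, and tensor-parallel size below are illustrative assumptions, not part of the original comment):

# Build the patched image from the Dockerfile above
docker build -t vllm-openai-qwen2vl:v0.6.1-patched .

# Serve Qwen2-VL-72B; the official vllm-openai images pass these arguments
# straight through to the OpenAI-compatible API server
docker run --gpus all -p 8000:8000 vllm-openai-qwen2vl:v0.6.1-patched \
    --model Qwen/Qwen2-VL-72B-Instruct --tensor-parallel-size 4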
