[Bug]: Server gets stuck during startup, last logline is 'Using XFormers backend' #4974
Comments
Oh I think I got lucky with knob twiddling. Switching to:
FROM python:3.11-slim-bookworm
RUN apt-get update && apt-get install --yes python3 python3-distutils clang wget vim
RUN wget https://bootstrap.pypa.io/get-pip.py
RUN python3 get-pip.py
RUN python3 -m pip install clang~=10.0.1 # must match version of `clang` installed above.
RUN python3 -m pip install --ignore-installed "vllm==0.4.1" \
"hf-transfer==0.1.6" \
"huggingface_hub==0.22.2" \
"fastapi" \
"httpx"
COPY <<EOF repro.py
import os
EOF
ENV HF_HUB_ENABLE_HF_TRANSFER=1
ENV VLLM_TRACE_FUNCTION=0
ENV VLLM_WORKER_MULTIPROC_METHOD=fork
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server", "--model", "meta-llama/Meta-Llama-3-8B-Instruct", "--tensor-parallel-size", "2"]
I may also be the disabling
I turned it on only to debug the stuck issue. But it then became a confounder: it slowed down startup so much that it was harder to distinguish between stuck and merely slow.
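One rough way to tell "stuck" from "merely slow" is to check whether the trace file is still growing. The sketch below is only an illustration under that assumption: the helper name `is_still_growing` and its parameters are hypothetical and not part of vLLM, and a file that stops growing does not by itself prove a hang.

```python
import os
import time


def is_still_growing(path: str, interval_s: float = 5.0, samples: int = 3) -> bool:
    """Return True if the file at `path` grew between any two consecutive samples.

    Hypothetical helper: samples the file size `samples` times, sleeping
    `interval_s` seconds before each sample, and reports whether any growth
    was observed.
    """
    last_size = os.path.getsize(path)
    for _ in range(samples):
        time.sleep(interval_s)
        size = os.path.getsize(path)
        if size > last_size:
            return True  # the tracer is still writing, so startup is progressing
        last_size = size
    return False  # no new trace output during the whole observation window
```

With function tracing enabled, a long observation window (minutes rather than seconds) is probably needed before reading much into a `False` result, since a single slow call can go a long time without emitting output.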
We have added documentation for this situation in #5430. Please take a look.
Your current environment
🐛 Describe the bug
vLLM is getting stuck on startup, and according to `nvidia-smi` it's before it writes anything to the GPU. I have uploaded the trace file, which records up to around `2024-05-22 09:11:22`. At that point the trace shows it looking stuck in `sympy` code. I tailed the file 10 minutes later and it appeared stuck in `torch/_dynamo/allowed_functions.py:322`.
Logs from `docker`:
Reproduction:
I'm hoping just for guidance on what could be going wrong here. I'm not familiar with the code and don't have a clue what could cause the startup to get stuck.
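A lighter-weight way to see where a Python process is sitting, without per-call function tracing, may be the standard library's `faulthandler` module, which can dump every thread's stack on a timer. The following is a minimal sketch, not something taken from this thread or from vLLM: the wrapper names `enable_periodic_stack_dump` and `disable_periodic_stack_dump` are hypothetical, and it assumes you can run a few lines of Python in the process before the slow startup work begins.

```python
import faulthandler
import sys


def enable_periodic_stack_dump(every_s: float = 60.0, file=None) -> None:
    """Dump the stacks of all threads to `file` every `every_s` seconds.

    Hypothetical wrapper around the stdlib call. `file` must be a real file
    object with a file descriptor; it defaults to sys.stderr and has to stay
    open until the dumps are cancelled.
    """
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(every_s, repeat=True, file=target)


def disable_periodic_stack_dump() -> None:
    """Cancel the periodic dumps started by enable_periodic_stack_dump()."""
    faulthandler.cancel_dump_traceback_later()
```

If consecutive dumps show the same frames, the process is likely stuck there; if the frames keep changing, it is more likely just slow. The dumps only cover the process that enabled them, so worker processes would need the same call.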