AssertionError: data parallel group is already initialized #549

David-Lee-1990 · 2023-07-23T08:53:54Z

Hi, when doing inference on a single-gpu, i encountered this assertion error.

It happens when runing at vllm/model_executor/parallel_utils/parallel_state.py. I do not know why vllm need to init_distributed_environment while i only use single-gpu.

David-Lee-1990 · 2023-07-23T14:09:38Z

Solved !

I initialized two LLM models for different service.

SparshRastogi · 2023-08-15T10:11:23Z

Not working that way in google colab? Which platform you used for coding?

mzeidhassan · 2023-08-18T06:27:31Z

I am having the same issue. I work in a virtual environment (venv) in WSL2. How can I resolve this issue?

YuamLu · 2023-08-21T15:25:11Z

@mzeidhassan
I've just got the same problem. Go to source code and commenting out all line which has "is already initialized".
It finally works !😊

NandaKishoreJoshi · 2023-08-26T07:30:29Z

@YuamLu , I'm also getting the same error while using vLLm on langchain. I'm running on Azure Ubuntu Virtual Machine

from langchain.llms import VLLM
llm = VLLM(model="meta-llama/Llama-2-70b-chat-hf",
trust_remote_code=True, # mandatory for hf models
max_new_tokens=128,
top_k=10,
top_p=0.95,
temperature=0.8,
use_auth_token=True
)

Error :

ValidationError: 1 validation error for VLLM
root
data parallel group is already initialized (type=assertion_error)

Please let me know how you solved in more detail

YuamLu · 2023-08-26T07:37:51Z

@NandaKishoreJoshi
Hi,

I've updated a pull #817 , you can try on my code.

If you still have error, paste the full error message to here, i'll try my best to solve it.

NandaKishoreJoshi · 2023-08-26T08:26:35Z

@YuamLu ,
I tried your pull code and its working. Thank you

saattrupdan · 2024-01-10T12:42:11Z

I found a fix in my own use case, which does not involve changing the source code:

from vllm import LLM
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

# Initialise a vLLM model for the first time
model = LLM(model="test-model-name", trust_remote_code=True)

# This vLLM function resets the global variables, which enables initialising models
destroy_model_parallel()

# Re-initialise a new vLLM model
model = LLM(model="test-model-name", trust_remote_code=True)

Hope that helps others 🙂

Anindyadeep · 2024-01-27T04:20:43Z

I found a fix in my own use case, which does not involve changing the source code:

from vllm import LLM
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

# Initialise a vLLM model for the first time
model = LLM(model="test-model-name", trust_remote_code=True)

# This vLLM function resets the global variables, which enables initialising models
destroy_model_parallel()

# Re-initialise a new vLLM model
model = LLM(model="test-model-name", trust_remote_code=True)

Hope that helps others 🙂

Thanks for the quick solution. Really helpful, I additionally got a CUDA OOM Error (classic). So I thought to add an extended solution. Hope it helps too.

# Add the same code shown by @saattrupdan 

import torch
from vllm import LLM
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

# Initialise a vLLM model for the first time
model = LLM(model="test-model-name", trust_remote_code=True)

# This vLLM function resets the global variables, which enables initializing models
destroy_model_parallel()

# If you face CUDA OOM Error, then delete all the left over queued operations

del model
torch.cuda.synchronize() 

# Now, Re-initialise a new vLLM model
model = LLM(model="test-model-name", trust_remote_code=True)

merge Upstream 07-17

Fixes issue with multi LoRA during `profile_run`.

ranjancse26 mentioned this issue Sep 27, 2023

[Bug]: vllm runtime issues BerriAI/litellm#431

Closed

vkehfdl1 mentioned this issue Feb 8, 2024

Add __del__ at Vllm for deleting model run-llama/llama_index#10524

Closed

8 tasks

hmellor closed this as completed Apr 4, 2024

rickyyx pushed a commit to rickyyx/vllm that referenced this issue Oct 7, 2024

Merge pull request vllm-project#549 from anyscale/upstream-07-17

f1d9e75

merge Upstream 07-17

pi314ever pushed a commit to pi314ever/vllm that referenced this issue Dec 3, 2024

Fix profile run for multi LoRA (vllm-project#549)

0f513bd

Fixes issue with multi LoRA during `profile_run`.

alexhegit mentioned this issue Jan 23, 2025

Run LLMs MTP test in list always failed at the second LLM alexhegit/vLLM_ModelCoverageTest#2

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AssertionError: data parallel group is already initialized #549

AssertionError: data parallel group is already initialized #549

David-Lee-1990 commented Jul 23, 2023 •

edited

Loading

David-Lee-1990 commented Jul 23, 2023

SparshRastogi commented Aug 15, 2023 •

edited

Loading

mzeidhassan commented Aug 18, 2023

YuamLu commented Aug 21, 2023

NandaKishoreJoshi commented Aug 26, 2023

YuamLu commented Aug 26, 2023

NandaKishoreJoshi commented Aug 26, 2023

saattrupdan commented Jan 10, 2024

Anindyadeep commented Jan 27, 2024

AssertionError: data parallel group is already initialized #549

AssertionError: data parallel group is already initialized #549

Comments

David-Lee-1990 commented Jul 23, 2023 • edited Loading

David-Lee-1990 commented Jul 23, 2023

SparshRastogi commented Aug 15, 2023 • edited Loading

mzeidhassan commented Aug 18, 2023

YuamLu commented Aug 21, 2023

NandaKishoreJoshi commented Aug 26, 2023

YuamLu commented Aug 26, 2023

NandaKishoreJoshi commented Aug 26, 2023

saattrupdan commented Jan 10, 2024

Anindyadeep commented Jan 27, 2024

David-Lee-1990 commented Jul 23, 2023 •

edited

Loading

SparshRastogi commented Aug 15, 2023 •

edited

Loading