Deadlock detected #339
Comments
The deadlock is reported when we detect that we are not making any progress on any of the generation tasks. This can happen for a few reasons, including lots of concurrent generation requests, very long sequences, or limited GPU memory. Our current solution for this will hurt performance if you are seeing it often. How many requests are you sending to the server at one time? Also, I believe @tohtana is working on an improved solution to this problem.
I am sending a few hundred requests within one batch.
If these requests are generating lots of tokens, then sending this many at once will definitely cause the deadlock situation. If you can send the requests in smaller batches, that would avoid the problem. However, I will let @tohtana comment on any upcoming changes that will allow users to send large batches of requests at once!
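For illustration, here is a minimal sketch of that batching workaround. It assumes the persistent-deployment client API (`mii.client()` / `client.generate()`); the deployment name, batch size, and `max_new_tokens` values below are placeholders, not settings recommended by the maintainers.

```python
# Sketch: send a few hundred prompts in smaller chunks rather than all at once.
# Assumes a running MII deployment; the deployment name, batch size, and
# generation settings below are placeholders.
import mii

client = mii.client("llama2-7b-deployment")  # hypothetical deployment name

def generate_in_batches(prompts, batch_size=32, max_new_tokens=256):
    """Split the prompt list into chunks so the server never sees
    hundreds of concurrent generation requests at the same time."""
    outputs = []
    for start in range(0, len(prompts), batch_size):
        chunk = prompts[start:start + batch_size]
        outputs.extend(client.generate(chunk, max_new_tokens=max_new_tokens))
    return outputs
```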
Hi @flexwang, We understand that tuning the number of requests isn't always straightforward, and we're considering either automating this adjustment or at least making it easier in future versions.
vLLM implements swapping (Section 4.5 of the vLLM paper) as an alternative to recomputation when no space can be allocated for the KV cache of new tokens. Would MII implement KV-cache swapping?
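For readers unfamiliar with the two preemption strategies being compared, here is a toy sketch. It is not vLLM or MII code; the dict-based block pools are simplified stand-ins for a real block-based KV-cache manager.

```python
# Toy illustration of swap vs. recompute preemption for a block-based
# KV cache. Not vLLM or MII code; all structures here are simplified.
from collections import deque

gpu_blocks = {}    # seq_id -> KV-cache blocks resident on GPU
cpu_blocks = {}    # seq_id -> KV-cache blocks swapped out to host memory
waiting = deque()  # preempted sequences awaiting rescheduling

def preempt(seq_id, policy="swap"):
    blocks = gpu_blocks.pop(seq_id, [])
    if policy == "swap":
        # Swapping: keep the computed KV cache by moving it to CPU memory,
        # then copy it back to the GPU when the sequence is resumed.
        cpu_blocks[seq_id] = blocks
    else:
        # Recomputation: discard the cache and rebuild it from the prompt
        # plus previously generated tokens when the sequence runs again.
        pass
    waiting.append(seq_id)
```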
@canamika27 I think #403 resolved the issue. Can you try the latest version?
@tohtana -- Thanks!! The deadlock issue is solved with the latest DeepSpeed version, but now I get a new error: "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained." One observation from my end: I am currently using 2 x A100 80GB GPUs, and my prompts are approx. 1000-2000 tokens. When I reduce the prompt length to about 200 tokens it works with batch size 1, but not with larger batches, so it seems we cannot run long prompts. This issue happens only when I use 2 GPUs; with 1 GPU I am able to run with large batches and long prompts.
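As a point of reference for the two-GPU setup described above, a minimal sketch of a tensor-parallel deployment follows. It assumes `mii.serve()` accepts a `tensor_parallel` argument as shown in the MII README; the model path, deployment name, and prompt are placeholders.

```python
# Sketch of a two-GPU (tensor-parallel) persistent deployment with
# DeepSpeed-MII. The model path, deployment name, and generation settings
# are placeholders standing in for the setup described above.
import mii

client = mii.serve(
    "meta-llama/Llama-2-7b-hf",           # placeholder model path
    deployment_name="llama2-7b-deployment",
    tensor_parallel=2,                     # shard the model across both A100s
)

# Long prompts (1000-2000 tokens) reportedly fail in this configuration.
response = client.generate(["<a ~1000-token prompt>"], max_new_tokens=128)
print(response)
```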
I got the same error.
Same error.
I have the same problem.
Any update on this? I am also getting the same error.
I constantly see this issue when running the setup below on an A100 (40GiB) with llama2-7b.