
GPU consumption #550

Closed
David-Lee-1990 opened this issue Jul 23, 2023 · 8 comments

Comments

@David-Lee-1990

When I load 13B LLaMA in HF, GPU usage is about 26 GB.

However, when I load 13B LLaMA in vLLM, GPU usage is about 73 GB.

[screenshot of nvidia-smi output]

Is this usual?

@trannhatquy

I load OPT-125M in the vLLM API and it takes 21 GB on an RTX 6000, which is strange.

@rahuldshetty

> I load OPT-125M in the vLLM API and it takes 21 GB on an RTX 6000, which is strange.

Observing the same issue with an A100 20Gi profile. Using OPT-125M eats up the complete GPU memory.
Seems like a case of a memory leak?

@irasin
Contributor

irasin commented Jul 24, 2023

vLLM will allocate 90% of GPU memory for model inference and KV-cache blocks. So in the 80 GB A100 case, it will use at least 0.9 * 81920 = 73728 MiB.
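
For reference, a back-of-the-envelope check of that figure (a quick sketch assuming an 80 GB A100, i.e. 81920 MiB, and the default gpu_memory_utilization of 0.9):

```python
# Rough arithmetic behind the number above.
# Assumes an 80 GB A100 (81920 MiB total) and vLLM's default gpu_memory_utilization=0.9.
total_mib = 81920
gpu_memory_utilization = 0.9

preallocated_mib = gpu_memory_utilization * total_mib
print(f"{preallocated_mib:.0f} MiB")  # -> 73728 MiB, roughly the ~73 GB reported above
```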

@rahuldshetty

> vLLM will allocate 90% of GPU memory for model inference and KV-cache blocks. So in the 80 GB A100 case, it will use at least 0.9 * 81920 = 73728 MiB.

Any way to limit the GPU memory usage? (if we can trade off between throughput and memory)

@irasin
Contributor

irasin commented Jul 24, 2023

> vLLM will allocate 90% of GPU memory for model inference and KV-cache blocks. So in the 80 GB A100 case, it will use at least 0.9 * 81920 = 73728 MiB.
>
> Any way to limit the GPU memory usage? (if we can trade off between throughput and memory)

You can restrict the GPU usage with gpu_memory_utilization: https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py#L27
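
For example, a minimal sketch using the offline LLM entrypoint (the model name and the 0.5 value are just for illustration):

```python
from vllm import LLM, SamplingParams

# Ask vLLM to pre-allocate only ~50% of the GPU instead of the default 90%.
# A smaller budget means fewer KV-cache blocks, so peak throughput drops.
llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.5)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

The same knob should also be exposed as a --gpu-memory-utilization flag when launching the API server, since it comes from EngineArgs.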

@David-Lee-1990
Author

> vLLM will allocate 90% of GPU memory for model inference and KV-cache blocks. So in the 80 GB A100 case, it will use at least 0.9 * 81920 = 73728 MiB.
>
> Any way to limit the GPU memory usage? (if we can trade off between throughput and memory)
>
> You can restrict the GPU usage with gpu_memory_utilization: https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py#L27

Thanks so much. But what if the needed GPU memory exceeds the predefined gpu_memory_utilization? Will it raise an OOM error or automatically use more GPU memory?

@irasin
Contributor

irasin commented Jul 25, 2023

Since vLLM supports continuous batching, it will automatically schedule the requests to run in each iteration, so OOM will not happen.
However, if you do something like parallel sampling, you may see OOM happen if there are not enough cpu_cache blocks.
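
For context, "parallel sampling" here means requesting several completions per prompt via SamplingParams(n=...); a rough sketch (the model name is just for illustration):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")

# n=4 requests four completions per prompt; each sequence needs its own
# KV-cache blocks, so large n values can exhaust the cache (and swap space).
params = SamplingParams(n=4, temperature=0.8, max_tokens=64)
outputs = llm.generate(["The capital of France is"], params)

for completion in outputs[0].outputs:
    print(completion.text)
```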

@zhuohan123
Member

Please refer to #241 for memory usage! We will add this to our documentation.
