Reduce default value of VLLM_GRAPH_RESERVED_MEM to 0.1 #292

Merged: 1 commit from private/kzawora/less_graph_mem into habana_main on Sep 17, 2024

Conversation

@kzawora-intel commented Sep 17, 2024

After #252, HPUGraph capture takes much less memory, so we can reduce the memory reserved for HPUGraphs. On Llama3.1-8b-Instruct (G2), capturing 100% of prefill and decode graphs at BS=256 now takes 1.566 GB of HBM, which is far less than the 40% (~30 GB) we reserve by default. That leaves a lot of unused (i.e. wasted) memory that could instead be used for more KV cache blocks.
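For context, VLLM_GRAPH_RESERVED_MEM is a fraction of the HBM left after weight loading that is set aside for graph capture; whatever is not reserved goes to the KV cache. The sketch below only illustrates that arithmetic and is not the actual vLLM code; the free-HBM figure, per-block size, and helper name are assumptions, while the 0.4/0.1 fractions come from this PR.

```python
# Minimal sketch (not the vLLM implementation) of how a graph-memory fraction
# such as VLLM_GRAPH_RESERVED_MEM splits free HBM between HPUGraph capture and
# the KV cache. Numbers below are illustrative assumptions.

def split_free_memory(free_hbm_gb: float, graph_reserved_frac: float,
                      kv_block_size_gb: float):
    """Return (graph_mem_gb, kv_cache_gb, num_kv_blocks)."""
    graph_mem_gb = free_hbm_gb * graph_reserved_frac   # reserved for graph capture
    kv_cache_gb = free_hbm_gb - graph_mem_gb           # remainder goes to KV cache
    num_kv_blocks = int(kv_cache_gb // kv_block_size_gb)
    return graph_mem_gb, kv_cache_gb, num_kv_blocks

free_hbm_gb = 75.0        # assumed free HBM after loading Llama-3.1-8B weights
kv_block_size_gb = 0.016  # assumed per-block KV cache footprint

for frac in (0.4, 0.1):   # old default vs. new default from this PR
    graph_mem, kv_mem, blocks = split_free_memory(free_hbm_gb, frac, kv_block_size_gb)
    print(f"VLLM_GRAPH_RESERVED_MEM={frac}: "
          f"{graph_mem:.1f} GB for graphs, {kv_mem:.1f} GB -> {blocks} KV blocks")
```

With the assumed numbers, dropping the fraction from 0.4 to 0.1 frees roughly 22 GB for additional KV cache blocks, while the ~1.6 GB actually needed for graph capture still fits comfortably in the reserved slice.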

@michalkuligowski merged commit 47a89be into habana_main on Sep 17, 2024
14 checks passed
@kzawora-intel added the habana label (Issues or PRs submitted by Habana Labs) on Sep 20, 2024
@kzawora-intel deleted the private/kzawora/less_graph_mem branch on October 7, 2024 at 12:52