Support Dynamic SplitFuse #1569

dongxiaolong · 2023-11-06T02:09:27Z

https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen#background
In their experiments, DeepSpeed-FastGen outperforms vLLM in both throughput and latency: it delivers either equivalent latency at higher throughput, or lower latency at the same throughput.
I think the main reason is the Dynamic SplitFuse method. Can vLLM support it?

WoosukKwon · 2023-11-07T01:40:39Z

Hi @dongxiaolong, thanks for the proposal. However, it seems this issue is a duplicate of #1562. I will close it to reduce redundancy.

WoosukKwon closed this as completed Nov 7, 2023
