
[FEATURE] Implement Dynamic SplitFuse #1562

Closed
casper-hansen opened this issue Nov 4, 2023 · 7 comments
Labels
feature request performance Performance-related issues

Comments

@casper-hansen (Contributor)

Dear vLLM maintainers @WoosukKwon and @zhuohan123 (@Yard1),

DeepSpeed has released its serving framework, which it claims is faster than vLLM. The main speedup comes from Dynamic SplitFuse, a technique that does the following:

  • Long prompts are decomposed into much smaller chunks and scheduled across multiple forward passes (iterations) with only the final pass performing any generation.

  • Short prompts will be composed to exactly fill a target token budget. Even short prompts may be decomposed to ensure the budget is precisely met and the forward sizes are well-aligned.
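To make the two bullets above concrete, here is a minimal, hypothetical sketch of the scheduling idea (not vLLM's or DeepSpeed's actual implementation): every forward pass is filled to a fixed token budget, long prompts are split into chunks that span multiple passes, and short prompts are packed together to fill the remainder. The function name and data shapes are illustrative assumptions.

```python
# Hypothetical sketch of Dynamic SplitFuse-style scheduling.
# prompts: list of (prompt_id, num_tokens) pairs.
# Returns a list of iterations; each iteration is a list of
# (prompt_id, chunk_tokens) summing to at most `budget` tokens.
def schedule(prompts, budget):
    iterations = []
    current, used = [], 0
    remaining = list(prompts)
    i = 0
    while i < len(remaining):
        pid, n = remaining[i]
        # Take as much of this prompt as fits in the current pass.
        take = min(n, budget - used)
        current.append((pid, take))
        used += take
        if take < n:
            # Long prompt: the rest is carried over to the next pass.
            remaining[i] = (pid, n - take)
        else:
            i += 1
        if used == budget or i == len(remaining):
            # Budget exactly met (or no prompts left): close this pass.
            iterations.append(current)
            current, used = [], 0
    return iterations
```

For example, with a budget of 64 tokens, `schedule([("a", 100), ("b", 30), ("c", 20)], 64)` splits prompt `a` across two passes and packs `b` and `c` into the remaining space, so every pass except possibly the last is filled to exactly 64 tokens.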

Code: https://github.com/microsoft/DeepSpeed-MII
Background: https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen

Llama 13B (1x A100-80GB): [benchmark figure]

Llama 70B (4x A100-80GB with TP): [benchmark figure]

@WoosukKwon WoosukKwon added the enhancement New feature or request label Nov 7, 2023
@WoosukKwon WoosukKwon added the performance Performance-related issues label Nov 9, 2023
@irasin (Contributor) commented Nov 14, 2023

LGTM

@thesues (Contributor) commented Dec 20, 2023

Hi, is there any progress on this?

@shixianc commented Jan 7, 2024

Do we have an ETA? 😊

@tdene commented Feb 20, 2024

Hi @WoosukKwon @zhuohan123

The absence of a chunked prefill implementation in vLLM is a major blocker for us. Any kind of timeline, or regular communication on progress toward a chunked prefill implementation, would be immensely helpful for future planning.

@sh1ng (Contributor) commented Feb 29, 2024

Keeping batches with aligned lengths definitely helps: #2357

@njhill (Member) commented Feb 29, 2024

Looks like someone has started working on this: #3106

@hmellor (Collaborator) commented Jul 26, 2024

Chunked prefill is now supported.

@hmellor hmellor closed this as completed Jul 26, 2024