Support Dynamic SplitFuse #1569

Closed
dongxiaolong opened this issue Nov 6, 2023 · 1 comment

Comments

@dongxiaolong

https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen#background
In their experiments, DeepSpeed-FastGen outperforms vLLM in both throughput and latency: it provides either equivalent latency with greater throughput, or lower latency at the same throughput.
I think the main reason is the Dynamic SplitFuse method. Can vLLM support it?

@WoosukKwon
Collaborator

Hi @dongxiaolong, thanks for the proposal. However, it seems this issue is a duplicate of #1562. I will close it to reduce redundancy.
