Add Splitwise: prompt and token phase separation #2472
We have built the system described in http://aka.ms/splitwise. Splitwise splits the prompt and token phases of LLM inference so that they run on different servers, leveraging the differences between the two phases to improve throughput. We have an internal prototype built on top of an internal vLLM branch. This issue tracks the effort to open-source that prototype and make it part of official vLLM. This includes:

Comments

This was asked in #2370.

LGTM. I was wondering when we can use it in vLLM?

Hi all, just wanted to check in and see whether there is any update on Splitwise's implementation in vLLM, and whether this internal prototype codebase can be released. Thank you!
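For readers unfamiliar with the idea, the prompt/token split described in the issue can be sketched as follows. This is a minimal toy illustration, not the actual Splitwise or vLLM implementation: all class and function names (`PromptServer`, `TokenServer`, `KVCache`, `serve`) are hypothetical, and the "model" is a placeholder that just counts tokens.

```python
# Illustrative sketch of prompt/token phase separation. Assumption: a
# scheduler runs the compute-bound prefill (prompt) phase on one server
# pool, then hands the KV cache to a separate pool for the memory-bound
# decode (token) phase. Names and logic are hypothetical stand-ins.
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Stand-in for the per-request key/value cache produced by prefill."""
    tokens: list[int] = field(default_factory=list)


class PromptServer:
    """Runs only the prefill phase over the full prompt."""

    def prefill(self, prompt_ids: list[int]) -> KVCache:
        # In a real system this is one large batched forward pass over the
        # whole prompt; here we just record the prompt tokens as the "cache".
        return KVCache(tokens=list(prompt_ids))


class TokenServer:
    """Runs only the decode phase, generating one token per step."""

    def decode(self, cache: KVCache, max_new_tokens: int) -> list[int]:
        out = []
        for _ in range(max_new_tokens):
            # Toy "model": the next token is the current sequence length.
            nxt = len(cache.tokens)
            cache.tokens.append(nxt)
            out.append(nxt)
        return out


def serve(prompt_ids: list[int], max_new_tokens: int) -> list[int]:
    # The scheduler routes the request to a prompt server first, then
    # transfers the resulting KV cache to a token server for generation.
    cache = PromptServer().prefill(prompt_ids)
    return TokenServer().decode(cache, max_new_tokens)
```

Because the two phases have different compute and memory profiles, running them on separate machines lets each pool be sized and provisioned independently, which is the throughput benefit the issue describes.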