[Tracking issue] [Help wanted]: Multi-step scheduling follow-ups #7528
Comments
Thanks cody! cc @tlrmchlsmth @alexm-neuralmagic @afeldman-nm @varun-sundar-rabindranath
Additions for tracking. I will take up both of these. cc @zhuohan123
I think we can also try making it work with the new SPMD architecture, which can simplify the code and improve performance, especially for pipeline parallelism (PP).
Is multi-step scheduling not supported with LoRA at all? Does that mean any LoRA requests that come in do not use the scheduling?
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Co-authored with @SolitaryThinker @Yard1 @rkooo567
We are landing multi-step scheduling (#7000) to amortize scheduling overhead for better inter-token latency (ITL) and throughput. Since the first version of multi-step scheduling doesn't work with some existing features, this issue tracks the progress of supporting them so that multi-step scheduling can become a common and practical feature in vLLM.
Performance
Chunked Prefill
It is tricky for multi-step scheduling to work with chunked prefill. In particular, a prefill request needs only a few steps (roughly `prompt_tokens / chunk_size` steps), which could be much less than the configured number of multi-steps (e.g., 8).
As a result, we need a scheduling policy to deal with prefill requests in multi-step scheduling. Here are 2 possible policies we could consider at this moment:
Since there's no single scheduling policy that works for all scenarios, it's better to implement both approaches and let users choose between them. Also, since we may come up with better policies in the future, we need to make these policies pluggable.
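To make the step mismatch and the "pluggable" requirement concrete, here is a minimal sketch. It is not vLLM code: `prefill_steps`, the `POLICIES` registry, `register_policy`, and both policy names are hypothetical, chosen only for illustration. It assumes a policy only has to decide how many steps to run before the next scheduling pass, given whether the batch contains a prefill request.

```python
import math
from typing import Callable, Dict

def prefill_steps(prompt_tokens: int, chunk_size: int) -> int:
    """Steps a chunked prefill needs: one per chunk, rounded up."""
    return math.ceil(prompt_tokens / chunk_size)

# Hypothetical policy signature (an assumption, not a vLLM interface):
# (batch_has_prefill, configured_multi_steps) -> steps to run before rescheduling.
StepPolicy = Callable[[bool, int], int]

# Registry that lets users pick a policy by name and lets new ones be added later.
POLICIES: Dict[str, StepPolicy] = {}

def register_policy(name: str) -> Callable[[StepPolicy], StepPolicy]:
    def decorator(fn: StepPolicy) -> StepPolicy:
        POLICIES[name] = fn
        return fn
    return decorator

@register_policy("single_step_on_prefill")
def single_step_on_prefill(batch_has_prefill: bool, num_multi_steps: int) -> int:
    # Fall back to one step whenever a prefill request is in the batch.
    return 1 if batch_has_prefill else num_multi_steps

@register_policy("always_multi_step")
def always_multi_step(batch_has_prefill: bool, num_multi_steps: int) -> int:
    # Run the configured number of steps regardless of batch contents.
    return num_multi_steps

if __name__ == "__main__":
    configured = 8
    needed = prefill_steps(prompt_tokens=1000, chunk_size=512)
    print(needed)               # 2
    print(configured - needed)  # 6 steps the prefill request cannot use
    policy = POLICIES["single_step_on_prefill"]
    print(policy(True, configured))   # 1
    print(policy(False, configured))  # 8
```

With a 1000-token prompt and a chunk size of 512, the prefill finishes in 2 steps, well short of 8 configured multi-steps, which is the gap a scheduling policy has to handle. Keeping policies behind a name-to-callable registry is one simple way to let users configure the choice and to add further policies later.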
The action items are:
Misc
Functionality
Support prefix caching (should work out of the box but just need to confirm) @comaniac
(`_pythonize_sampler_output`) @afeldman-nm [Core] Logprobs support in Multi-step #7652