Faster overlap mode scheduler #1738
@merrymercy Has this been tested on larger models? I tried the deepseek-v2.5 fp8 version, but it doesn't seem to show much improvement.
@merrymercy Have you tested the overlap mode scheduler with requests arriving at a fixed request rate, rather than all requests being sent at the beginning?
@ykcombat Did you try it with the latest main branch? If the error is still there, please open a new issue with instructions to reproduce it. We will fix it very soon if we can reproduce it.
@merrymercy Thanks for your quick reply! I tried it with the latest main branch, but the error is still there. I have opened a new issue at #2312.
This PR improves the ordering of kernel launches and result fetching. The overlap scheduler now brings about a 10% throughput improvement even when the radix cache is turned off. With the radix cache turned on, we can expect a larger speedup.
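To illustrate why the order of kernel launch and result fetching matters, here is a toy timing model. It is only a sketch under simplifying assumptions, not the scheduler code from this PR: `total_time_ms`, `gpu_ms`, and `cpu_ms` are hypothetical names, kernel launch is treated as free, and every batch is assumed to cost the same.

```python
def total_time_ms(num_batches, gpu_ms, cpu_ms, overlap):
    """Toy model: wall-clock time to run `num_batches` batches.

    gpu_ms: assumed GPU time for one batch's forward pass.
    cpu_ms: assumed CPU time to process one batch's results.
    overlap: if True, launch the next batch before processing the
             previous batch's results, so CPU work hides behind GPU work.
    """
    now = 0.0
    have_pending = False  # results of the previous batch not yet processed
    for _ in range(num_batches):
        if overlap:
            # Launch this batch's kernels first (launch cost assumed zero).
            gpu_done = now + gpu_ms
            # While the GPU is busy, process the previous batch's results.
            if have_pending:
                now += cpu_ms
            # Only then fetch this batch's result (blocks until GPU is done).
            now = max(now, gpu_done)
            have_pending = True
        else:
            # Launch, wait for the GPU, then process results serially.
            now += gpu_ms
            now += cpu_ms
    if overlap and have_pending:
        now += cpu_ms  # results of the final batch still need processing
    return now


normal = total_time_ms(100, gpu_ms=10, cpu_ms=1, overlap=False)
overlapped = total_time_ms(100, gpu_ms=10, cpu_ms=1, overlap=True)
print(f"normal: {normal} ms, overlap: {overlapped} ms")
```

In this model the CPU work disappears from the critical path whenever `cpu_ms <= gpu_ms`. The numbers used here are made up and are not meant to reproduce the benchmark below.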
Benchmark results
Overlap mode: 51.03 req/s
Normal mode: 46.06 req/s
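As a quick check, the relative gain implied by these two numbers can be computed directly. The helper name `relative_gain_pct` is just for illustration.

```python
def relative_gain_pct(new_rate, base_rate):
    """Relative throughput gain of new_rate over base_rate, in percent."""
    return (new_rate / base_rate - 1.0) * 100.0


overlap_rps = 51.03  # req/s, overlap mode (from the results above)
normal_rps = 46.06   # req/s, normal mode (from the results above)
print(f"{relative_gain_pct(overlap_rps, normal_rps):.1f}% higher throughput")
```

This comes out to roughly 10.8%, consistent with the approximately 10% improvement stated in the description.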
Notes