Exploit vLLM options to return deltas/final-output only #137
@dtrifiro If necessary, I could update these changes to work both before and after v0.6.1.post2 (as you've done for other things).
@njhill Yeah, I think that'd be better. We can bump the minimum vLLM version and drop the backward-compatibility code after the next vLLM version is out.
Now that vLLM 0.6.2 is released, @dtrifiro agreed that the aforementioned backward-compatibility changes are not needed, so this PR should now be ready to merge.
@njhill It seems this broke streaming generation. See the traceback here: https://github.com/opendatahub-io/vllm-tgis-adapter/actions/runs/11062189283/job/30736218398?pr=137#step:8:2614
This can be merged after #143.
Edit: already rebased, still failing.
@dtrifiro This one is now fixed and should be working.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@           Coverage Diff           @@
##             main     #137   +/-   ##
=======================================
  Coverage   58.34%   58.35%
=======================================
  Files          27       27
  Lines        1611     1616    +5
  Branches      268      270    +2
=======================================
+ Hits          940      943    +3
+ Misses        582      581    -1
- Partials       89       92    +3
The remaining test failures appear to be unrelated to this PR, probably due to other upstream changes; we still need to investigate and address them, of course.
Yeah, it looks like those are due to an upstream bug in the CPU backend (vllm-project/vllm#9024).
Nontrivial performance benefit, particularly when running with a decoupled front-end process. These changes require vLLM >= 0.6.1.post2.
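The following is a minimal, self-contained sketch of the idea behind this PR, not the adapter's actual code. It assumes three output modes modeled on vLLM's `RequestOutputKind` (cumulative, delta, final-only), selected in vLLM via the `SamplingParams` `output_kind` option; the `OutputKind` enum, `fake_engine` and `stream_from_cumulative` below are hypothetical stand-ins so the example runs without vLLM installed. With cumulative outputs the adapter must slice each update to find the new text. With delta outputs the engine sends only the new text, and with final-only a non-streaming request gets a single update.

```python
from enum import Enum
from typing import Iterable, Iterator


class OutputKind(Enum):
    """Hypothetical local stand-in modeled on vLLM's RequestOutputKind."""
    CUMULATIVE = 0  # each update carries the full text generated so far
    DELTA = 1       # each update carries only the newly generated text
    FINAL_ONLY = 2  # one update, emitted only when generation finishes


def fake_engine(chunks: Iterable[str], kind: OutputKind) -> Iterator[str]:
    """Hypothetical engine: yields text updates in the requested form."""
    text = ""
    for chunk in chunks:
        text += chunk
        if kind is OutputKind.CUMULATIVE:
            yield text
        elif kind is OutputKind.DELTA:
            yield chunk
    if kind is OutputKind.FINAL_ONLY:
        yield text


def stream_from_cumulative(updates: Iterable[str]) -> Iterator[str]:
    """Adapter-side slicing needed when the engine reports cumulative text."""
    seen = 0
    for full_text in updates:
        yield full_text[seen:]  # drop the prefix that was already sent
        seen = len(full_text)


chunks = ["Hello", ",", " world", "!"]

old = list(stream_from_cumulative(fake_engine(chunks, OutputKind.CUMULATIVE)))
new = list(fake_engine(chunks, OutputKind.DELTA))
final = list(fake_engine(chunks, OutputKind.FINAL_ONLY))
print(old)    # ['Hello', ',', ' world', '!']
print(new)    # ['Hello', ',', ' world', '!']
print(final)  # ['Hello, world!']

# Characters produced by the engine for the same generation in each mode.
sent_cumulative = sum(len(u) for u in fake_engine(chunks, OutputKind.CUMULATIVE))
sent_delta = sum(len(u) for u in fake_engine(chunks, OutputKind.DELTA))
print(sent_cumulative, sent_delta)  # 36 13
```

Both streaming paths give the client the same pieces, but in cumulative mode the data produced grows with every step (roughly quadratically in output length). One plausible reading of the "decoupled front-end" remark, offered here as an assumption, is that every update crosses a process boundary there, so sending only deltas (or only the final output) saves serialization and transfer work.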