-
-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Frontend] merge beam search implementations #9296
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @LunrEclipse, this looks good to me. I think along with this we should change EngineClient
from a Protocol
to an ABC
. It doesn't make much sense to have a method impl in an Protocol
.
logprob_obj.logprob) | ||
|
||
if token_id == tokenizer.eos_token_id and \ | ||
not ignore_eos: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I tried to implement stop logic elsewhere I'd like to know why are we dealing with eos
like that instead of putting ignore_eos
into sampling params? Strictly speaking this is not the goal of the PR so feel free to ignore this comment. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nFunctor yes this PR is just consolidating the existing logic, let's address that in your follow-on one.
@njhill thanks for shepherding this pr! I'll be afk and will hand it over to @njhill for review. |
@njhill Thank you for the review! I've gone ahead and pushed changes based on your feedback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @LunrEclipse.
Not related to this PR specifically, but couldn't the beam search impl still be kept behind the EngineClient.generate
API? I.e. we just intercept the existing beam_search
and associated params in SamplingParams
... so that the outward function remains the same?
@njhill Yeah, it's definitely doable if we add logic to the |
cumulative_logprob=beam.cum_logprob, | ||
token_ids=beam.tokens, | ||
index=i, | ||
logprobs=beam.cum_logprob, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@njhill I hadn't got to it yet, but FYI, mypy
complains about this line. It's passing a float
where it expects a dict
, at least according to the typing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @russellb yes this looks wrong! Though not really due to this PR which just moved/consolidated the existing logic.
Probably we should keep a logprobs list in BeamSearchSequence
in addition to tokens, and set this.
I think this new external beam search impl still needs a bit more work in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I knew you had just moved the code. I just wanted to highlight it in case it was a super quick fix for you. Thanks for sharing your thoughts! I'll probably get to it at some point as I keep hacking through the type checking. It seems pretty valuable since it's found multiple bugs in my digging so far!
it is possible following the spirit of #9302 . we can just create another |
@youkaichao yes I think we should refactor things a bit.. and also move the impl into beam_search.py instead of protocol.py, etc. |
Signed-off-by: charlifu <[email protected]>
Signed-off-by: Alvant <[email protected]>
Signed-off-by: Amit Garg <[email protected]>
Signed-off-by: Sumit Dubey <[email protected]>
Signed-off-by: Maxime Fournioux <[email protected]>
Merged implementation of
AsyncEngine
andMQLLMEngine
'sbeam_search
intoEngineClient(Protocol)
Manually testing conducted to verify that requests are still ran in parallel and output is correct.
server side:
client side:
Completion
Chat