
[Frontend] merge beam search implementations #9296

Merged: 2 commits into vllm-project:main on Oct 14, 2024

Conversation

LunrEclipse (Contributor)

Merged the beam_search implementations of AsyncEngine and MQLLMEngine into EngineClient (Protocol).

Manual testing was conducted to verify that requests still run in parallel and that the output is correct.

Server side:

$ vllm serve meta-llama/Meta-Llama-3-8B

Client side:

Completion

from openai import OpenAI

# Point the OpenAI client at the local vLLM server; the API key is
# arbitrary since the server was started without one.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="key123",
)

prompt = "Capital of France is"

try:
    completion = client.completions.create(
        model="meta-llama/Meta-Llama-3-8B",
        prompt=prompt,
        max_tokens=4,
        # use_beam_search and best_of are vLLM extensions, so they are
        # passed via extra_body rather than as native OpenAI parameters.
        extra_body={'use_beam_search': True, 'best_of': 3}
    )
    print(completion.choices[0].text)
except Exception as e:
    print(e)

Chat

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="key123",
)

prompt = "Capital of France is"

try:
    completion = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3-8B",
        messages=[
            {"role": "system", "content": "You are a helpful AI assistant."},
            {"role": "user", "content": prompt}
        ],
        max_tokens=10,
        # The beam search options (and here temperature as well) are
        # passed through extra_body.
        extra_body={'use_beam_search': True, 'best_of': 3, 'temperature': 0}
    )
    print(completion)
    print(completion.choices[0].message.content)
except Exception as e:
    print(e)

@LunrEclipse LunrEclipse marked this pull request as ready for review October 11, 2024 20:30
@LunrEclipse LunrEclipse changed the title merge beam search implementations [Frontend] merge beam search implementations Oct 11, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to be added to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

njhill (Member) left a comment:


Thanks @LunrEclipse, this looks good to me. I think along with this we should change EngineClient from a Protocol to an ABC; it doesn't make much sense to have a method implementation in a Protocol.
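For context, a minimal sketch of the distinction (names simplified, not vLLM's actual definitions): a Protocol describes a structural interface for static type checking, whereas an ABC is designed to carry concrete shared methods that subclasses inherit.

from abc import ABC, abstractmethod

class EngineClientABC(ABC):
    """Sketch of the suggestion: an ABC can mix abstract methods with a
    concrete shared implementation, which a Protocol is not meant for."""

    @abstractmethod
    async def generate(self, prompt: str):
        """Each engine client (AsyncEngine, MQLLMEngine) provides this."""
        ...

    async def beam_search(self, prompt: str, beam_width: int = 3):
        # Concrete method inherited by every engine client; in the real
        # PR this would hold the merged beam search loop (elided here).
        return await self.generate(prompt)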

Review threads on vllm/engine/protocol.py (outdated; resolved)
Diff excerpt:

logprob_obj.logprob)

if token_id == tokenizer.eos_token_id and \
        not ignore_eos:
nFunctor (Contributor) commented Oct 12, 2024:


Since I've tried to implement stop logic elsewhere, I'd like to know why we're handling EOS like this instead of putting ignore_eos into the sampling params. Strictly speaking this is not the goal of the PR, so feel free to ignore this comment. Thanks.
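For illustration, a hedged sketch of the alternative being asked about (the function names here are invented; vLLM's SamplingParams does expose an ignore_eos field, but this is not the PR's code):

# Current shape of the check in the excerpt above: EOS handling is gated
# by a separate ignore_eos argument threaded through the beam search loop.
def is_finished(token_id: int, eos_token_id: int, ignore_eos: bool) -> bool:
    return token_id == eos_token_id and not ignore_eos

# Alternative: read the flag off the sampling params, so all stop logic
# lives in one place.
def is_finished_via_params(token_id: int, eos_token_id: int, params) -> bool:
    return token_id == eos_token_id and not params.ignore_eos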

Member:

@nFunctor yes, this PR is just consolidating the existing logic; let's address that in your follow-on PR.

youkaichao (Member):

@njhill thanks for shepherding this PR!
@LunrEclipse please address the review from @njhill .

I'll be afk and will hand it over to @njhill for review.

LunrEclipse (Contributor, Author):

@njhill Thank you for the review! I've gone ahead and pushed changes based on your feedback.

njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Oct 14, 2024
njhill (Member) left a comment:


Thanks @LunrEclipse.

Not related to this PR specifically, but couldn't the beam search impl still be kept behind the EngineClient.generate API? I.e. we just intercept the existing beam_search and associated params in SamplingParams ... so that the outward-facing API remains the same?
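A hedged sketch of that idea (class and helper names are hypothetical, not vLLM's API): generate() inspects the sampling params and dispatches internally, so callers never see a separate beam_search entry point.

class EngineClientSketch:
    """Sketch only: dispatch inside generate() so callers keep one API."""

    async def generate(self, prompt, sampling_params, request_id):
        # Intercept the beam search flag and route to the beam search
        # path; otherwise fall through to normal generation. The two
        # helper methods below are placeholders for the real internals.
        if getattr(sampling_params, "use_beam_search", False):
            async for output in self._run_beam_search(prompt, sampling_params, request_id):
                yield output
        else:
            async for output in self._run_generate(prompt, sampling_params, request_id):
                yield output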

LunrEclipse (Contributor, Author):

@njhill Yeah, it's definitely doable if we add logic to the EngineClient.generate methods to check whether beam_search is enabled and yield the beam search results there, rather than checking inside the engine itself.

simon-mo merged commit 4d31cd4 into vllm-project:main on Oct 14, 2024 (66 of 69 checks passed)
cumulative_logprob=beam.cum_logprob,
token_ids=beam.tokens,
index=i,
logprobs=beam.cum_logprob,
russellb (Member):

@njhill I hadn't got to it yet, but FYI, mypy complains about this line. It's passing a float where it expects a dict, at least according to the typing.

njhill (Member):


Thanks @russellb, yes, this looks wrong! Though it's not really due to this PR, which just moved/consolidated the existing logic.

Probably we should keep a logprobs list in BeamSearchSequence in addition to tokens, and set this.

I think this new external beam search impl still needs a bit more work in general.
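A rough sketch of that suggestion (field names assumed; this is not the merged code): BeamSearchSequence would carry one logprob dict per generated token alongside tokens, and the output's logprobs would then be built from that list instead of reusing cum_logprob.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class BeamSearchSequence:
    tokens: List[int]
    cum_logprob: float = 0.0
    # Suggested addition: one {token_id: logprob} mapping per generated
    # token, so the final output can report real per-token logprobs
    # rather than a float where a dict is expected.
    logprobs: List[Dict[int, float]] = field(default_factory=list)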

russellb (Member):


Yeah, I knew you had just moved the code. I just wanted to highlight it in case it was a super quick fix for you. Thanks for sharing your thoughts! I'll probably get to it at some point as I keep hacking through the type checking. It seems pretty valuable since it's found multiple bugs in my digging so far!

youkaichao (Member):

> Thanks @LunrEclipse.
>
> Not related to this PR specifically, but couldn't the beam search impl still be kept behind the EngineClient.generate API? I.e. we just intercept the existing beam_search and associated params in SamplingParams ... so that the outward-facing API remains the same?

It is possible, following the spirit of #9302; we can just create another BeamSearchSequenceGroup class. @LunrEclipse @njhill
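Purely illustrative (the actual design in #9302 may differ), a skeleton of such a grouping class:

from dataclasses import dataclass, field
from typing import List

@dataclass
class BeamSearchSequence:  # minimal stand-in, see the sketch above
    tokens: List[int]
    cum_logprob: float = 0.0

@dataclass
class BeamSearchSequenceGroup:
    """Hypothetical container for all live beams of one request, keeping
    the engine-facing request/output types uniform."""
    request_id: str
    beams: List[BeamSearchSequence] = field(default_factory=list)
    beam_width: int = 3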

njhill (Member) commented Oct 23, 2024:

@youkaichao yes, I think we should refactor things a bit, and also move the impl into beam_search.py instead of protocol.py, etc.

charlifu pushed a commit to charlifu/vllm that referenced this pull request Oct 23, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
garg-amit pushed a commit to garg-amit/vllm that referenced this pull request Oct 28, 2024
sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request Nov 14, 2024
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024
mfournioux pushed a commit to mfournioux/vllm that referenced this pull request Nov 20, 2024
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)
6 participants