[Hardware][Gaudi][Bugfix] Fix HPU tensor parallelism, enable multiprocessing executor #12167
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
LGTM. @youkaichao Can you take a second look?
LGTM, thanks for the fix!
#11256 changed the input broadcasting flow slightly, and in a tensor-parallel scenario we can no longer assume that all workers will receive `execute_model_req` from the executor. Since the broadcast from the driver worker is now performed within `LocalOrDistributedWorkerBase.prepare_input`, called by `LocalOrDistributedWorkerBase.execute_model`, non-driver workers can expect `execute_model_req` to be None within `HPUWorker.execute_model` (which is essentially a wrapper around `LocalOrDistributedWorkerBase.execute_model` with some additional HPU-specific profiling sugar). Asserting that `execute_model_req is not None` therefore breaks tensor parallelism, and this PR fixes that. Additionally, it enables usage of the multiprocessing executor on HPU, as it is fully functional there (as introduced in #11030).
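For illustration, here is a minimal sketch of the control flow the description implies. The class and attribute names (`HPUWorkerSketch`, `_base`) are hypothetical stand-ins, not the PR's actual code; the real `HPUWorker` subclasses `LocalOrDistributedWorkerBase` rather than wrapping an instance of it.

```python
from typing import Any, List, Optional


class HPUWorkerSketch:
    """Illustrative stand-in for HPUWorker; names are hypothetical."""

    def __init__(self, is_driver_worker: bool, base_worker: Any) -> None:
        self.is_driver_worker = is_driver_worker
        # Plays the role of LocalOrDistributedWorkerBase in this sketch.
        self._base = base_worker

    def execute_model(
        self,
        execute_model_req: Optional[Any] = None,
    ) -> Optional[List[Any]]:
        # Before the fix, an `assert execute_model_req is not None` here
        # failed on non-driver tensor-parallel workers: only the driver
        # receives the request; the others get their inputs via the
        # broadcast inside LocalOrDistributedWorkerBase.prepare_input.
        if execute_model_req is not None:
            # HPU-specific profiling that inspects the request can only
            # run where the request actually exists (the driver worker).
            pass
        # The base class handles both the driver path (broadcasts inputs)
        # and the non-driver path (receives the broadcast).
        return self._base.execute_model(execute_model_req)
```

The key design point is that the None case is now a normal, expected path for non-driver workers rather than an error condition.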