
[bucketing overhaul 1/n] Add padding-aware scheduling and option to limit prefill batch size #394

Merged: 5 commits merged into habana_main on Oct 17, 2024

Conversation


@kzawora-intel kzawora-intel commented Oct 15, 2024

This PR adds the following functionality, which can be enabled via engine flags:

  • use_padding_aware_scheduling - the vLLM scheduler will now calculate the token cost of prefills based on their padded shape (similar to schedule prefills considering padded shape #109).
  • max_num_prefill_seqs - the padding-aware scheduler will perform an additional check on prefill batch size, effectively capping it at max_num_prefill_seqs. If unset, the maximum prefill batch size remains max_num_seqs.

Both features are generic and do not require HPU, although they may be specialized for a particular vendor. Padding-aware scheduling includes a padding function selector that picks the HPU padding function (taking the currently configured HPU buckets into account) when the current device is HPU; otherwise, it falls back to the product of batch_size x max_seq_len. A rough sketch of this budgeting check is shown below.
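To illustrate the idea, here is a minimal Python sketch of a padding-aware prefill budget. The names (PaddingAwareBudget, hpu_bucket_padding_fn, the bucket list) are hypothetical and for illustration only; they are not the code this PR adds, which lives in vllm/core/scheduler.py and the HPU model runner.

```python
# Hypothetical sketch of padding-aware prefill budgeting.
# Names and bucket values are illustrative, not the actual vLLM implementation.
from typing import Callable, List


def hpu_bucket_padding_fn(batch_size: int, max_seq_len: int,
                          seq_buckets: List[int]) -> int:
    """Round max_seq_len up to the nearest configured HPU bucket and
    return the padded token count for the whole prefill batch."""
    padded_len = next((b for b in sorted(seq_buckets) if b >= max_seq_len),
                      max_seq_len)
    return batch_size * padded_len


def default_padding_fn(batch_size: int, max_seq_len: int) -> int:
    """Non-HPU fallback: assume every sequence is padded to the longest one."""
    return batch_size * max_seq_len


class PaddingAwareBudget:
    """Tracks padded token usage and prefill batch size for one scheduling step."""

    def __init__(self, max_num_batched_tokens: int, max_num_prefill_seqs: int,
                 padding_fn: Callable[[int, int], int]):
        self.max_num_batched_tokens = max_num_batched_tokens
        self.max_num_prefill_seqs = max_num_prefill_seqs
        self.padding_fn = padding_fn
        self._seq_lens: List[int] = []

    def can_schedule(self, seq_len: int) -> bool:
        """Check whether one more prefill of seq_len tokens fits both the
        padded-token budget and the prefill batch-size cap."""
        candidate = self._seq_lens + [seq_len]
        padded_tokens = self.padding_fn(len(candidate), max(candidate))
        return (len(candidate) <= self.max_num_prefill_seqs
                and padded_tokens <= self.max_num_batched_tokens)

    def add(self, seq_len: int) -> None:
        self._seq_lens.append(seq_len)


# Example: with a 2048-token budget and buckets [128, 256, 512, 1024],
# a 300-token prompt is billed as 512 padded tokens on HPU, so the third
# prompt below no longer fits and would be deferred to the next step.
budget = PaddingAwareBudget(
    max_num_batched_tokens=2048,
    max_num_prefill_seqs=4,
    padding_fn=lambda bs, msl: hpu_bucket_padding_fn(bs, msl, [128, 256, 512, 1024]),
)
for prompt_len in (300, 700, 900):
    if budget.can_schedule(prompt_len):
        budget.add(prompt_len)
```

The key design point is that the scheduler bills each prefill at its padded cost rather than its raw token count, so batches that would be padded past the token budget on HPU are not over-scheduled.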

Review threads:
  • vllm/core/scheduler.py (resolved)
  • vllm/core/scheduler.py (outdated, resolved)
  • vllm/worker/hpu_model_runner.py (resolved)
@kzawora-intel changed the title from "Add padding-aware scheduling and option to limit prefill batch size" to "[bucketing overhaul 1/n] Add padding-aware scheduling and option to limit prefill batch size" on Oct 15, 2024

@kdamaszk kdamaszk left a comment

LGTM

@kzawora-intel kzawora-intel merged commit 05bcdf5 into habana_main Oct 17, 2024
19 checks passed
xuechendi added a commit to xuechendi/vllm-fork that referenced this pull request on Oct 23, 2024:
Revert "[bucketing overhaul 1/n] Add padding-aware scheduling and option to limit prefill batch size (HabanaAI#394)"

This reverts commit 05bcdf5.
@kzawora-intel added the habana label (Issues or PRs submitted by Habana Labs) on Nov 8, 2024
Labels
habana Issues or PRs submitted by Habana Labs
4 participants