Add async copying to input preparation #497

Merged
merged 13 commits into from
Nov 18, 2024

jkaniecki
This PR introduces asynchronous copying into _prepare_prompt and _prepare_decode, which speeds up copying the prepared inputs.
It also moves the precompute_indices_and_offsets function into forward to avoid unnecessary host-to-device (H2D) copies.
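
For readers unfamiliar with the pattern, here is a minimal sketch of asynchronous input copying in PyTorch. It is not the code from this PR: the helper names `to_device_async` and `prepare_inputs` and the tensor names are hypothetical, and it assumes the common approach of building tensors on the CPU first and then issuing the device copies with `non_blocking=True`. The sketch falls back to the CPU when no accelerator is present, where `non_blocking` has no effect.

```python
import torch


def to_device_async(host_tensors, device):
    """Issue every host-to-device copy without waiting for each to finish.

    Hypothetical helper for illustration only.
    """
    return {
        name: tensor.to(device, non_blocking=True)
        for name, tensor in host_tensors.items()
    }


def prepare_inputs(token_ids, positions, device):
    """Build inputs on the host, then copy them to the device in one pass."""
    # Creating the tensors on the CPU avoids one blocking H2D copy per tensor.
    host_tensors = {
        "input_tokens": torch.tensor(token_ids, dtype=torch.long, device="cpu"),
        "input_positions": torch.tensor(positions, dtype=torch.long, device="cpu"),
    }
    return to_device_async(host_tensors, device)


# Assumption: use CUDA if available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch = prepare_inputs([[1, 2, 3]], [[0, 1, 2]], device)
```

Whether a non-blocking copy actually overlaps with host work depends on the backend and, on some devices, on the source tensor living in pinned memory, so any speedup should be measured on the target hardware.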

@kdamaszk left a comment

LGTM

Three resolved review threads on vllm/worker/hpu_model_runner.py (two marked outdated).

@madamczykhabana left a comment

LGTM

@madamczykhabana merged commit 7c5038c into HabanaAI:habana_main on Nov 18, 2024
7 checks passed