
Remove redundant set_active_loras call during warmup #413

Merged 1 commit into habana_main from fix-lora-flow on Oct 22, 2024

Conversation

@SanjuCSudhakaran SanjuCSudhakaran commented Oct 22, 2024

CUDA uses `capture` for warmup runs and `execute_model` for actual runs, and each path calls `set_active_loras` exactly once. HPU uses `execute_model` for both warmup and actual runs. Since `execute_model` already calls `set_active_loras` internally, the extra call in the warmup path is redundant.

This special handling is not only redundant but incorrect: it causes the out-of-bounds slicing in the decode phase reported in #405.

This PR removes the special-cased `set_active_loras` call from warmup runs, resolving #405.
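Below is a minimal, self-contained Python sketch of the control flow described above. The class and helper names (`HPUWarmupSketch`, `warmup_before_pr`, `warmup_after_pr`, `active_lora_calls`) are illustrative only and are not taken from the vLLM HPU model runner; only `set_active_loras` and `execute_model` correspond to the methods discussed in this PR.

```python
# Illustrative sketch only; names other than set_active_loras/execute_model
# are hypothetical and not taken from the vLLM HPU model runner.
class HPUWarmupSketch:
    def __init__(self, lora_enabled: bool):
        self.lora_enabled = lora_enabled
        self.active_lora_calls = 0

    def set_active_loras(self, lora_requests, lora_mapping):
        # In the real runner this would update the LoRA manager's
        # index/mapping tensors for the current batch.
        self.active_lora_calls += 1

    def execute_model(self, seqs):
        # execute_model already activates the LoRAs needed for the batch,
        # which is why a separate call during warmup is unnecessary.
        if self.lora_enabled:
            self.set_active_loras(lora_requests=[], lora_mapping=None)
        # ... forward pass would happen here ...

    def warmup_before_pr(self, seqs):
        # Pre-PR warmup path: an extra set_active_loras call on top of the
        # one made inside execute_model.
        if self.lora_enabled:
            self.set_active_loras(lora_requests=[], lora_mapping=None)
        self.execute_model(seqs)

    def warmup_after_pr(self, seqs):
        # Post-PR warmup path: rely solely on execute_model's internal call.
        self.execute_model(seqs)


runner = HPUWarmupSketch(lora_enabled=True)
runner.warmup_before_pr(seqs=[])   # calls set_active_loras twice
runner.warmup_after_pr(seqs=[])    # calls set_active_loras once
print(runner.active_lora_calls)    # 3
```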

@vivekgoe vivekgoe left a comment


Looks good to me.

@vivekgoe vivekgoe marked this pull request as ready for review October 22, 2024 11:53
@michalkuligowski michalkuligowski merged commit 3af4b6c into habana_main Oct 22, 2024
19 checks passed
@michalkuligowski michalkuligowski deleted the fix-lora-flow branch October 22, 2024 13:34
xuechendi pushed a commit to xuechendi/vllm-fork that referenced this pull request Oct 23, 2024