Remove redundant set_active_loras call during warmup #413

SanjuCSudhakaran · 2024-10-22T06:50:31Z

CUDA uses `capture` for warmup runs and `execute_model` for actual runs, and calls `set_active_loras` only once in each phase. HPU uses `execute_model` for both warmup and actual runs. Since `execute_model` already calls `set_active_loras` internally, the separate call made during HPU warmup is redundant.

The special handling is also incorrect: it causes the out-of-bounds slicing in the decode phase reported in #405.

This PR removes the special handling of the `set_active_loras` call from warmup runs, which resolves #405.
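
The call pattern described above can be illustrated with a minimal sketch. Only the names `set_active_loras` and `execute_model` come from this PR. `ToyRunner`, `warmup_before`, `warmup_after`, and `count_calls` are hypothetical stand-ins, not the actual HPU model runner code. The sketch only counts calls and does not reproduce the out-of-bounds slicing from #405.

```python
class ToyRunner:
    """Hypothetical stand-in for a model runner; not the real vLLM class."""

    def __init__(self):
        self.set_active_loras_calls = 0
        self.active = []

    def set_active_loras(self, lora_requests):
        # Count activations so that a redundant call becomes visible.
        self.set_active_loras_calls += 1
        self.active = list(lora_requests)

    def execute_model(self, lora_requests):
        # As the PR describes, execute_model activates the LoRAs itself.
        self.set_active_loras(lora_requests)
        return "ok"

    def warmup_before(self, lora_requests):
        # Assumed shape of the old warmup: an extra activation, then the run.
        self.set_active_loras(lora_requests)
        return self.execute_model(lora_requests)

    def warmup_after(self, lora_requests):
        # After the fix: warmup relies on execute_model alone.
        return self.execute_model(lora_requests)


def count_calls(warmup_name):
    """Run one warmup variant and report how often LoRAs were activated."""
    runner = ToyRunner()
    getattr(runner, warmup_name)(["lora-a"])
    return runner.set_active_loras_calls
```

In this toy model `count_calls("warmup_before")` yields two activations per warmup run, while `count_calls("warmup_after")` yields one, matching the once-per-phase behaviour the description attributes to CUDA.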

vivekgoe

Looks good to me.

Remove unnecessary special handling for LoRA during warmup

4538c94

SanjuCSudhakaran requested review from vivekgoe and hlahkar October 22, 2024 10:39

vivekgoe approved these changes Oct 22, 2024

vivekgoe marked this pull request as ready for review October 22, 2024 11:53

vivekgoe requested a review from michalkuligowski October 22, 2024 11:53

michalkuligowski approved these changes Oct 22, 2024

michalkuligowski merged commit 3af4b6c into habana_main Oct 22, 2024
19 checks passed

michalkuligowski deleted the fix-lora-flow branch October 22, 2024 13:34
