Partial Loading PR 3.5: Fix pre-mature model drops from the RAM cache #7522

RyanJDick · 2025-01-06T23:01:25Z

Summary

This is an unplanned fix between PR3 and PR4 in the sequence of partial loading (i.e. low-VRAM) PRs. This PR restores the 'Current Workaround' documented in #7513. In other words, to work around a flaw in the model cache API, this fix allows models to be loaded into VRAM even if they have been dropped from the RAM cache.

This PR also adds an info log each time that this workaround is hit. In a future PR (#7509), we will eliminate the places in the application code that are capable of triggering this condition.
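The workaround can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `ToyModelCache`, `CacheRecord`, `put`, `lock`, the log message and the sizes are invented for illustration and are not InvokeAI's model cache API. It assumes that callers keep a handle to a cache record after fetching a model, so the record can be evicted from the RAM cache while a caller still holds it. Before the fix, locking such a record failed with a KeyError. With the workaround, the model is loaded into VRAM anyway and an info log is emitted.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("toy_model_cache")


@dataclass
class CacheRecord:
    """Hypothetical handle a caller keeps after fetching a model from the cache."""

    key: str
    size_gb: float
    device: str = "cpu"  # "cpu" means the model lives in the RAM cache, "cuda" means VRAM
    locks: int = 0


class ToyModelCache:
    """Illustrative RAM cache with oldest-first eviction (not InvokeAI's API)."""

    def __init__(self, ram_limit_gb: float) -> None:
        self.ram_limit_gb = ram_limit_gb
        self._records: dict[str, CacheRecord] = {}

    def put(self, key: str, size_gb: float) -> CacheRecord:
        record = CacheRecord(key, size_gb)
        self._records[key] = record
        self._make_room()
        # The caller gets the handle even if the model was just evicted.
        return record

    def _make_room(self) -> None:
        # Drop unlocked records, oldest first, until the total fits the limit.
        # A model bigger than the whole limit is dropped right after insertion,
        # which is the premature drop that the caller never notices.
        for key in list(self._records):
            if sum(r.size_gb for r in self._records.values()) <= self.ram_limit_gb:
                break
            if self._records[key].locks == 0:
                del self._records[key]

    def lock(self, record: CacheRecord) -> None:
        if record.key not in self._records:
            # Before the workaround, a lookup like self._records[record.key]
            # raised KeyError here. Now we log and load the model into VRAM anyway.
            logger.info(
                "Locking model %s in VRAM even though it was dropped from the RAM cache.",
                record.key,
            )
        record.locks += 1
        record.device = "cuda"

    def unlock(self, record: CacheRecord) -> None:
        record.locks -= 1


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    cache = ToyModelCache(ram_limit_gb=4)
    encoder = cache.put("text_encoder", size_gb=9.0)  # hypothetical size above the limit
    cache.lock(encoder)  # emits the INFO log instead of raising KeyError
    print(encoder.device)  # prints cuda
```

The usage at the bottom mirrors the QA setup in spirit: a model larger than the RAM limit is evicted as soon as it is inserted, yet locking it still succeeds. The sketch only shows the tolerate-and-log pattern; the longer-term fix of removing the call sites that trigger the condition is out of scope here.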

Related Issues / Discussions

QA Instructions

Set the RAM cache limit to a small value, e.g. ram: 4.
Run FLUX text-to-image with the full T5 encoder, which is larger than 4 GB. This will trigger the error condition.
Before the fix, this test configuration would cause a KeyError. After the fix, we should see an info-level log explaining that the condition was hit, and generation should continue successfully.

Merge Plan

No special instructions.

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
Documentation added / updated (if applicable)
Updated What's New copy (if doing a release after this PR)


github-actions bot added the python and backend labels Jan 6, 2025

Allow models to be locked in VRAM, even if they have been dropped from the RAM cache (related: #7513). c579a21

RyanJDick force-pushed the ryan/model-offload-3.5-fix-early-drop branch from cd268ff to c579a21 January 6, 2025 23:03

RyanJDick marked this pull request as ready for review January 6, 2025 23:15

RyanJDick requested review from lstein, blessedcoolant, brandonrising and hipsterusername as code owners January 6, 2025 23:15

hipsterusername approved these changes Jan 6, 2025


psychedelicious self-requested a review January 7, 2025 00:01

psychedelicious approved these changes Jan 7, 2025


RyanJDick merged commit 782ee7a into main Jan 7, 2025
22 of 29 checks passed

RyanJDick deleted the ryan/model-offload-3.5-fix-early-drop branch January 7, 2025 00:05
