Partial Loading PR 3.5: Fix pre-mature model drops from the RAM cache #7522

Merged: RyanJDick merged 1 commit into main from ryan/model-offload-3.5-fix-early-drop on Jan 7, 2025

RyanJDick (Collaborator) commented Jan 6, 2025

Summary

This is an unplanned fix between PR3 and PR4 in the sequence of partial loading (i.e. low-VRAM) PRs. This PR restores the 'Current Workaround' documented in #7513. In other words, to work around a flaw in the model cache API, this fix allows models to be loaded into VRAM even if they have been dropped from the RAM cache.

This PR also adds an info-level log message each time the workaround is hit. In a future PR (#7509), we will eliminate the places in the application code that are capable of triggering this condition.
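To make the workaround concrete, here is a minimal sketch of the idea, not the actual InvokeAI code. `ToyModelCache`, `CacheRecord`, `lock_strict`, and `lock` are hypothetical names invented for illustration, and the oldest-first eviction policy is an assumption. The sketch only shows the general pattern: a caller holds a reference to a cache record, the record is dropped from the RAM cache, and the lock step either fails with a `KeyError` (strict lookup) or logs at info level and proceeds (workaround).

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("toy_model_cache")


@dataclass
class CacheRecord:
    """Hypothetical cache entry: a key, the model object, and its size."""

    key: str
    model: object
    size_gb: float
    in_vram: bool = False


class ToyModelCache:
    """Toy RAM cache with a size limit. Not InvokeAI's real model cache."""

    def __init__(self, ram_limit_gb: float) -> None:
        self.ram_limit_gb = ram_limit_gb
        self._records: dict[str, CacheRecord] = {}

    def put(self, key: str, model: object, size_gb: float) -> CacheRecord:
        record = CacheRecord(key, model, size_gb)
        self._records[key] = record
        self._evict_to_limit()
        # The caller keeps this reference even if the record was just evicted.
        return record

    def _total_gb(self) -> float:
        return sum(r.size_gb for r in self._records.values())

    def _evict_to_limit(self) -> None:
        # Assumed policy: drop the oldest entries until the cache fits the limit.
        while self._records and self._total_gb() > self.ram_limit_gb:
            oldest_key = next(iter(self._records))
            del self._records[oldest_key]

    def lock_strict(self, key: str) -> CacheRecord:
        # Strict variant: looks the record up by key, so a dropped
        # record raises KeyError.
        record = self._records[key]
        record.in_vram = True
        return record

    def lock(self, record: CacheRecord) -> CacheRecord:
        # Workaround variant: accept the caller's record even if it is no
        # longer in the RAM cache, and log that the condition was hit.
        if record.key not in self._records:
            logger.info(
                "Model '%s' was dropped from the RAM cache before being "
                "locked; loading it into VRAM anyway.",
                record.key,
            )
        record.in_vram = True
        return record
```

With `ToyModelCache(ram_limit_gb=4)`, putting a model whose illustrative size is 5 GB evicts it immediately. After that, `lock_strict("t5_encoder")` raises `KeyError`, while `lock(record)` logs at info level and marks the record as loaded.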

Related Issues / Discussions

QA Instructions

  • Set RAM cache limit to a small value. E.g. ram: 4
  • Run FLUX text-to-image with the full T5 encoder, which exceeds 4GB. This will trigger the error condition.
  • Before the fix, this test configuration would cause a KeyError. After the fix, we should see an info-level log explaining that the condition was hit, and generation should continue successfully.

Merge Plan

No special instructions.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions bot added the python and backend labels Jan 6, 2025
@RyanJDick force-pushed the ryan/model-offload-3.5-fix-early-drop branch from cd268ff to c579a21 January 6, 2025 23:03
@RyanJDick marked this pull request as ready for review January 6, 2025 23:15
@psychedelicious self-requested a review January 7, 2025 00:01
@RyanJDick merged commit 782ee7a into main Jan 7, 2025
22 of 29 checks passed
@RyanJDick deleted the ryan/model-offload-3.5-fix-early-drop branch January 7, 2025 00:05