
Gracefully recover from VRAM out of memory errors (next branch version) #5794

Merged
merged 4 commits into next from bugfix/model-manager2/out-of-memory-handling on Feb 26, 2024

Conversation

lstein
Collaborator

@lstein lstein commented Feb 24, 2024

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update
  • Community Node Submission

Have you discussed this change with the InvokeAI team?

  • Yes
  • No, because: straightforward fix

Have you updated all relevant documentation?

  • Yes
  • No

Description

At least on my system, if the model manager runs out of VRAM while moving a model into the GPU, the partially loaded model gets stuck in VRAM and can't easily be removed. This leaves the model unusable and ties up precious VRAM.

I encountered this while experimenting with large language models on the same system, but I suspect it will also happen if a video game is running. I tried various approaches to recover from this state, including clearing the VRAM cache, deleting the model object, and running garbage collection, but without success. The attempts amount to roughly the calls sketched below.
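
For reference, this is a rough sketch of those recovery attempts, not the actual model manager code; `model` stands in for whatever reference the cache still holds to the partially loaded model:

```python
import gc

import torch

# "model" is a stand-in for the partially loaded model object held by the cache.
model = torch.nn.Linear(8, 8)

# Recovery steps that were tried after a failed load into VRAM; on the
# affected system none of them released the memory held by the partial model.
del model                  # drop the Python reference to the model object
gc.collect()               # force Python garbage collection
torch.cuda.empty_cache()   # return cached CUDA allocations to the driver
```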

This PR avoids the issue by checking for sufficient available VRAM before trying to move a model onto a CUDA GPU. If there is not enough room, it raises a torch.cuda.OutOfMemoryError, and the message is propagated to the front end. If more VRAM becomes available later, invocations begin to work again.
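
A minimal sketch of this kind of pre-flight check, assuming a simple parameter-size estimate of the model's VRAM needs (the function name and size estimate are illustrative, not the actual model manager code):

```python
import torch


def move_model_to_gpu(model: torch.nn.Module,
                      device: torch.device = torch.device("cuda")) -> torch.nn.Module:
    """Move a model onto a CUDA device only if enough free VRAM is available."""
    # Rough estimate of the VRAM the model will need (illustrative only).
    model_size = sum(p.numel() * p.element_size() for p in model.parameters())

    free_vram, _total_vram = torch.cuda.mem_get_info(device)
    if model_size > free_vram:
        # Raise the same error class PyTorch uses for a real CUDA OOM, so
        # callers and the front end handle it the same way.
        raise torch.cuda.OutOfMemoryError(
            f"Insufficient VRAM: model needs ~{model_size} bytes, "
            f"but only {free_vram} bytes are free on {device}."
        )
    return model.to(device)
```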

Note: This pull request is against next. The model manager code has changed a bit, so I'm making a separate PR for main.

Related Tickets & Documents

  • Related Issue #
  • Closes #

QA Instructions, Screenshots, Recordings

Launch the InvokeAI web service alongside another application that uses a lot of GPU VRAM. For my testing, I used ollama with a large model loaded. Run a generation and confirm that it produces an out-of-memory error. Try this repeatedly; you should get the same error each time. Now kill the other application to free up VRAM and generate an image again. It should work!

Merge Plan

Can merge when approved.

Added/updated tests?

  • Yes
  • No: please replace this line with details on why tests have not been included

[optional] Are there any post deployment tasks we need to perform?

@github-actions github-actions bot added the python (PRs that change python files) and backend (PRs that change backend files) labels Feb 24, 2024
@lstein lstein changed the title Bugfix/model manager2/out of memory handling Gracefully recover from VRAM out of memory errors (next branch version) Feb 24, 2024
@psychedelicious psychedelicious merged commit 3ccb4e6 into next Feb 26, 2024
7 of 8 checks passed
@psychedelicious psychedelicious deleted the bugfix/model-manager2/out-of-memory-handling branch February 26, 2024 06:38