Reduce Stable Diffusion memory usage by keeping unet only on GPU. #540

Closed
piEsposito opened this issue Sep 16, 2022 · 7 comments

@piEsposito
Contributor

piEsposito commented Sep 16, 2022

Is your feature request related to a problem? Please describe.
Stable Diffusion is not compute-heavy in all of its stages. If we keep the diffusion UNet in fp16 on the GPU and everything else on the CPU, we could reduce GPU memory usage to 2.2 GB with only a modest impact on performance. It would democratize Stable Diffusion even further.

The only other thing that would need to be done is moving the tensors between devices accordingly, and we can use the models' device and dtype attributes to make that work.
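
For illustration, a minimal sketch of the placement described above (the model ID and the manual device moves are just for this example, not an existing pipeline option):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# Only the UNet (the part that runs on every denoising step) lives on the GPU, in fp16.
pipe.unet.to(device="cuda", dtype=torch.float16)

# The text encoder and VAE run once per prompt / once per image, so they stay on CPU.
pipe.text_encoder.to("cpu")
pipe.vae.to("cpu")

# What's still missing is moving the intermediate tensors to each model's
# .device / .dtype before calling it; that is what this issue proposes.
```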

Describe the solution you'd like
I think what I'm proposing in #537 should be enough.

Describe alternatives you've considered
The alternative is to run the whole process on GPU and pay more for it.

@anton-l
Member

anton-l commented Sep 21, 2022

The linked PR is now closed:

The goal with pipelines is to provide simple, readable, and easy-to-modify implementations that users can tweak for their own use cases. Rather than supporting everything in the default pipeline, we encourage users to take the code and tweak it the way they want. As there are various strategies to optimize memory usage, it's best to let users choose what they want and tweak it themselves. So we are not in favor of handling this here.

But there might be a less intrusive way to keep the pipeline balanced between CPU and GPU (maybe using an accelerate wrapper), so I'm keen to keep this issue open for now :)
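
For what it's worth, one possible shape for that accelerate-based wrapper (a rough sketch, not a committed design): accelerate's cpu_offload keeps a module's weights on CPU and streams them to the execution device only while its forward pass runs.

```python
import torch
from accelerate import cpu_offload
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
device = torch.device("cuda")

# Keep the UNet resident on the GPU; it runs on every denoising step.
pipe.unet.to(device)

# Offload the occasional models: their weights stay on CPU and are moved
# to the GPU only for the duration of each forward pass.
for model in (pipe.text_encoder, pipe.vae):
    cpu_offload(model, execution_device=device)
```

Since the hooks handle the device moves, this would leave the pipeline's own call code untouched.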

@piEsposito
Contributor Author

piEsposito commented Sep 22, 2022

@anton-l what if we:

  1. created a context manager that moves input tensors to each model's device (and dtype) before running it, and
  2. added a method on Stable Diffusion and the other pipelines to balance the models between CPU and GPU, enabling that context inside the pipeline's call method?

That would keep the change non-intrusive, and it could work across the multiple pipelines with very few code changes (a rough sketch follows below).

Would something in that sense work? If yes, I can try opening a PR.
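
If helpful, here is a rough sketch of what such a context could look like (all names here are hypothetical, not an existing diffusers API):

```python
import contextlib
import functools

import torch


def _to_model(value, device, dtype):
    # Move tensors to the model's device; only cast floating-point tensors to its dtype.
    if not torch.is_tensor(value):
        return value
    if value.is_floating_point():
        return value.to(device=device, dtype=dtype)
    return value.to(device)


@contextlib.contextmanager
def forward_on_own_device(*models):
    """Temporarily wrap each model's forward so that incoming tensors are
    moved to that model's device/dtype before the call runs."""
    originals = [model.forward for model in models]

    def wrap(model):
        forward = model.forward
        device = next(model.parameters()).device
        dtype = next(model.parameters()).dtype

        @functools.wraps(forward)
        def wrapper(*args, **kwargs):
            args = tuple(_to_model(a, device, dtype) for a in args)
            kwargs = {k: _to_model(v, device, dtype) for k, v in kwargs.items()}
            return forward(*args, **kwargs)

        model.forward = wrapper

    try:
        for model in models:
            wrap(model)
        yield
    finally:
        for model, forward in zip(models, originals):
            model.forward = forward
```

A balancing method on the pipeline would then only have to place the UNet on GPU and the rest on CPU, and the call method would run its usual body inside `with forward_on_own_device(self.unet, self.text_encoder, self.vae): ...`.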

@piEsposito
Contributor Author

I've created a feature request on accelerate so this can be solved in a more elegant way. If they let me work on the feature, I can open a PR and then try to solve this.

@patrickvonplaten
Contributor

Hey @piEsposito,

I'm wondering whether we could maybe try to just write a community pipeline for this: https://github.com/huggingface/diffusers/tree/main/examples/community

@piEsposito
Contributor Author

piEsposito commented Oct 7, 2022

@patrickvonplaten I can do that and write the context manager for moving models between devices within the pipeline. That way we don't get blocked by the issue on accelerate.

How does that sound?

@patrickvonplaten
Contributor

Sounds great!

@piEsposito
Contributor Author

Closed by #850
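
For anyone finding this later: recent diffusers versions expose accelerate-backed CPU offloading on the pipeline itself. A minimal usage sketch (this reflects the current API, not necessarily the exact change that closed this issue):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)

# Submodels stay on CPU and are moved to the GPU only while they run,
# so peak GPU memory is roughly bounded by the largest submodel.
pipe.enable_sequential_cpu_offload()

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```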
