Reduce Stable Diffusion memory usage by keeping unet only on GPU. #540

Closed
piEsposito opened this issue Sep 16, 2022 · 7 comments

@piEsposito
Contributor

piEsposito commented Sep 16, 2022

Is your feature request related to a problem? Please describe.
Stable Diffusion is not compute-heavy in all of its stages. If we keep the diffusion UNet in fp16 on the GPU and everything else on the CPU, we could reduce GPU memory usage to 2.2 GB with only a modest impact on performance. It would democratize Stable Diffusion even further.

The only other thing that would need to be done is moving the tensors between devices accordingly, and we can use the models' device and dtype attributes to make that work.
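
For illustration, a minimal sketch of the placement described above (the model ID and the manual device moves are just for this example, not an existing pipeline option):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# Only the UNet (the part that runs on every denoising step) lives on the GPU, in fp16.
pipe.unet.to(device="cuda", dtype=torch.float16)

# The text encoder and VAE run once per prompt / once per image, so they stay on CPU.
pipe.text_encoder.to("cpu")
pipe.vae.to("cpu")

# What's still missing is moving the intermediate tensors to each model's
# .device / .dtype before calling it; that is what this issue proposes.
```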

Describe the solution you'd like
I think what I'm proposing in #537 should be enough.

Describe alternatives you've considered
The alternative is to run the whole process on GPU and pay more for it.

@anton-l
Member

anton-l commented Sep 21, 2022

The linked PR is now closed:

The goal with pipelines is to provide simple, readable, and easy-to-modify implementations that users can tweak for their own use cases. Rather than supporting everything in the default pipeline, we encourage users to take the code and tweak it the way they want. As there are various strategies to optimize memory usage, it's best to let users choose what they want and tweak it themselves. So we are not in favor of handling this here.

But there might be a less intrusive way to keep the pipeline balanced between CPU and GPU (maybe using an accelerate wrapper), so I'm keen to keep this issue open for now :)
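
For what it's worth, one possible shape for that accelerate-based wrapper (a rough sketch, not a committed design): accelerate's cpu_offload keeps a module's weights on CPU and streams them to the execution device only while its forward pass runs.

```python
import torch
from accelerate import cpu_offload
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
device = torch.device("cuda")

# Keep the UNet resident on the GPU; it runs on every denoising step.
pipe.unet.to(device)

# Offload the occasional models: their weights stay on CPU and are moved
# to the GPU only for the duration of each forward pass.
for model in (pipe.text_encoder, pipe.vae):
    cpu_offload(model, execution_device=device)
```

Since the hooks handle the device moves, this would leave the pipeline's own call code untouched.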

@piEsposito
Contributor Author

piEsposito commented Sep 22, 2022

@anton-l what if we:

  1. created a context manager that moves input tensors to each model's device (and dtype) before running it, and
  2. added a method on Stable Diffusion and the other pipelines to balance the models between CPU and GPU, enabling that context inside the pipeline's call method?

That would keep the change non-intrusive, and it could work across the multiple pipelines with very few code changes (a rough sketch follows below).

Would something in that sense work? If yes, I can try opening a PR.
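
If helpful, here is a rough sketch of what such a context could look like (all names here are hypothetical, not an existing diffusers API):

```python
import contextlib
import functools

import torch


def _to_model(value, device, dtype):
    # Move tensors to the model's device; only cast floating-point tensors to its dtype.
    if not torch.is_tensor(value):
        return value
    if value.is_floating_point():
        return value.to(device=device, dtype=dtype)
    return value.to(device)


@contextlib.contextmanager
def forward_on_own_device(*models):
    """Temporarily wrap each model's forward so that incoming tensors are
    moved to that model's device/dtype before the call runs."""
    originals = [model.forward for model in models]

    def wrap(model):
        forward = model.forward
        device = next(model.parameters()).device
        dtype = next(model.parameters()).dtype

        @functools.wraps(forward)
        def wrapper(*args, **kwargs):
            args = tuple(_to_model(a, device, dtype) for a in args)
            kwargs = {k: _to_model(v, device, dtype) for k, v in kwargs.items()}
            return forward(*args, **kwargs)

        model.forward = wrapper

    try:
        for model in models:
            wrap(model)
        yield
    finally:
        for model, forward in zip(models, originals):
            model.forward = forward
```

A balancing method on the pipeline would then only have to place the UNet on GPU and the rest on CPU, and the call method would run its usual body inside `with forward_on_own_device(self.unet, self.text_encoder, self.vae): ...`.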

@piEsposito
Contributor Author

I've created a feature request on accelerate so this can be solved in a more elegant way. If they let me work on the feature, I can open a PR and then try to solve this.

@patrickvonplaten
Contributor

Hey @piEsposito,

I'm wondering whether we could maybe try to just write a community pipeline for this: https://github.com/huggingface/diffusers/tree/main/examples/community

@piEsposito
Contributor Author

piEsposito commented Oct 7, 2022

@patrickvonplaten I can do that and write the context manager for moving models between devices within the pipeline. That way we don't get blocked by the issue on accelerate.

How does that sound?

@patrickvonplaten
Contributor

Sounds great!

@piEsposito
Contributor Author

Closed by #850
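
For anyone finding this later: recent diffusers versions expose accelerate-backed CPU offloading on the pipeline itself. A minimal usage sketch (this reflects the current API, not necessarily the exact change that closed this issue):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)

# Submodels stay on CPU and are moved to the GPU only while they run,
# so peak GPU memory is roughly bounded by the largest submodel.
pipe.enable_sequential_cpu_offload()

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```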
