Got OOM message with GTX3060 #101
Comments
We are likely being more inefficient than TensorFlow somewhere. This might be related: elixir-nx/nx#1003. One thing you can try is mixed precision in all of the models:

policy = Axon.MixedPrecision.create_policy(compute: :f16)

# do this for every model
{:ok, %{model: clip_model} = clip} = Bumblebee.load_model({:hf, repository_id, subdir: "text_encoder"})
clip = %{clip | model: Axon.MixedPrecision.apply_policy(clip_model, policy)}

Note that I haven't tested whether this affects image outputs.
I tried code like this:

policy = Axon.MixedPrecision.create_policy(compute: :f16)

{:ok, clip} =
  Bumblebee.load_model({:hf, repository_id, subdir: "text_encoder"},
    log_params_diff: false
  )

clip = %{clip | model: Axon.MixedPrecision.apply_policy(clip.model, policy)}

{:ok, unet} =
  Bumblebee.load_model({:hf, repository_id, subdir: "unet"},
    params_filename: "diffusion_pytorch_model.bin",
    log_params_diff: false
  )

unet = %{unet | model: Axon.MixedPrecision.apply_policy(unet.model, policy)}

{:ok, vae} =
  Bumblebee.load_model({:hf, repository_id, subdir: "vae"},
    architecture: :decoder,
    params_filename: "diffusion_pytorch_model.bin",
    log_params_diff: false
  )

vae = %{vae | model: Axon.MixedPrecision.apply_policy(vae.model, policy)}

{:ok, safety_checker} =
  Bumblebee.load_model({:hf, repository_id, subdir: "safety_checker"},
    log_params_diff: false
  )

safety_checker = %{safety_checker | model: Axon.MixedPrecision.apply_policy(safety_checker.model, policy)}
I see this as well, which is probably expected given that I have only 6 GB. I will note that I can run things like InvokeAI and do text2img with only 6 GB (and I believe InvokeAI uses the same kind of lowered precision to achieve that). My specs:
I set the following policy and confirmed that the image can be generated with the host client (not CUDA):

policy =
  Axon.MixedPrecision.create_policy(
    params: {:f, 16},
    compute: {:f, 32},
    output: {:f, 16}
  )

clip = %{clip | model: Axon.MixedPrecision.apply_policy(clip.model, policy)}
unet = %{unet | model: Axon.MixedPrecision.apply_policy(unet.model, policy)}
vae = %{vae | model: Axon.MixedPrecision.apply_policy(vae.model, policy)}

safety_checker = %{
  safety_checker
  | model: Axon.MixedPrecision.apply_policy(safety_checker.model, policy)
}

serving =
  Bumblebee.Diffusion.StableDiffusion.text_to_image(clip, unet, vae, tokenizer, scheduler,
    num_steps: 10,
    num_images_per_prompt: 1,
    safety_checker: safety_checker,
    safety_checker_featurizer: featurizer,
    compile: [batch_size: 1, sequence_length: 50],
    defn_options: [compiler: EXLA]
  )

However, OOM still occurs when running on CUDA. Looking at the peak buffers included in the OOM message, the shapes are f32. Is the policy having no effect, or is this a memory problem unrelated to the policy?
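One thing that might explain the f32 peak buffers, offered here as an editor's sketch rather than anything confirmed in the thread: the policy changes the precision the model computes in, but the parameters Bumblebee loaded from the checkpoint keep their stored dtype, so the largest buffers can still be f32. Assuming the params are the nested map of Nx tensors that Bumblebee.load_model/2 returned at the time, one way to test this is to downcast them explicitly:

# Hypothetical sketch: cast the loaded parameters to f16 and see whether the
# peak buffers in the OOM report shrink. The nested-map layout is an assumption.
cast_params = fn params ->
  Map.new(params, fn {layer, layer_params} ->
    {layer, Map.new(layer_params, fn {name, tensor} -> {name, Nx.as_type(tensor, :f16)} end)}
  end)
end

unet = %{unet | params: cast_params.(unet.params)}
vae = %{vae | params: cast_params.(vae.params)}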
Yes, it may also be that there are places where we could improve the model's efficiency. There are some PRs in the diffusers repo and some Twitter threads.
@seanmor5, do you know what we need to do to generate graphs such as this one? huggingface/diffusers#371
Forwarded here from the above issue. Is there any way for me to give Bumblebee more of my memory? Do I simply need to increase the amount of memory I have?
You have 4 GB, right? That's currently not enough for SD.
No, the VM I run this on has 8 GB and the GPU has 6 GB.
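As an aside that is not part of the original discussion (the option names below are assumptions to check against the EXLA docs): EXLA's CUDA client has configuration such as :preallocate and :memory_fraction that control how much of the GPU memory XLA reserves up front. They cannot create more memory, but they may help when another process is holding part of the 6 GB.

# Hypothetical sketch: configure the EXLA CUDA client before it is started,
# e.g. at the top of a Livebook. Option names are assumptions; verify in the EXLA docs.
Application.put_env(:exla, :clients,
  cuda: [platform: :cuda, preallocate: true, memory_fraction: 0.95],
  host: [platform: :host]
)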
@krainboltgreene we have some experiments that have brought it down to 5 GB for a single image. We will be publishing them in the coming weeks.
That is incredible. I have been wanting to dive much deeper into how Bumblebee/Nx work because I would love to contribute even more to the various APIs. Excited to see the source and learn more.
Opened #147 with a more principled approach.
I've been trying to run Stable Diffusion on the GPU, but it failed and I got an OOM message.
Is this error due to insufficient GPU memory?
Is it possible to make it work by adjusting some parameters?
Stable Diffusion 1.4 runs on this GPU in a TensorFlow environment, so it would be nice if it worked with Bumblebee too.
It works fine with :host. It's amazing how easy it is to use neural networks with Livebook!
OS: Ubuntu 22.04 on WSL2
GPU: GTX 3060 (12 GB)
Livebook: v0.8.0
Elixir: v1.14.2
XLA_TARGET: cuda111
CUDA Version: 11.7

The whole log: oommessage.log