
MPS backend out of memory #9133

Open
1 task done
fangyinzhe opened this issue Mar 29, 2023 · 69 comments
Labels
bug-report (Report of a bug, yet to be confirmed) · platform:mac (Issues that apply to Apple OS X, M1, M2, etc)

Comments

@fangyinzhe

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What happened?

macOS: I can successfully open the site at http://127.0.0.1:7860/, but generating an image produces this error:
RuntimeError: MPS backend out of memory (MPS allocated: 5.05 GB, other allocations: 2.29 GB, max allowed: 6.77 GB). Tried to allocate 1024.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure)

Steps to reproduce the problem

Install MPS

What should have happened?

RuntimeError: MPS backend out of memory (MPS allocated: 5.05 GB, other allocations: 2.29 GB, max allowed: 6.77 GB). Tried to allocate 1024.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure)

Commit where the problem happens

python: 3.10.10  •  torch: 1.12.1  •  xformers: N/A  •  gradio: 3.16.2  •  commit: 0cc0ee1  •  checkpoint: bf864f41d5

What platforms do you use to access the UI ?

MacOS

What browsers do you use to access the UI ?

Apple Safari

Command Line Arguments

NO

List of extensions

NO

Console logs

RuntimeError: MPS backend out of memory (MPS allocated: 5.05 GB, other allocations: 2.29 GB, max allowed: 6.77 GB). Tried to allocate 1024.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure)

Additional information

No response

@fangyinzhe added the bug-report (Report of a bug, yet to be confirmed) label on Mar 29, 2023
@elisezhu123

An 8 GB Mac is not enough for MPS acceleration, and PyTorch 2.0 / MPS only works on macOS 13+.

@fangyinzhe
Author

Intel i7 core
16 GB

@pudepiedj

I have also experienced this runtime error while running the open-source version of Whisper on a 2019 Macbook built on an Intel i9 8-core CPU with 16GB RAM and an AMD Radeon Pro 5500M.

I had previously been running a decoder simulation that runs perfectly on Google Colab, which is when the error we've both experienced first appeared, but reducing batch sizes massively made no difference to the error, which then started appearing in Whisper runs on audio files of negligible size. So I concluded that it wasn't really a memory error at all, whatever the error message may say.

However, I extracted the Whisper code to another Jupyter Notebook and it ran perfectly on the GPU using the latest releases from Apple and PyTorch on macOS Ventura 13.3, with 13.0, as @elisezhu123 says, the minimum requirement. So the problem has "gone away" rather than being solved, but I'd suggest just rerunning your code in another clean notebook as a first step. The suggested "fix" with the environment variable is dangerous, and probably unnecessary, but if you do use it I'd try setting it to a value other than 0.0; I think the default is 0.7, i.e. the GPU can use 70% of memory, so maybe raise it a bit. But I really don't think memory is the problem; there's a "glitch" somewhere that changing notebooks fixes. Obviously very happy to be corrected on this if I am mistaken.

@fangyinzhe
Author

So I can only switch to another computer, right?

@pudepiedj

No - misunderstanding of "notebook". I meant that changing the code to another Jupyter (Anaconda3) notebook (not another physical Mac notebook) sorted the problem out for me, but since writing that it has come back again, so I am not sure that what I did solved it at all. There are some suggestions elsewhere that there may be an issue with MacOS Ventura 13.3 but I am not in a position to explore that.

@elisezhu123

(quoting @pudepiedj's comment above)

It is just a bug in 13.3… 13.2 works.

@GrinZero

Excuse me, could you please tell me how to activate the MPS mode. I don't quite understand this.

@vanilladucky

Excuse me, could you please tell me how to activate the MPS mode. I don't quite understand this.

On a Mac, CUDA doesn't work because there is no dedicated NVIDIA GPU, so we have to install a specific build of PyTorch to use the Metal Performance Shaders (MPS) backend. This webpage from Apple explains it best.

After installing the specific version of PyTorch, you should be able to simply use the MPS backend. Personally, I use this line of code
device = torch.device('mps')
and you can check by printing device; if it gives you back 'mps', you are good to go.

Hope this helps.
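
A minimal sketch of that check (the explicit CPU fallback here is just an extra safety net, not something from the Apple page):

import torch

# Prefer the Metal Performance Shaders backend when it is available,
# otherwise fall back to the CPU.
device = torch.device('mps') if torch.backends.mps.is_available() else torch.device('cpu')
print(device)  # should print "mps" on a correctly set up Apple GPU / Metal machine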

@stephanebdc

Same problem here, any solution?
Running
from transformers import Blip2Processor, Blip2ForConditionalGeneration
import torch
for Salesforce/blip2-opt-2.7b
on a 2019 MacBook,
16 GB RAM,
i9 and the Radeon.

@honzajavorek

honzajavorek commented May 11, 2023

I'm experiencing this with the latest commit of automatic and PyTorch v2 on my M1 8 GB running on macOS Ventura 13.3.1 (a).

Click to see the stack trace
Traceback (most recent call last):
  File "/Users/honza/Projects/stable-diffusion-webui/modules/call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "/Users/honza/Projects/stable-diffusion-webui/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/modules/img2img.py", line 181, in img2img
    processed = process_images(p)
  File "/Users/honza/Projects/stable-diffusion-webui/modules/processing.py", line 515, in process_images
    res = process_images_inner(p)
  File "/Users/honza/Projects/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/batch_hijack.py", line 42, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/modules/processing.py", line 604, in process_images_inner
    p.init(p.all_prompts, p.all_seeds, p.all_subseeds)
  File "/Users/honza/Projects/stable-diffusion-webui/modules/processing.py", line 1084, in init
    self.init_latent = self.sd_model.get_first_stage_encoding(self.sd_model.encode_first_stage(image))
  File "/Users/honza/Projects/stable-diffusion-webui/modules/sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "/Users/honza/Projects/stable-diffusion-webui/modules/sd_hijack_utils.py", line 26, in __call__
    return self.__sub_func(self.__orig_func, *args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/modules/sd_hijack_unet.py", line 76, in <lambda>
    first_stage_sub = lambda orig_func, self, x, **kwargs: orig_func(self, x.to(devices.dtype_vae), **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 830, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "/Users/honza/Projects/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/autoencoder.py", line 83, in encode
    h = self.encoder(x)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/model.py", line 526, in forward
    h = self.down[i_level].block[i_block](hs[-1], temb)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/model.py", line 131, in forward
    h = self.norm1(h)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 273, in forward
    return F.group_norm(
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2530, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_refs/__init__.py", line 2956, in native_group_norm
    out, mean, rstd = _normalize(input_reshaped, reduction_dims, eps)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_refs/__init__.py", line 2914, in _normalize
    biased_var, mean = torch.var_mean(
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_refs/__init__.py", line 2419, in var_mean
    m = mean(a, dim, keepdim)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_refs/__init__.py", line 2373, in mean
    result = true_divide(result, nelem)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 220, in _fn
    result = fn(*args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 130, in _fn
    result = fn(**bound.arguments)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_refs/__init__.py", line 926, in _ref
    return prim(a, b)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_refs/__init__.py", line 1619, in true_divide
    return prims.div(a, b)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_ops.py", line 287, in __call__
    return self._op(*args, **kwargs or {})
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_prims/__init__.py", line 278, in _prim_impl
    meta(*args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_prims/__init__.py", line 400, in _elementwise_meta
    return TensorMeta(device=device, shape=shape, strides=strides, dtype=dtype)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_prims/__init__.py", line 256, in TensorMeta
    return torch.empty_strided(shape, strides, dtype=dtype, device=device)
RuntimeError: MPS backend out of memory (MPS allocated: 4.13 GB, other allocations: 5.24 GB, max allowed: 9.07 GB). Tried to allocate 512 bytes on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

While normal image generation works, this often occurs when I'm trying to use ControlNet, but not always. I couldn't really figure out what the differentiator is. I have almost all other apps closed to leave maximum RAM unused.

What are my options to avoid this? I've noticed @brkirch is posting to discussions about Apple performance and has a fork at https://github.com/brkirch/stable-diffusion-webui/ with 14 commits ahead. Is this something that could speed up my poor performance or solve the "MPS backend out of memory" problem? Will it be ever merged to upstream? 🤔

@akamitoro

I also keep having this issue if I scale the images on my M1 8 GB Mac Mini.

@akamitoro

Any way to work around the issue? Would the recommended solution from the error help, and how do I do it?

Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit

@honzajavorek

This seems to help, at least in my case:

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --precision full --no-half

@rovo79

rovo79 commented May 12, 2023

(quoting @vanilladucky's reply above)

Where do you put that line of code?
device = torch.device('mps')

@pudepiedj

pudepiedj commented May 13, 2023 via email

@honzajavorek

honzajavorek commented May 13, 2023 via email

@pudepiedj

pudepiedj commented May 13, 2023 via email

@vanilladucky

(quoting the earlier exchange)

Where do you put that line of code? device = torch.device('mps')

So the line of code device = torch.device('mps') merely initializes the device as mps instead of the normal cpu. If we don't run this line, PyTorch will just place its data and parameters on the cpu. The line can be run anywhere in the code, but whether you are in a Jupyter notebook or a Python script, I recommend making sure it runs at the very top, where you import all your necessary libraries.

Without running this line first, when you move your model and data to the device with .to(device=device), that data won't be placed on mps.

If you are new to PyTorch and the usage of mps on a Mac, I encourage you to read about loading data onto mps here. It is important to know how to load data and model parameters onto devices if you wish to run large models quickly. Without that, it would probably take hours or even days to run just one epoch.

Hope this helps!
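
A tiny, self-contained illustration of that pattern (the model and tensor below are made-up examples, not taken from any project in this thread):

import torch
import torch.nn as nn

device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')

model = nn.Linear(16, 4).to(device)        # parameters now live on the MPS device
batch = torch.randn(8, 16, device=device)  # data created directly on the device

print(model(batch).device)  # mps:0 when the backend is active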

@dlebouc

dlebouc commented May 13, 2023

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --no-half (without --precision full) works perfectly for me. Since I added PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7, I haven't encountered the bug, and the 4 performance cores of my MacBook M1 are much more heavily used than before.

@BrjGit

BrjGit commented May 13, 2023

Total noob here. Trying to utilize stable diffusion with deforum extension. Where exactly do I input the PYTORCH_MPS_HIGH_WATERMARK code into?

@dlebouc

dlebouc commented May 13, 2023

Total noob here. Trying to utilize stable diffusion with deforum extension. Where exactly do I input the PYTORCH_MPS_HIGH_WATERMARK code into?

In terminal, type :
cd ~/stable-diffusion-webui; PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --no-half

@BrjGit

BrjGit commented May 13, 2023

Total noob here. Trying to utilize stable diffusion with deforum extension. Where exactly do I input the PYTORCH_MPS_HIGH_WATERMARK code into?

In terminal, type : cd ~/stable-diffusion-webui; PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --no-half

Lifesaver. Thank you. It works now.

@akamitoro

This seems to help, at least in my case:

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --precision full --no-half

Thank you very much sir, this works, but it is painfully slow: 2-3 hours to upscale an image 2x from 640x950. Is there any way to speed this up? What setting should I adjust in highres.fix?

@pudepiedj

pudepiedj commented May 14, 2023 via email

@pudepiedj

pudepiedj commented May 14, 2023 via email

@honzajavorek

@pudepiedj no problem!

@honzajavorek

honzajavorek commented May 15, 2023

Regarding the settings, you can put the environment variables in your webui-user.sh as well. This is how mine looks right now:

#!/bin/bash
#########################################################
# Uncomment and change the variables below to your need:#
#########################################################

# Install directory without trailing slash
#install_dir="/home/$(whoami)"

# Name of the subdirectory
#clone_dir="stable-diffusion-webui"

# PyTorch settings
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7
export PYTORCH_ENABLE_MPS_FALLBACK=1

# Commandline arguments for webui.py, for example: export COMMANDLINE_ARGS="--medvram --opt-split-attention"
export COMMANDLINE_ARGS="--skip-torch-cuda-test --upcast-sampling --no-half-vae --no-half --opt-sub-quad-attention --use-cpu interrogate"

# python3 executable
#python_cmd="python3"

... file continues unchanged ...

Then all you need to run the web UI is a plain ./webui.sh; everything gets applied automatically.

@pudepiedj

pudepiedj commented May 15, 2023 via email

@shamshhoda

Hi, I guess you're also using Stable Diffusion with ControlNet here. One easy way is to reduce your batch size. For example, if you kept the batch size at 8, reduce it to 4 or 5, or even just 1. It should work and would be faster.
Try this without defining the PYTORCH ratio.

(quoting the exchange above)

@Chase-Xuu

Hello @pudepiedj, I copied the exact full arguments that are used and traced back. Once you run ./webui.sh, it prints out the actual arguments it was launched with so that you can verify it really uses what you think you have set up.

There are several ways to set them up or override them depending on your preferences. In my example, for testing purposes, I have export COMMANDLINE_ARGS="--skip-torch-cuda-test" in the file webui-user.sh, and I add the other arguments on launch like this: ./webui.sh --upcast-sampling --no-half-vae --no-half --opt-split-attention-v1 --lowvram --use-cpu interrogate

Thank you! It works on my M2 Max device. It uses GPU instead of CPU.

@ohmygenie

ohmygenie commented Jul 4, 2023

Hello, I have been trying to build a simple Python GUI using tkinter for Stable Diffusion. I keep hitting the same issue since I'm using an M1 Mac. Here's my code; I tried adding --skip-torch-cuda-test directly in my .py code but it's not working, please help.

Error: RuntimeError: MPS backend out of memory (MPS allocated: 16.46 GB, other allocations: 1.98 GB, max allowed: 18.13 GB). Tried to allocate 1024.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

import os
from diffusers import StableDiffusionPipeline

# Set environment variables
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.9"
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

# Set command line arguments
os.environ["COMMANDLINE_ARGS"] = "--skip-torch-cuda-test --upcast-sampling --no-half-vae --no-half --opt-sub-quad-attention --use-cpu interrogate"

SDV5_MODEL_PATH = "/Users/user/stable-diffusion-v1-5/"
SAVE_PATH = os.path.join(os.environ['HOME'], "Desktop", "SDV5_OUTPUT")

if not os.path.exists(SAVE_PATH):
    os.mkdir(SAVE_PATH)

def uniquify(path):
    filename, extension = os.path.splitext(path)
    counter = 1

    while os.path.exists(path):
        path = filename + " (" + str(counter) + ")" + extension
        counter += 1

    return path

prompt = "A dog riding a motorcycle"

print(f"Characters in prompt: {len(prompt)}, limit: 200")

pipe = StableDiffusionPipeline.from_pretrained(SDV5_MODEL_PATH)
pipe = pipe.to("mps")

output = pipe(prompt)

# Use the images attribute to access the generated images
image = output.images[0]  # Adjusted this line based on your findings

# Save the image
image_path = uniquify(os.path.join(SAVE_PATH, (prompt[:25] + "...") if len(prompt) > 25 else prompt) + ".png")
print(image_path)

image.save(image_path)

@pudepiedj @branksypop @honzajavorek

@tmm1

tmm1 commented Jul 4, 2023

the default values can be seen in the source code:

https://github.com/pytorch/pytorch/blob/bfd995f0d6bf87262613b5e89d871832ca9e9938/aten/src/ATen/mps/MPSAllocator.mm#L25-L35

  static const char* high_watermark_ratio_str = getenv("PYTORCH_MPS_HIGH_WATERMARK_RATIO");
  const double high_watermark_ratio =
      high_watermark_ratio_str ? strtod(high_watermark_ratio_str, nullptr) : default_high_watermark_ratio;
  setHighWatermarkRatio(high_watermark_ratio);

  const double default_low_watermark_ratio =
      m_device.hasUnifiedMemory ? default_low_watermark_ratio_unified : default_low_watermark_ratio_discrete;
  static const char* low_watermark_ratio_str = getenv("PYTORCH_MPS_LOW_WATERMARK_RATIO");
  const double low_watermark_ratio =
      low_watermark_ratio_str ? strtod(low_watermark_ratio_str, nullptr) : default_low_watermark_ratio;
  setLowWatermarkRatio(low_watermark_ratio);

https://github.com/pytorch/pytorch/blob/bfd995f0d6bf87262613b5e89d871832ca9e9938/aten/src/ATen/mps/MPSAllocator.h#L299-L306

  // (see m_high_watermark_ratio for description)
  constexpr static double default_high_watermark_ratio = 1.7;
  // we set the allowed upper bound to twice the size of recommendedMaxWorkingSetSize.
  constexpr static double default_high_watermark_upper_bound = 2.0;
  // (see m_low_watermark_ratio for description)
  // on unified memory, we could allocate beyond the recommendedMaxWorkingSetSize
  constexpr static double default_low_watermark_ratio_unified  = 1.4;
  constexpr static double default_low_watermark_ratio_discrete = 1.0;

https://github.com/pytorch/pytorch/blob/bfd995f0d6bf87262613b5e89d871832ca9e9938/aten/src/ATen/mps/MPSAllocator.h#L326-L332

  // high watermark ratio is a hard limit for the total allowed allocations
  // 0. : disables high watermark limit (may cause system failure if system-wide OOM occurs)
  // 1. : recommended maximum allocation size (i.e., device.recommendedMaxWorkingSetSize)
  // >1.: allows limits beyond the device.recommendedMaxWorkingSetSize
  // e.g., value 0.95 means we allocate up to 95% of recommended maximum
  // allocation size; beyond that, the allocations would fail with OOM error.
  double m_high_watermark_ratio;
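
In other words, the ratio is interpreted relative to the device's recommendedMaxWorkingSetSize, and the stock high-watermark default is 1.7, not 0.7. For anyone who prefers to set it from Python instead of the shell, here is a hedged sketch (the values and the low-watermark line are only examples; exporting the variables before launch works just as well):

import os

# These must be in the environment before the MPS allocator is initialised,
# i.e. before anything is placed on the "mps" device.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.7"  # hard cap at 70% of recommendedMaxWorkingSetSize
os.environ["PYTORCH_MPS_LOW_WATERMARK_RATIO"] = "0.5"   # example value: start releasing cached memory earlier

import torch

x = torch.ones(1, device="mps")              # the allocator picks the ratios up here
print(torch.mps.current_allocated_memory())  # requires a recent PyTorch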

@ohmygenie

@pudepiedj @branksypop @honzajavorek

(quoting @tmm1's comment above)

Thanks; apparently my torch installation on the M1 had a problem. I've reinstalled it and it's now working. Now I get a new error:

NotImplementedError: The operator 'aten::index.Tensor' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on pytorch/pytorch#77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

--> Essentially, here's what's happening for Apple silicon users: Option #1: GPU (not possible). Option #2: CPU (I tried it; it takes 30 minutes to generate one picture). Option #3: MPS, but I hit the new error above. Option #4: Use AUTOMATIC1111, which impressively generates one picture in only 20 seconds; however, it's not customisable, say if you want to build something like that as a project for a client.

So yeah, it's a painful situation for Apple silicon users wanting to build an AI program using SD from scratch.
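
For what it's worth, a hedged sketch of a standalone diffusers script on MPS with the usual memory-reducing knobs turned on (the model id and settings are placeholders, and whether this fits in 8-16 GB still depends on resolution and checkpoint):

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # CPU fallback for ops missing on MPS, e.g. aten::index.Tensor

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")  # or a local path
pipe = pipe.to("mps")
pipe.enable_attention_slicing()  # trades some speed for a much smaller peak memory footprint

# A throwaway one-step pass is sometimes recommended as a warm-up on MPS.
_ = pipe("warm-up", num_inference_steps=1)

image = pipe("a dog riding a motorcycle", num_inference_steps=30).images[0]
image.save("out.png")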

@BewhY08

BewhY08 commented Jul 7, 2023

Replacing this code will allow you to map it, but the ControlNet functionality will not work properly

(quoting @honzajavorek's webui-user.sh suggestion above)

@ohmygenie

(quoting @BewhY08's comment above)

Thanks, I presume this answer is for AUTOMATIC1111 users, correct? It won't be applicable for those who are building their own customised program using Stable Diffusion from scratch, since all of the dependencies have to be handled by hand; editing webui.sh is not applicable for that scenario.

Still looking forward to hearing from someone who was able to run Stable Diffusion successfully on an Apple silicon machine using MPS (not CPU) with their own customised program.

@luluaidota

I ran this on 13.4.1 but have the same problem.

@thedoger82

For me the problem was the canvas size (1280x720), so I used something smaller (640x320) and got no more MPS problems. In case you need higher resolutions, create your images/videos at small resolutions and then use Topaz, another AI, which will do the job of increasing size and quality.

@efeLongoria

efeLongoria commented Jul 19, 2023

Hello, my error is basically the same, "RuntimeError: MPS backend out of memory". I tried several of the methods mentioned here and unfortunately had no success. To be very specific, I could not use the "Hires. fix" option; the process was always interrupted by this error, so I could not make images larger than 768x768.

This morning, with the help of ChatGPT-4, I was able to solve the bug, and here is how, in a very condensed form; I hope it is useful.

Install Miniconda (if you already have it, skip this step):

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
sh Miniconda3-latest-MacOSX-arm64.sh

Create a new virtual environment with Conda specifying Python 3.9. Copy and paste the following command into the terminal:

conda create --name "your name" python=3.9

Activate the virtual environment. Copy and paste the following command into the terminal:

conda activate "your name previously"

Install PyTorch in the virtual environment. Copy and paste the following command into the terminal:

conda install pytorch torchvision torchaudio -c pytorch-nightly

This step is used to check if MPS (Metal Performance Shaders) is available and, if so, it creates a tensor on the MPS device and prints it. It is a way to validate that everything is working correctly.
Create a Python file (for example, mps_test.py) with the following code to test the MPS device. You can do this using any text editor, then save the file with the .py extension:

import torch
if torch.backends.mps.is_available():
    device = torch.device('mps')
    x = torch.ones(1, device=device)
    print(x)
else:
    print("MPS device not found.")

Run the Python file. Copy and paste the following command into the terminal:

python mps_test.py

tensor([1.], device='mps:0') This has to be your result in order to work smoothly. If you are experiencing memory problems with the MPS backend, you can adjust the proportion of memory PyTorch is allowed to use.
0.0: Disables the upper limit for memory allocations. This means that PyTorch will try to use as much GPU memory as necessary.
Values between 0 and 1: These values represent the fraction of the total GPU memory that PyTorch is allowed to use. For example, if the value is 0.5 on a GPU with 8GB of memory, PyTorch will try to use no more than 4GB.
Values greater than 1: are meaningless in this context and will probably cause unwanted behavior or errors.

Finally copy and paste the following command into the terminal:

export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0

At this point, the only thing left to do is to start the UI ./webui.sh

@ohmygenie

Hello, I did the same steps but using Anaconda. The best time I get on CPU is 6 minutes (for a single image) with an entry-level MBP M1 Pro (2021).

Are you able to successfully generate an image from your customised program (not AUTOMATIC1111) without encountering the error? If yes, feel free to share the code or tweaks you made.

Essentially, the underlying issue is that you can use AUTOMATIC1111 and generate all the images you want with MPS, because it has made a lot of changes in the backend with embeddings, etc., so no issues there.

The problem starts if you create your own Python program (not AUTOMATIC1111) with Stable Diffusion and generate an image: it will always raise that MPS error. The workaround is switching to CPU, or resorting to a device with a GPU/CUDA like a Windows laptop or PC.

@ohmygenie

ohmygenie commented Jul 19, 2023

(quoting @efeLongoria's step-by-step comment above)

tensor([1.], device='mps:0') <--- that's the result from my machine, which means MPS is activated.

Still looking forward to someone sharing their code if they have been able to successfully generate an image with MPS as the device on an Apple silicon machine.

@efeLongoria

(quoting @ohmygenie's comment above)

I'm sorry it was not helpful. After several days this worked for me; I will try to test more variables to see if I can find another alternative.

@ealkanat

ealkanat commented Aug 1, 2023

I run this in 13.5, same problem.

2.3 GHz 8-Core Intel Core i9
AMD Radeon Pro 5500M 4 GB

@efeLongoria

I run this in 13.5, same problem.

2.3 GHz 8-Core Intel Core i9 AMD Radeon Pro 5500M 4 GB

Have you already tried this? https://developer.apple.com/metal/pytorch/

And this?
COMMANDLINE_ARGS="--lowvram --opt-split-attention"

@fxbeaulieu

add PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 in the command you use to start WebUI.
Example :
cd '/Users/fxbeaulieu/stable-diffusion-webui';PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 ./webui.sh --autolaunch;exit

@ealkanat

ealkanat commented Aug 2, 2023

I run this in 13.5, same problem.

2.3 GHz 8-Core Intel Core i9 AMD Radeon Pro 5500M 4 GB

add PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 in the command you use to start WebUI. Example : cd '/Users/fxbeaulieu/stable-diffusion-webui';PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 ./webui.sh --autolaunch;exit

Sorry guys, this was my mistake. I found different torch versions on my machine.

I deleted all venv folders under the base directory (.venv, venv).
Then I installed the Torch nightly version again.
It's fixed!

(screenshot attached)

dilwong added a commit to dilwong/stable-diffusion-webui that referenced this issue Aug 9, 2023
Based on comment from AUTOMATIC1111#9133 (comment)

Using GPU is slower for some reason and lags my computer
@injelee21

injelee21 commented Oct 9, 2023

Thank you @efeLongoria, I was able to produce the same output, tensor([1.], device='mps:0'), however I am still encountering the same issue: MPS backend out of memory (MPS allocated: 6.50 GB, other allocations: 29.72 GB, max allowed: 36.27 GB). Tried to allocate 128.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure). I have been reading all the comments and some people did fix it with PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --precision full --no-half. What is ./webui.sh? Where should I download that file? By the way, I am using an M2 with 32 GB.

@SamKhoze

SamKhoze commented Dec 4, 2023

I had the same problem with ComfyUI running vid2vid and received this error:
RuntimeError: MPS backend out of memory (MPS allocated: 10.74 GB, other allocations: 23.29 GB, max allowed: 36.27 GB). Tried to allocate 2.25 GB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

I fixed it by relaunching ComfyUI via this command:
PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 python main.py

@raffetazarius

raffetazarius commented Jan 1, 2024

I have a hunch that GPU VRAM may not be getting flushed correctly by A1111 after generations on macOS installations using PyTorch and MPS, since I'm seeing VRAM usage increase after each consecutive image generation (Intel Mac Pro with an AMD GPU) until, somewhere between generation 5 and 10, I get the "MPS backend out of memory" error, forcing me to restart SD Web UI to complete more generations.

To any engineer looking to fix this in the A1111 codebase, this article may be useful:

https://discuss.pytorch.org/t/how-can-we-release-gpu-memory-cache/14530

particularly this comment https://discuss.pytorch.org/t/how-can-we-release-gpu-memory-cache/14530/27


also https://forums.fast.ai/t/gpu-memory-not-being-freed-after-training-is-over/10265?u=cedric
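
If anyone wants to experiment along those lines, here is a hedged sketch of explicitly releasing cached MPS memory between generations (whether A1111 itself can be patched this way is exactly the open question; torch.mps.empty_cache() needs a recent PyTorch, and pipeline/prompt are placeholder names):

import gc
import torch

def generate_and_release(pipeline, prompt):
    # Illustrative only: run one generation, drop Python references,
    # then ask the MPS allocator to return its cached blocks.
    image = pipeline(prompt).images[0]
    gc.collect()
    torch.mps.empty_cache()
    return image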

@raffetazarius

With @viking1304's help, I've tested a1111 with PyTorch 2.3.0.dev20240103 today on my aforementioned Mac Pro 2019 Intel + AMD 6900XT GPU rig and am no longer getting this MPS Out of Memory error! Yay!

Installed latest PyTorch dev version using viking1304's A1111 installer - https://github.com/viking1304/a1111-setup

@mykolaienko21

This seems to help, at least in my case:

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --precision full --no-half

I have this problem in the terminal after this command. Help me to solve it:

(base) MacBook-Pro-2:~ aleksendrmykolaienko$ PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --precision full --no-half
-bash: ./webui.sh: No such file or directory

@satvik-1945

(quoting @mykolaienko21's comment above)

I am also facing the same problem; where do I put these lines of code?

@efeLongoria

efeLongoria commented Jun 1, 2024

(quoting the exchange above)

In the terminal; you need to run the SD with that command.

@SudhanshuBlaze

(quoting the exchange above)

What do you mean by SD?

@thedoger82

(quoting the question above)

Stable Diffusion

@wasimsafdar

I am facing a similar issue with Llama 3.2. I am using "Llama-3.2-3B-Instruct" in PyTorch. I have a Mac M1 Pro, 16 GB.

@MozzieD

MozzieD commented Jan 19, 2025

Try using Chrome
