
MPS backend out of memory #9133

Open
1 task done
fangyinzhe opened this issue Mar 29, 2023 · 69 comments
Labels
bug-report (Report of a bug, yet to be confirmed) · platform:mac (Issues that apply to Apple OS X, M1, M2, etc)

Comments

@fangyinzhe

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What happened?

macOS: I can successfully open the site at http://127.0.0.1:7860/, but generating an image produces this error:
RuntimeError: MPS backend out of memory (MPS allocated: 5.05 GB, other allocations: 2.29 GB, max allowed: 6.77 GB). Tried to allocate 1024.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure)

Steps to reproduce the problem

Install MPS

What should have happened?

RuntimeError: MPS backend out of memory (MPS allocated: 5.05 GB, other allocations: 2.29 GB, max allowed: 6.77 GB). Tried to allocate 1024.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure)

Commit where the problem happens

python: 3.10.10  •  torch: 1.12.1  •  xformers: N/A  •  gradio: 3.16.2  •  commit: 0cc0ee1  •  checkpoint: bf864f41d5

What platforms do you use to access the UI ?

MacOS

What browsers do you use to access the UI ?

Apple Safari

Command Line Arguments

NO

List of extensions

NO

Console logs

RuntimeError: MPS backend out of memory (MPS allocated: 5.05 GB, other allocations: 2.29 GB, max allowed: 6.77 GB). Tried to allocate 1024.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure)

Additional information

No response

@fangyinzhe added the bug-report (Report of a bug, yet to be confirmed) label on Mar 29, 2023
@elisezhu123

An 8 GB Mac is not enough for MPS acceleration, and PyTorch 2.0 / MPS only works on macOS 13+.

@fangyinzhe
Author

Intel i7 core
16 GB

@pudepiedj

I have also experienced this runtime error while running the open-source version of Whisper on a 2019 Macbook built on an Intel i9 8-core CPU with 16GB RAM and an AMD Radeon Pro 5500M.

I had previously been running a decoder simulation that runs perfectly on Google Colab, which is when the error we've both experienced first appeared, but reducing batch sizes massively made no difference to the error, which then started appearing in Whisper runs on audio files of negligible size. So I concluded that it wasn't really a memory error at all, whatever the error message may say.

However, I extracted the Whisper code to another Jupyter Notebook and it ran perfectly on the GPU using the latest releases from Apple and PyTorch on macOS Ventura 13.3, with 13.0, as @elisezhu123 says, the minimum requirement. So the problem has "gone away" rather than being solved, but I'd suggest just rerunning your code in another clean notebook as a first step. The suggested "fix" with the environment variable is dangerous, and probably unnecessary, but if you do use it I'd try setting it to a value other than 0.0; I think the default is 0.7, i.e. the GPU can use 70% of memory, so maybe raise it a bit. But I really don't think memory is the problem; there's a "glitch" somewhere that changing notebooks fixes. Obviously very happy to be corrected on this if I am mistaken.

@fangyinzhe
Author

So I can only switch to another computer, right?

@pudepiedj

No - misunderstanding of "notebook". I meant that changing the code to another Jupyter (Anaconda3) notebook (not another physical Mac notebook) sorted the problem out for me, but since writing that it has come back again, so I am not sure that what I did solved it at all. There are some suggestions elsewhere that there may be an issue with MacOS Ventura 13.3 but I am not in a position to explore that.

@elisezhu123

(quoting @pudepiedj's comment above)

It is just a bug in 13.3… 13.2 works.

@GrinZero

Excuse me, could you please tell me how to activate the MPS mode. I don't quite understand this.

@vanilladucky

Excuse me, could you please tell me how to activate the MPS mode. I don't quite understand this.

On a Mac, CUDA doesn't work because there is no dedicated NVIDIA GPU, so we have to install a specific build of PyTorch to use the Metal Performance Shaders (MPS) backend. This webpage from Apple explains it best.

After installing the specific version of PyTorch, you should be able to simply use the MPS backend. Personally, I use this line of code
device = torch.device('mps')
and you can check by printing device; if it gives you back 'mps', you are good to go.

Hope this helps.
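
A minimal sketch of that check (the explicit CPU fallback here is just an extra safety net, not something from the Apple page):

import torch

# Prefer the Metal Performance Shaders backend when it is available,
# otherwise fall back to the CPU.
device = torch.device('mps') if torch.backends.mps.is_available() else torch.device('cpu')
print(device)  # should print "mps" on a correctly set up Apple GPU / Metal machine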

@stephanebdc

Same problem here, any solution?
Running
from transformers import Blip2Processor, Blip2ForConditionalGeneration
import torch
for Salesforce/blip2-opt-2.7b
on a 2019 MacBook,
16 GB RAM,
i9 and the Radeon.

@honzajavorek

honzajavorek commented May 11, 2023

I'm experiencing this with the latest commit of automatic and PyTorch v2 on my M1 8 GB running on macOS Ventura 13.3.1 (a).

Click to see the stack trace
Traceback (most recent call last):
  File "/Users/honza/Projects/stable-diffusion-webui/modules/call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "/Users/honza/Projects/stable-diffusion-webui/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/modules/img2img.py", line 181, in img2img
    processed = process_images(p)
  File "/Users/honza/Projects/stable-diffusion-webui/modules/processing.py", line 515, in process_images
    res = process_images_inner(p)
  File "/Users/honza/Projects/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/batch_hijack.py", line 42, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/modules/processing.py", line 604, in process_images_inner
    p.init(p.all_prompts, p.all_seeds, p.all_subseeds)
  File "/Users/honza/Projects/stable-diffusion-webui/modules/processing.py", line 1084, in init
    self.init_latent = self.sd_model.get_first_stage_encoding(self.sd_model.encode_first_stage(image))
  File "/Users/honza/Projects/stable-diffusion-webui/modules/sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "/Users/honza/Projects/stable-diffusion-webui/modules/sd_hijack_utils.py", line 26, in __call__
    return self.__sub_func(self.__orig_func, *args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/modules/sd_hijack_unet.py", line 76, in <lambda>
    first_stage_sub = lambda orig_func, self, x, **kwargs: orig_func(self, x.to(devices.dtype_vae), **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 830, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "/Users/honza/Projects/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/autoencoder.py", line 83, in encode
    h = self.encoder(x)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/model.py", line 526, in forward
    h = self.down[i_level].block[i_block](hs[-1], temb)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/model.py", line 131, in forward
    h = self.norm1(h)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 273, in forward
    return F.group_norm(
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2530, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_refs/__init__.py", line 2956, in native_group_norm
    out, mean, rstd = _normalize(input_reshaped, reduction_dims, eps)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_refs/__init__.py", line 2914, in _normalize
    biased_var, mean = torch.var_mean(
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_refs/__init__.py", line 2419, in var_mean
    m = mean(a, dim, keepdim)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_refs/__init__.py", line 2373, in mean
    result = true_divide(result, nelem)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 220, in _fn
    result = fn(*args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 130, in _fn
    result = fn(**bound.arguments)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_refs/__init__.py", line 926, in _ref
    return prim(a, b)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_refs/__init__.py", line 1619, in true_divide
    return prims.div(a, b)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_ops.py", line 287, in __call__
    return self._op(*args, **kwargs or {})
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_prims/__init__.py", line 278, in _prim_impl
    meta(*args, **kwargs)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_prims/__init__.py", line 400, in _elementwise_meta
    return TensorMeta(device=device, shape=shape, strides=strides, dtype=dtype)
  File "/Users/honza/Projects/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_prims/__init__.py", line 256, in TensorMeta
    return torch.empty_strided(shape, strides, dtype=dtype, device=device)
RuntimeError: MPS backend out of memory (MPS allocated: 4.13 GB, other allocations: 5.24 GB, max allowed: 9.07 GB). Tried to allocate 512 bytes on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

While normal image generation works, this often occurs when I'm trying to use ControlNet, but not always. I couldn't really figure out what the differentiator is. I have almost all other apps closed to leave maximum RAM unused.

What are my options to avoid this? I've noticed @brkirch is posting to discussions about Apple performance and has a fork at https://github.com/brkirch/stable-diffusion-webui/ with 14 commits ahead. Is this something that could speed up my poor performance or solve the "MPS backend out of memory" problem? Will it be ever merged to upstream? 🤔

@akamitoro

I also keep having this issue if I scale the images on my M1 8 GB Mac Mini.

@akamitoro

Any way to work around the issue? Would the recommended solution from the error help, and how do I do it?

Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit

@honzajavorek

This seems to help, at least in my case:

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --precision full --no-half

@rovo79

rovo79 commented May 12, 2023

(quoting @vanilladucky's reply above)

Where do you put that line of code?
device = torch.device('mps')

@pudepiedj

pudepiedj commented May 13, 2023 via email

@honzajavorek

honzajavorek commented May 13, 2023 via email

@pudepiedj

pudepiedj commented May 13, 2023 via email

@vanilladucky

(quoting the earlier exchange)

Where do you put that line of code? device = torch.device('mps')

So the line of code device = torch.device('mps') merely initializes the device as mps instead of the normal cpu. If we don't run this line, PyTorch will just place its data and parameters on the cpu. The line can be run anywhere in the code, but whether you are in a Jupyter notebook or a Python script, I recommend making sure it runs at the very top, where you import all your necessary libraries.

Without running this line first, when you move your model and data to the device with .to(device=device), that data won't be placed on mps.

If you are new to PyTorch and the usage of mps on a Mac, I encourage you to read about loading data onto mps here. It is important to know how to load data and model parameters onto devices if you wish to run large models quickly. Without that, it would probably take hours or even days to run just one epoch.

Hope this helps!
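
A tiny, self-contained illustration of that pattern (the model and tensor below are made-up examples, not taken from any project in this thread):

import torch
import torch.nn as nn

device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')

model = nn.Linear(16, 4).to(device)        # parameters now live on the MPS device
batch = torch.randn(8, 16, device=device)  # data created directly on the device

print(model(batch).device)  # mps:0 when the backend is active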

@dlebouc

dlebouc commented May 13, 2023

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --no-half (without --precision full) works perfectly for me. Since I added PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7, I haven't encountered the bug, and the 4 performance cores of my MacBook M1 are much more heavily used than before.

@BrjGit

BrjGit commented May 13, 2023

Total noob here. Trying to utilize stable diffusion with deforum extension. Where exactly do I input the PYTORCH_MPS_HIGH_WATERMARK code into?

@dlebouc

dlebouc commented May 13, 2023

Total noob here. Trying to utilize stable diffusion with deforum extension. Where exactly do I input the PYTORCH_MPS_HIGH_WATERMARK code into?

In terminal, type :
cd ~/stable-diffusion-webui; PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --no-half

@BrjGit

BrjGit commented May 13, 2023

Total noob here. Trying to utilize stable diffusion with deforum extension. Where exactly do I input the PYTORCH_MPS_HIGH_WATERMARK code into?

In terminal, type : cd ~/stable-diffusion-webui; PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --no-half

Lifesaver. Thank you. It works now.

@akamitoro

This seems to help, at least in my case:

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --precision full --no-half

Thank you very much sir, this works, but it is painfully slow: 2-3 hours to upscale an image 2x from 640x950. Is there any way to speed this up? What setting should I adjust in highres.fix?

@pudepiedj

pudepiedj commented May 14, 2023 via email

@pudepiedj

pudepiedj commented May 14, 2023 via email

@honzajavorek

@pudepiedj no problem!

@honzajavorek

honzajavorek commented May 15, 2023

Regarding the settings, you can put the environment variables in your webui-user.sh as well. This is how mine looks right now:

#!/bin/bash
#########################################################
# Uncomment and change the variables below to your need:#
#########################################################

# Install directory without trailing slash
#install_dir="/home/$(whoami)"

# Name of the subdirectory
#clone_dir="stable-diffusion-webui"

# PyTorch settings
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7
export PYTORCH_ENABLE_MPS_FALLBACK=1

# Commandline arguments for webui.py, for example: export COMMANDLINE_ARGS="--medvram --opt-split-attention"
export COMMANDLINE_ARGS="--skip-torch-cuda-test --upcast-sampling --no-half-vae --no-half --opt-sub-quad-attention --use-cpu interrogate"

# python3 executable
#python_cmd="python3"

... file continues unchanged ...

Then all you need to run the web UI is a plain ./webui.sh; everything gets applied automatically.

@pudepiedj

pudepiedj commented May 15, 2023 via email

@shamshhoda

Hi, I guess you're also using Stable Diffusion with ControlNet here. One easy way is to reduce your batch size. For example, if you kept the batch size at 8, reduce it to 4 or 5, or even just 1. It should work and would be faster.
Try this without defining the PYTORCH ratio.

(quoting the exchange above)

@Chase-Xuu

Hello @pudepiedj, I copied the exact full arguments that are used and traced back. Once you run ./webui.sh, it prints out the actual arguments it was launched with so that you can verify it really uses what you think you have set up.

There are several ways to set them up or override them depending on your preferences. In my example, for testing purposes, I have export COMMANDLINE_ARGS="--skip-torch-cuda-test" in the file webui-user.sh, and I add the other arguments on launch like this: ./webui.sh --upcast-sampling --no-half-vae --no-half --opt-split-attention-v1 --lowvram --use-cpu interrogate

Thank you! It works on my M2 Max device. It uses GPU instead of CPU.

@ohmygenie

ohmygenie commented Jul 4, 2023

Hello, I have been trying to build a simple Python GUI using tkinter for Stable Diffusion. I keep hitting the same issue since I'm using an M1 Mac. Here's my code; I tried adding --skip-torch-cuda-test directly in my .py code but it's not working, please help.

Error: RuntimeError: MPS backend out of memory (MPS allocated: 16.46 GB, other allocations: 1.98 GB, max allowed: 18.13 GB). Tried to allocate 1024.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

import os
from diffusers import StableDiffusionPipeline

# Set environment variables
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.9"
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

# Set command line arguments
os.environ["COMMANDLINE_ARGS"] = "--skip-torch-cuda-test --upcast-sampling --no-half-vae --no-half --opt-sub-quad-attention --use-cpu interrogate"

SDV5_MODEL_PATH = "/Users/user/stable-diffusion-v1-5/"
SAVE_PATH = os.path.join(os.environ['HOME'], "Desktop", "SDV5_OUTPUT")

if not os.path.exists(SAVE_PATH):
    os.mkdir(SAVE_PATH)

def uniquify(path):
    filename, extension = os.path.splitext(path)
    counter = 1

    while os.path.exists(path):
        path = filename + " (" + str(counter) + ")" + extension
        counter += 1

    return path

prompt = "A dog riding a motorcycle"

print(f"Characters in prompt: {len(prompt)}, limit: 200")

pipe = StableDiffusionPipeline.from_pretrained(SDV5_MODEL_PATH)
pipe = pipe.to("mps")

output = pipe(prompt)

# Use the images attribute to access the generated images
image = output.images[0]  # Adjusted this line based on your findings

# Save the image
image_path = uniquify(os.path.join(SAVE_PATH, (prompt[:25] + "...") if len(prompt) > 25 else prompt) + ".png")
print(image_path)

image.save(image_path)

@pudepiedj @branksypop @honzajavorek

@tmm1

tmm1 commented Jul 4, 2023

the default values can be seen in the source code:

https://github.com/pytorch/pytorch/blob/bfd995f0d6bf87262613b5e89d871832ca9e9938/aten/src/ATen/mps/MPSAllocator.mm#L25-L35

  static const char* high_watermark_ratio_str = getenv("PYTORCH_MPS_HIGH_WATERMARK_RATIO");
  const double high_watermark_ratio =
      high_watermark_ratio_str ? strtod(high_watermark_ratio_str, nullptr) : default_high_watermark_ratio;
  setHighWatermarkRatio(high_watermark_ratio);

  const double default_low_watermark_ratio =
      m_device.hasUnifiedMemory ? default_low_watermark_ratio_unified : default_low_watermark_ratio_discrete;
  static const char* low_watermark_ratio_str = getenv("PYTORCH_MPS_LOW_WATERMARK_RATIO");
  const double low_watermark_ratio =
      low_watermark_ratio_str ? strtod(low_watermark_ratio_str, nullptr) : default_low_watermark_ratio;
  setLowWatermarkRatio(low_watermark_ratio);

https://github.com/pytorch/pytorch/blob/bfd995f0d6bf87262613b5e89d871832ca9e9938/aten/src/ATen/mps/MPSAllocator.h#L299-L306

  // (see m_high_watermark_ratio for description)
  constexpr static double default_high_watermark_ratio = 1.7;
  // we set the allowed upper bound to twice the size of recommendedMaxWorkingSetSize.
  constexpr static double default_high_watermark_upper_bound = 2.0;
  // (see m_low_watermark_ratio for description)
  // on unified memory, we could allocate beyond the recommendedMaxWorkingSetSize
  constexpr static double default_low_watermark_ratio_unified  = 1.4;
  constexpr static double default_low_watermark_ratio_discrete = 1.0;

https://github.com/pytorch/pytorch/blob/bfd995f0d6bf87262613b5e89d871832ca9e9938/aten/src/ATen/mps/MPSAllocator.h#L326-L332

  // high watermark ratio is a hard limit for the total allowed allocations
  // 0. : disables high watermark limit (may cause system failure if system-wide OOM occurs)
  // 1. : recommended maximum allocation size (i.e., device.recommendedMaxWorkingSetSize)
  // >1.: allows limits beyond the device.recommendedMaxWorkingSetSize
  // e.g., value 0.95 means we allocate up to 95% of recommended maximum
  // allocation size; beyond that, the allocations would fail with OOM error.
  double m_high_watermark_ratio;
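
In other words, the ratio is interpreted relative to the device's recommendedMaxWorkingSetSize, and the stock high-watermark default is 1.7, not 0.7. For anyone who prefers to set it from Python instead of the shell, here is a hedged sketch (the values and the low-watermark line are only examples; exporting the variables before launch works just as well):

import os

# These must be in the environment before the MPS allocator is initialised,
# i.e. before anything is placed on the "mps" device.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.7"  # hard cap at 70% of recommendedMaxWorkingSetSize
os.environ["PYTORCH_MPS_LOW_WATERMARK_RATIO"] = "0.5"   # example value: start releasing cached memory earlier

import torch

x = torch.ones(1, device="mps")              # the allocator picks the ratios up here
print(torch.mps.current_allocated_memory())  # requires a recent PyTorch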

@ohmygenie

@pudepiedj @branksypop @honzajavorek

(quoting @tmm1's comment above)

Thanks; apparently my torch installation on the M1 had a problem. I've reinstalled it and it's now working. Now I get a new error:

NotImplementedError: The operator 'aten::index.Tensor' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on pytorch/pytorch#77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

--> Essentially, here's what's happening for Apple silicon users: Option #1: GPU (not possible). Option #2: CPU (I tried it; it takes 30 minutes to generate one picture). Option #3: MPS, but I hit the new error above. Option #4: Use AUTOMATIC1111, which impressively generates one picture in only 20 seconds; however, it's not customisable, say if you want to build something like that as a project for a client.

So yeah, it's a painful situation for Apple silicon users wanting to build an AI program using SD from scratch.
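
For what it's worth, a hedged sketch of a standalone diffusers script on MPS with the usual memory-reducing knobs turned on (the model id and settings are placeholders, and whether this fits in 8-16 GB still depends on resolution and checkpoint):

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # CPU fallback for ops missing on MPS, e.g. aten::index.Tensor

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")  # or a local path
pipe = pipe.to("mps")
pipe.enable_attention_slicing()  # trades some speed for a much smaller peak memory footprint

# A throwaway one-step pass is sometimes recommended as a warm-up on MPS.
_ = pipe("warm-up", num_inference_steps=1)

image = pipe("a dog riding a motorcycle", num_inference_steps=30).images[0]
image.save("out.png")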

@BewhY08

BewhY08 commented Jul 7, 2023

Replacing this code will allow you to map it, but the ControlNet functionality will not work properly

(quoting @honzajavorek's webui-user.sh suggestion above)

@ohmygenie

(quoting @BewhY08's comment above)

Thanks, I presume this answer is for AUTOMATIC1111 users, correct? It won't be applicable for those who are building their own customised program using Stable Diffusion from scratch, since all of the dependencies have to be handled by hand; editing webui.sh is not applicable for that scenario.

Still looking forward to hearing from someone who was able to run Stable Diffusion successfully on an Apple silicon machine using MPS (not CPU) with their own customised program.

@luluaidota

I ran this on 13.4.1 but have the same problem.

@thedoger82

For me the problem was the canvas size (1280x720), so I used something smaller (640x320) and got no more MPS problems. In case you need higher resolutions, create your images/videos at small resolutions and then use Topaz, another AI, which will do the job of increasing size and quality.

@efeLongoria

efeLongoria commented Jul 19, 2023

Hello, my error is basically the same, "RuntimeError: MPS backend out of memory". I tried several of the methods mentioned here and unfortunately had no success. To be very specific, I could not use the "Hires. fix" option; the process was always interrupted by this error, so I could not make images larger than 768x768.

This morning, with the help of ChatGPT-4, I was able to solve the bug, and here is how, in a very condensed form; I hope it is useful.

Install Miniconda (if you already have it, skip this step):

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
sh Miniconda3-latest-MacOSX-arm64.sh

Create a new virtual environment with Conda specifying Python 3.9. Copy and paste the following command into the terminal:

conda create --name "your name" python=3.9

Activate the virtual environment. Copy and paste the following command into the terminal:

conda activate "your name previously"

Install PyTorch in the virtual environment. Copy and paste the following command into the terminal:

conda install pytorch torchvision torchaudio -c pytorch-nightly

This step is used to check if MPS (Metal Performance Shaders) is available and, if so, it creates a tensor on the MPS device and prints it. It is a way to validate that everything is working correctly.
Create a Python file (for example, mps_test.py) with the following code to test the MPS device. You can do this using any text editor, then save the file with the .py extension:

import torch
if torch.backends.mps.is_available():
    device = torch.device('mps')
    x = torch.ones(1, device=device)
    print(x)
else:
    print("MPS device not found.")

Run the Python file. Copy and paste the following command into the terminal:

python mps_test.py

tensor([1.], device='mps:0') This has to be your result in order to work smoothly. If you are experiencing memory problems with the MPS backend, you can adjust the proportion of memory PyTorch is allowed to use.
0.0: Disables the upper limit for memory allocations. This means that PyTorch will try to use as much GPU memory as necessary.
Values between 0 and 1: These values represent the fraction of the total GPU memory that PyTorch is allowed to use. For example, if the value is 0.5 on a GPU with 8GB of memory, PyTorch will try to use no more than 4GB.
Values greater than 1: are meaningless in this context and will probably cause unwanted behavior or errors.

Finally copy and paste the following command into the terminal:

export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0

At this point, the only thing left to do is to start the UI ./webui.sh

@ohmygenie

Hello, I did the same steps but using Anaconda. The best time I get on CPU is 6 minutes (for a single image) with an entry-level MBP M1 Pro (2021).

Are you able to successfully generate an image from your customised program (not AUTOMATIC1111) without encountering the error? If yes, feel free to share the code or tweaks you made.

Essentially, the underlying issue is that you can use AUTOMATIC1111 and generate all the images you want with MPS, because it has made a lot of changes in the backend with embeddings, etc., so no issues there.

The problem starts if you create your own Python program (not AUTOMATIC1111) with Stable Diffusion and generate an image: it will always raise that MPS error. The workaround is switching to CPU, or resorting to a device with a GPU/CUDA like a Windows laptop or PC.

@ohmygenie

ohmygenie commented Jul 19, 2023

(quoting @efeLongoria's step-by-step comment above)

tensor([1.], device='mps:0') <--- that's the result from my machine, which means MPS is activated.

Still looking forward to someone sharing their code if they have been able to successfully generate an image with MPS as the device on an Apple silicon machine.

@efeLongoria

(quoting @ohmygenie's comment above)

I'm sorry it was not helpful. After several days this worked for me; I will try to test more variables to see if I can find another alternative.

@ealkanat

ealkanat commented Aug 1, 2023

I run this in 13.5, same problem.

2.3 GHz 8-Core Intel Core i9
AMD Radeon Pro 5500M 4 GB

@efeLongoria

I run this in 13.5, same problem.

2.3 GHz 8-Core Intel Core i9 AMD Radeon Pro 5500M 4 GB

Have you already tried this? https://developer.apple.com/metal/pytorch/

And this?
COMMANDLINE_ARGS="--lowvram --opt-split-attention"

@fxbeaulieu

add PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 in the command you use to start WebUI.
Example :
cd '/Users/fxbeaulieu/stable-diffusion-webui';PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 ./webui.sh --autolaunch;exit

@ealkanat

ealkanat commented Aug 2, 2023

I run this in 13.5, same problem.

2.3 GHz 8-Core Intel Core i9 AMD Radeon Pro 5500M 4 GB

add PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 in the command you use to start WebUI. Example : cd '/Users/fxbeaulieu/stable-diffusion-webui';PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 ./webui.sh --autolaunch;exit

Sorry guys, this was my mistake. I found different torch versions on my machine.

I deleted all venv folders under the base directory (.venv, venv).
Then I installed the Torch nightly version again.
It's fixed!

(screenshot attached)

dilwong added a commit to dilwong/stable-diffusion-webui that referenced this issue Aug 9, 2023
Based on comment from AUTOMATIC1111#9133 (comment)

Using GPU is slower for some reason and lags my computer
@injelee21

injelee21 commented Oct 9, 2023

Thank you @efeLongoria, I was able to produce the same output, tensor([1.], device='mps:0'), however I am still encountering the same issue: MPS backend out of memory (MPS allocated: 6.50 GB, other allocations: 29.72 GB, max allowed: 36.27 GB). Tried to allocate 128.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure). I have been reading all the comments and some people did fix it with PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --precision full --no-half. What is ./webui.sh? Where should I download that file? By the way, I am using an M2 with 32 GB.

@SamKhoze

SamKhoze commented Dec 4, 2023

I had the same problem with ComfyUI running vid2vid and received this error:
RuntimeError: MPS backend out of memory (MPS allocated: 10.74 GB, other allocations: 23.29 GB, max allowed: 36.27 GB). Tried to allocate 2.25 GB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

I fixed it by relaunching ComfyUI via this command:
PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 python main.py

@raffetazarius

raffetazarius commented Jan 1, 2024

I have a hunch that GPU VRAM may not be getting flushed correctly by A1111 after generations on macOS installations using PyTorch and MPS, since I'm seeing VRAM usage increase after each consecutive image generation (Intel Mac Pro with an AMD GPU) until, somewhere between generation 5 and 10, I get the "MPS backend out of memory" error, forcing me to restart SD Web UI to complete more generations.

To any engineer looking to fix this in the A1111 codebase, this article may be useful:

https://discuss.pytorch.org/t/how-can-we-release-gpu-memory-cache/14530

particularly this comment https://discuss.pytorch.org/t/how-can-we-release-gpu-memory-cache/14530/27


also https://forums.fast.ai/t/gpu-memory-not-being-freed-after-training-is-over/10265?u=cedric
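
If anyone wants to experiment along those lines, here is a hedged sketch of explicitly releasing cached MPS memory between generations (whether A1111 itself can be patched this way is exactly the open question; torch.mps.empty_cache() needs a recent PyTorch, and pipeline/prompt are placeholder names):

import gc
import torch

def generate_and_release(pipeline, prompt):
    # Illustrative only: run one generation, drop Python references,
    # then ask the MPS allocator to return its cached blocks.
    image = pipeline(prompt).images[0]
    gc.collect()
    torch.mps.empty_cache()
    return image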

@raffetazarius

With @viking1304's help, I've tested a1111 with PyTorch 2.3.0.dev20240103 today on my aforementioned Mac Pro 2019 Intel + AMD 6900XT GPU rig and am no longer getting this MPS Out of Memory error! Yay!

Installed latest PyTorch dev version using viking1304's A1111 installer - https://github.com/viking1304/a1111-setup

@mykolaienko21

This seems to help, at least in my case:

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --precision full --no-half

I have this problem in the terminal after this command. Help me to solve it:

(base) MacBook-Pro-2:~ aleksendrmykolaienko$ PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 ./webui.sh --precision full --no-half
-bash: ./webui.sh: No such file or directory

@satvik-1945

(quoting @mykolaienko21's comment above)

I am also facing the same problem; where do I put these lines of code?

@efeLongoria

efeLongoria commented Jun 1, 2024

(quoting the exchange above)

In the terminal; you need to run the SD with that command.

@SudhanshuBlaze

(quoting the exchange above)

What do you mean by SD?

@thedoger82

(quoting the question above)

Stable Diffusion

@wasimsafdar

I am facing a similar issue with Llama 3.2. I am using "Llama-3.2-3B-Instruct" in PyTorch. I have a Mac M1 Pro, 16 GB.

@MozzieD

MozzieD commented Jan 19, 2025

Try using Chrome
