Apple MPS Support #179

Closed · wants to merge 7 commits

Conversation

magnusviri
Contributor

These are all of the changes from my repo that get this working on Apple Macs with MPS (Apple Silicon and Intel). I am aware of at least one problem with the scripts that are new in lstein, but I'm hoping I can get those fixed in time. The original txt2img and img2img scripts work.

@cvakiitho

cvakiitho commented Aug 29, 2022

I can confirm txt2img works on my M1. For some reason pip fails installing from requirements with No matching distribution found for opencv-python==4.1.2.30, but it works anyway.
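
(opencv-python 4.1.2.30 predates Apple Silicon wheels, so pip can't resolve it on arm64; bumping the pin to a release that ships arm64 wheels is one workaround. A hedged sketch of the requirements.txt change, with the exact version being illustrative:)

# requirements.txt: newer opencv-python releases publish arm64 macOS wheels
opencv-python==4.6.0.66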

dream.py fails with cuda calls, as expected.

Anyway, thanks.

@magnusviri
Contributor Author

If merging to main is too disruptive, could you make a branch for apple-mps-support? I can resubmit the pull request to that branch.

Someone said this works on Intel using MPS; that's why I dropped "silicon" from the other branch name. It's basically Apple MPS support, not Apple Silicon support.

@magnusviri
Contributor Author

I'll work on dream.py maybe later today, like when I'm done with work in 8 hours or something. Probably later so I can eat lol.

@lstein
Collaborator

lstein commented Aug 29, 2022

I do not have an MPS system to test on. I'm reviewing the code now, so could I get confirmation from someone (other than the pull author) that it is working on the appropriate Apple silicon? I will do some timing tests to confirm that there is no performance hit on CUDA systems.

@lstein
Collaborator

lstein commented Aug 29, 2022

The PR merges cleanly and does not have a measurable effect on either model loading time or image generation time when running on CUDA Linux. I did five timings and could not detect a statistically significant difference. If there is an effect it is no more than a fraction of a second.

@Oceanswave
Contributor

Oceanswave commented Aug 29, 2022

Testing now

./scripts/orig_scripts/txt2img.py works beautifully, nice!

dream.py still needs some work. OP indicated that he's going to work on it. So we probably should either update the README.md to make that just a little more obvious to eliminate confusion or wait until OP gets a chance.

@lstein
Collaborator

lstein commented Aug 29, 2022

dream.py still needs some work. OP indicated that he's going to work on it. So we probably should either update the README.md to make that just a little more obvious to eliminate confusion or wait until OP gets a chance.

Sorry, don't recognize the "OP" acronym. Who is this? I would much prefer to wait until the whole package (dream.py, webUI, and the method calls) all work on MPS before rolling this out. There will be jubilation in the streets when SD comes to the M1 chips.

Up above, @Birch-san requested that we turn this into a public feature branch, and I'm happy to do so as a marker for what is coming. Any objections?

@Oceanswave
Contributor

Sorry, don't recognize the "OP" acronym. Who is this? I would much prefer to wait until the whole package (dream.py, webUI, and the method calls) all work on MPS before rolling this out. There will be jubilation in the streets when SD comes to the M1 chips.

Apologies, meaning magnusviri

Up above, @Birch-san requested that we turn this into a public feature branch, and I'm happy to do so as a marker for what is coming. Any objections?

sounds like a plan

@lstein
Collaborator

lstein commented Aug 29, 2022

New public branch magnusviri-apple-mps-support is now available for people to work on.

@michaelrhanson

Just a data point: starting with commit 8b9f2c7, I was able to compile and run txt2img on an MB Pro (M1 chip, 16 GB) running macOS 12.5.1. From a fairly stock configuration with Xcode tools installed, I had to install Rust and Anaconda, update pip, tweak the PYTORCH_ENABLE_MPS_FALLBACK env variable, and symlink openai/clip-vit-large-patch14. With default settings, txt2img.py completed 2 iterations in 43:33.

@magnusviri
Contributor Author

I was actually hoping for a feature branch. However, this has now exceeded my GitHub experience; I've never worked on a group project like this before, and I'm not sure how to proceed at all now. Now that the lstein branch is the upstream branch, I want to work on getting things like dream.py working. But first I want to get other Mac users to switch from the main repo to this repo, so that hopefully all Mac users are working on the same thing.

@sulkaharo

Regarding ln -s /path/to/ckpt/sd-v1-1.ckpt models/ldm/stable-diffusion-v1/model.ckpt - where's the package to load these models?

@sulkaharo

Right, found the model. Dream.py now throws the following when run:

Traceback (most recent call last):
  File "/Users/sulka/dev/stable-diffusion/scripts/dream.py", line 488, in <module>
    main()
  File "/Users/sulka/dev/stable-diffusion/scripts/dream.py", line 84, in main
    t2i.load_model()
  File "/Users/sulka/dev/stable-diffusion/ldm/simplet2i.py", line 548, in load_model
    model = self._load_model_from_config(config, self.weights)
  File "/Users/sulka/dev/stable-diffusion/ldm/simplet2i.py", line 600, in _load_model_from_config
    model = instantiate_from_config(config.model)
  File "/Users/sulka/dev/stable-diffusion/ldm/util.py", line 89, in instantiate_from_config
    return get_obj_from_str(config['target'])(
  File "/Users/sulka/dev/stable-diffusion/ldm/models/diffusion/ddpm.py", line 657, in __init__
    self.instantiate_cond_stage(cond_stage_config)
  File "/Users/sulka/dev/stable-diffusion/ldm/models/diffusion/ddpm.py", line 768, in instantiate_cond_stage
    model = instantiate_from_config(config)
  File "/Users/sulka/dev/stable-diffusion/ldm/util.py", line 89, in instantiate_from_config
    return get_obj_from_str(config['target'])(
  File "/Users/sulka/dev/stable-diffusion/ldm/modules/encoders/modules.py", line 253, in __init__
    self.tokenizer = CLIPTokenizer.from_pretrained(
  File "/Users/sulka/opt/anaconda3/envs/ldm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1768, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for 'openai/clip-vit-large-patch14'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'openai/clip-vit-large-patch14' is the correct path to a directory containing all relevant files for a CLIPTokenizer tokenizer.
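
(The clip-vit-large-patch14 symlink mentioned above is the usual workaround for this tokenizer lookup; a hedged sketch, where the download location is an assumption:)

# assuming the tokenizer files were fetched from Hugging Face into ~/clip-vit-large-patch14,
# give transformers the local directory it is looking for:
mkdir -p openai
ln -s ~/clip-vit-large-patch14 openai/clip-vit-large-patch14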

@cvakiitho

cvakiitho commented Aug 30, 2022

Just a data point: starting with commit 8b9f2c7, I was able to compile and run txt2img on an MB Pro (M1 chip, 16 GB) running macOS 12.5.1. From a fairly stock configuration with Xcode tools installed, I had to install Rust and Anaconda, update pip, tweak the PYTORCH_ENABLE_MPS_FALLBACK env variable, and symlink openai/clip-vit-large-patch14. With default settings, txt2img.py completed 2 iterations in 43:33.

As in 43 minutes? On my M1 Max, txt2img with --n_iter 2 takes ~3 minutes.

Right, found the model. Dream.py now throws the following when run:

For me, I had to use sd-v1-4.ckpt directly and just copy it to models/ldm/stable-diffusion-v1/model.ckpt
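
(In shell terms, assuming the weights were downloaded to ~/Downloads; the path is an assumption:)

mkdir -p models/ldm/stable-diffusion-v1
cp ~/Downloads/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt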

@Vargol
Contributor

Vargol commented Aug 30, 2022

Could some kind person check whether the seed issue is still occurring: when not using fixed_code but using a seed,
the images are consistent between runs, but are not consistent between users.

[EDIT: Actually I see this repo doesn't have the fix for seed in, just the fix for fixed_code,
see https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1230075917
basically it's the same fix as the fixed_code change: use the CPU randn calls for MPS
in the diffusion scripts.]

Not sure if this is an MPS thing or a general thing, but only MPS users have complained so far.

There's a workaround: move the seed_everything call after the model load,
see CompVis/stable-diffusion#25 (comment)
and the follow-on to that message.

I'm just a DBA with no PyTorch skills, so I've no idea why the model load would affect the random number generator, or why the effect would differ between users.

I assume hoping for randn to come up with the same sequence of random values on different architectures would be a bit of a pipe dream?
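
(For reference, the CPU-randn workaround referred to above amounts to drawing the seeded noise on the CPU and moving it to MPS, so the same seed produces the same latents as on other backends. A minimal sketch for an MPS machine; the latent shape and seed are illustrative:)

import torch

torch.manual_seed(42)
shape = (1, 4, 64, 64)  # latent shape for a 512x512 image
# sample on the CPU generator, then move the tensor to the MPS device
x = torch.randn(shape, device='cpu').to('mps')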

@sclausen

I get an error with the tokenizer when trying to run python scripts/dream.py. I don't know why the tokenizer isn't compiled for the right architecture 🤷‍♂️ Has anyone encountered such an error, or the expertise to help me figure out what I may have gotten wrong?

python scripts/dream.py
* Initializing, be patient...

Traceback (most recent call last):
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 872, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 26, in <module>
    from .tokenization_utils_base import (
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 73, in <module>
    from tokenizers import AddedToken
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/tokenizers/__init__.py", line 79, in <module>
    from .tokenizers import (
ImportError: dlopen(/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so, 0x0002): tried: '/usr/local/opt/openssl/lib/tokenizers.cpython-310-darwin.so' (no such file), '/tokenizers.cpython-310-darwin.so' (no such file), '/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 872, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/models/__init__.py", line 19, in <module>
    from . import (
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/models/layoutlm/__init__.py", line 28, in <module>
    from .configuration_layoutlm import LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP, LayoutLMConfig
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/models/layoutlm/configuration_layoutlm.py", line 19, in <module>
    from transformers import PretrainedConfig, PreTrainedTokenizer, TensorType
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 862, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 874, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.tokenization_utils because of the following error (look up to see its traceback):
dlopen(/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so, 0x0002): tried: '/usr/local/opt/openssl/lib/tokenizers.cpython-310-darwin.so' (no such file), '/tokenizers.cpython-310-darwin.so' (no such file), '/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/username/git/lstein/stable-diffusion/scripts/dream.py", line 469, in <module>
    main()
  File "/Users/username/git/lstein/stable-diffusion/scripts/dream.py", line 34, in main
    from pytorch_lightning import logging
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 16, in <module>
    from torchmetrics import Accuracy as _Accuracy
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/torchmetrics/__init__.py", line 14, in <module>
    from torchmetrics import functional  # noqa: E402
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/torchmetrics/functional/__init__.py", line 68, in <module>
    from torchmetrics.functional.text.bert import bert_score
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/torchmetrics/functional/text/bert.py", line 28, in <module>
    from transformers import AutoModel, AutoTokenizer
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 862, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 874, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.auto because of the following error (look up to see its traceback):
Failed to import transformers.tokenization_utils because of the following error (look up to see its traceback):
dlopen(/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so, 0x0002): tried: '/usr/local/opt/openssl/lib/tokenizers.cpython-310-darwin.so' (no such file), '/tokenizers.cpython-310-darwin.so' (no such file), '/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))
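
(The dlopen message shows an x86_64 tokenizers wheel loaded into an arm64 Python. One hedged way out, assuming conda-forge has a build for your Python version, is to swap in a native build:)

# drop the x86_64 wheel and install an osx-arm64 build instead
pip uninstall tokenizers
conda install -c conda-forge tokenizers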

@HenkPoley

HenkPoley commented Aug 30, 2022

I see this error on an M1 16GB when running a prompt under python ./scripts/dream.py (Edit: might be expected: #179 (comment)):

>> Setting Sampler to k_heun
Traceback (most recent call last):
  File "/Users/henk/stab_diff/stable-diffusion-lstein-magnusviri-apple-mps-support/./scripts/dream.py", line 488, in <module>
    main()
  File "/Users/henk/stab_diff/stable-diffusion-lstein-magnusviri-apple-mps-support/./scripts/dream.py", line 98, in main
    main_loop(t2i, opt.outdir, cmd_parser, infile)
  File "/Users/henk/stab_diff/stable-diffusion-lstein-magnusviri-apple-mps-support/./scripts/dream.py", line 184, in main_loop
    image_list  = t2i.prompt2image(image_callback=callback, **vars(opt))
  File "/Users/henk/stab_diff/stable-diffusion-lstein-magnusviri-apple-mps-support/ldm/simplet2i.py", line 287, in prompt2image
    torch.cuda.torch.cuda.reset_peak_memory_stats()
  File "/Users/henk/miniconda3/envs/ldm-2/lib/python3.10/site-packages/torch/cuda/memory.py", line 256, in reset_peak_memory_stats
    return torch._C._cuda_resetPeakMemoryStats(device)
AttributeError: module 'torch._C' has no attribute '_cuda_resetPeakMemoryStats'

This error has nothing to do with the 'k_heun' sampler.
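
(The crash comes from calling a CUDA-only API unconditionally; a hedged sketch of the guard, not necessarily how the fix later landed:)

import torch

# on builds without CUDA the underlying C binding is missing,
# so only reset CUDA memory stats when CUDA is actually available
if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()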

@bcondron-square

@lstein I believe the new module (ldm.dream.devices) is missing from the repo at present

@beaugunderson

I do not have an MPS system to test on. I'm reviewing the code now, so could I get confirmation from someone (other than the pull author) that it is working on the appropriate Apple silicon?

With the addition of my suggested fix I was able to reproduce this example from magnusviri's original fork:

python scripts/orig_scripts/txt2img.py \
  --prompt "Anubis riding a motorbike in Grand Theft Auto cover, palm trees, cover art by Stephen Bliss, artstation, high quality" \
  --ddim_steps=50 \
  --n_samples=1 \
  --n_rows=1 \
  --n_iter=1 \
  --seed 1805504473 \
  --fixed_code

grid-0007

@danwigrizer

Wow. Lots of code changes! I couldn't stand all the repeated code for choosing the torch device so I folded everything into a shared function called choose_torch_device() and stuck it into a small module named ldm.dream.devices.

@lstein I don't believe the ldm.dream.devices module is available; it doesn't appear the module was committed. Correct me if I'm wrong.

@mc0

mc0 commented Aug 31, 2022

It looks like that file (ldm/dream/devices.py) should be:

import torch

def choose_torch_device():
    # prefer CUDA, then Apple's MPS backend; fall back to CPU
    if torch.cuda.is_available():
        return 'cuda'
    elif torch.backends.mps.is_available():
        return 'mps'
    return 'cpu'
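
(Call sites would then collapse to something like the following; a sketch, assuming the module lands at that path:)

import torch
from ldm.dream.devices import choose_torch_device  # assumes ldm/dream/devices.py exists

device = torch.device(choose_torch_device())
x = torch.ones(3, device=device)  # models and tensors are placed the same way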

@lstein
Collaborator

lstein commented Aug 31, 2022

Wow. Lots of code changes! I couldn't stand all the repeated code for choosing the torch device so I folded everything into a shared function called choose_torch_device() and stuck it into a small module named ldm.dream.devices.

@lstein I don't believe the ldm.dream.devices module is available; it doesn't appear the module was committed. Correct me if I'm wrong.

My bad. Forgot to push.

Make --fixed-code work again

Co-authored-by: Beau Gunderson <[email protected]>
@mc0

mc0 commented Aug 31, 2022

I'm having issues getting this branch working (whereas magnusviri's fork was working).

/ldm/modules/embedding_manager.py:152: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/runner/miniforge3/conda-bld/pytorch-recipe_1660136240338/work/aten/src/ATen/mps/MPSFallback.mm:11.)
  placeholder_idx = torch.where(
loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<2x1280xf32>' and 'tensor<*xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).

My bad. Forgot to push.

Maybe it just didn't get added with git add?

Collaborator

@lstein lstein left a comment


I've checked that dream.py works with both txt2img and img2img functionality, and that images produced by previous versions are the same now when using the original seed. Not surprisingly, images produced on Macs are different, even when the same seed is used.

@lstein
Collaborator

lstein commented Aug 31, 2022

@Birch-san I tried replacing my fork of k-diffusion with yours, and unfortunately the result was that dream.py crashed with:

Traceback (most recent call last):
  File "scripts/dream.py", line 488, in <module>
    main()
  File "scripts/dream.py", line 35, in main
    from ldm.simplet2i import T2I
  File "/u/lstein/projects/SD/stable-diffusion/ldm/simplet2i.py", line 28, in <module>
    from ldm.models.diffusion.ksampler import KSampler
  File "/u/lstein/projects/SD/stable-diffusion/ldm/models/diffusion/ksampler.py", line 2, in <module>
    import k_diffusion as K
  File "/u/lstein/projects/SD/stable-diffusion/src/k-diffusion/k_diffusion/__init__.py", line 1, in <module>
    from . import augmentation, config, evaluation, external, gns, layers, models, sampling, utils
  File "/u/lstein/projects/SD/stable-diffusion/src/k-diffusion/k_diffusion/external.py", line 6, in <module>
    from . import sampling, utils
  File "/u/lstein/projects/SD/stable-diffusion/src/k-diffusion/k_diffusion/sampling.py", line 10, in <module>
    from typing import Optional, Callable, TypeAlias
ImportError: cannot import name 'TypeAlias' from 'typing' (/u/lstein/.conda/envs/ldm/lib/python3.8/typing.py)

What version of typing is required?
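
(typing.TypeAlias only landed in the stdlib in Python 3.10, so on the 3.8 environment above it has to come from typing_extensions. A hedged sketch of a guarded import, not necessarily how the fork resolves it:)

try:
    from typing import Optional, Callable, TypeAlias  # Python >= 3.10
except ImportError:
    from typing import Optional, Callable
    from typing_extensions import TypeAlias  # backport for Python 3.8/3.9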

@lstein
Collaborator

lstein commented Aug 31, 2022

I did a squash merge from the magnusviri-apple-mps-support branch. Everything went in cleanly. I will leave the branch up so that the commit chain can be followed.

This was a bear! Thank you everyone for the thorough conversations. Let me know if any assistance is required.

@lstein lstein closed this Aug 31, 2022
@sclausen

@hemmer Thanks for the suggestion, but I can't install the osx-arm64 version, since it requires python <3.10.0a0.

$ conda install -c conda-forge tokenizers
[…]
UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - tokenizers -> python[version='3.8.*|3.9.*']
  - tokenizers -> python[version='>=3.8,<3.9.0a0|>=3.8,<3.9.0a0|>=3.9,<3.10.0a0',build=*_cpython]

Your python: python=3.10

[…]

@hemmer

hemmer commented Aug 31, 2022

@sclausen I forgot to mention that I also downgraded to Python 3.9 without issue (in the yml)
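
(The yml change amounts to pinning Python below 3.10 so conda can resolve an osx-arm64 tokenizers build; a hedged excerpt, where the file name and surrounding entries are assumptions:)

# environment-mac.yaml (hypothetical excerpt)
dependencies:
  - python=3.9
  - pip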

@sclausen

@hemmer I hadn't tried that 🤦‍♂️ Thank you very much! What I ask myself is: why does it seem that I'm the only one (or one of few) who had this issue? Shouldn't this be a problem for every M1 user?

@cvakiitho

cvakiitho commented Aug 31, 2022

btw there is another fork, running on the GPU on M1, which is a lot faster for me:
https://www.reddit.com/r/StableDiffusion/comments/wx0tkn/stablediffusion_runs_on_m1_chips/

Actually, maybe it's the export PYTORCH_ENABLE_MPS_FALLBACK=1 that I had set that made it run slower?
... I need to do some more testing

EDIT2:

Yep, it's the PYTORCH_ENABLE_MPS_FALLBACK=1 - if you had to turn it on, turn it off and run:
conda install pytorch -c pytorch-nightly
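
(In shell terms, assuming the variable was exported in the current session or in your profile:)

# stop forcing CPU fallback for unsupported MPS ops, then pick up a
# nightly PyTorch where more ops run natively on MPS
unset PYTORCH_ENABLE_MPS_FALLBACK
conda install pytorch -c pytorch-nightly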

@magnusviri
Contributor Author

You guys moved so fast my head is still spinning; I'm trying to figure out everything that's just happened. I just replied here that we need to make one change regarding the MPS support. I had made the change in the repo on my hard drive but haven't had time to push it, and I'm not sure what the state of the code is anymore. I've never worked on a project where multiple people contribute, so I'm having a hard time understanding what all of the GitHub messages mean. I can't even keep up with all the messages!

@cvakiitho

@hemmer I hadn't tried that 🤦‍♂️ Thank you very much! What I ask myself is: why does it seem that I'm the only one (or one of few) who had this issue? Shouldn't this be a problem for every M1 user?

I believe everyone is using txt2img and img2img, as they're the only scripts currently supported on M1, as written in the docs in this PR

@magnusviri
Contributor Author

This works! I've only tested txt2img.py so far.

One problem: it's really slow, at 18.24 s/it. Before, I was getting somewhere between 4-8 s/it (it kept changing as we made changes). Even with PYTORCH_ENABLE_MPS_FALLBACK=1 it wasn't this slow before; it was closer to 8-9 s/it. I will test more tomorrow.

@sulkaharo

@magnusviri I'm running a clean installation of the main branch from this repo using a MacBook Pro M1 Max with 24 GPU cores and getting an average of 2.23 s/it

@cvakiitho

cvakiitho commented Aug 31, 2022

I was under 1.5 s/it with the same M1 Max on the forked branch. Let me try a fresh main and report back.

Edit:
Yes, I have the same speed on the main branch.

@magnusviri
Contributor Author

Ugh! I forgot to set n_samples to 1! Yeah, on my M1 MacBook Air I'm now getting 4.39 s/it. This might even be faster than before. False alarm!

@junukwon7

I got the same error. I was able to make it stop using the diff (below) and the following command:

python3 scripts/dream.py --full_precision

However, all of my outputs are blank (black squares).

diff --git a/ldm/simplet2i.py b/ldm/simplet2i.py
index 7ec2757..024636d 100644
--- a/ldm/simplet2i.py
+++ b/ldm/simplet2i.py
@@ -284,7 +284,7 @@ class T2I:
             self._set_sampler()
 
         tic = time.time()
-        torch.cuda.torch.cuda.reset_peak_memory_stats()
+        # torch.cuda.torch.cuda.reset_peak_memory_stats()
         results = list()
 
         try:
@@ -316,7 +316,7 @@ class T2I:
                     callback=step_callback,
                 )
 
-            with scope(self.device.type), self.model.ema_scope():
+            with nullcontext(self.device.type), self.model.ema_scope():
                 for n in trange(iterations, desc='Generating'):
                     seed_everything(seed)
                     iter_images = next(images_iterator)
@@ -367,6 +367,8 @@ class T2I:
                 'Partial results will be returned; if --grid was requested, nothing will be returned.'
             )
         except RuntimeError as e:
+            import traceback
+            traceback.print_exc()
             print(str(e))
             print('Are you sure your system has an adequate NVIDIA GPU?')
 
@@ -457,7 +459,7 @@ class T2I:
 
         init_image = self._load_img(init_img).to(self.device)
         init_image = repeat(init_image, '1 ... -> b ...', b=batch_size)
-        with precision_scope(self.device.type):
+        with nullcontext(self.device.type):
             init_latent = self.model.get_first_stage_encoding(
                 self.model.encode_first_stage(init_image)
             )  # move to latent space

@cgodley Thanks for your work. It works for me; however, it returns blanks as you mentioned.

Did anyone figure out how to make dream.py functional on MPS? It seems like the conversations here cover both the original scripts and dream.py.

@cgodley
Contributor

cgodley commented Aug 31, 2022

@junukwon7 Yes python3 scripts/dream.py --full_precision worked for me with my patch (above) but you can't use all samplers yet.

  • k_lms gives me a blank square
  • ddim worked for me
  • Not sure about the other samplers

I think the reason for k-diffusion producing black squares is this issue:
pytorch/pytorch#84364

Edit: I was using https://github.com/magnusviri/stable-diffusion-lstein @ 84c10346fb777c827df58b264a726572225a45c6
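
(Black squares usually mean NaNs came out of the sampler; a quick hedged check before saving, with the function and variable names being illustrative:)

import torch

def is_black_square(x_samples: torch.Tensor) -> bool:
    # NaNs from a broken MPS op propagate through the decoder and
    # typically serialize as an all-black image
    return bool(torch.isnan(x_samples).any())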

@Birch-san

@cgodley yes, I have a workaround to fix the k-diffusion samplers on MPS in my k-diffusion branch.

I recommended it as a change on this review a few days ago but it wasn't incorporated.

#179 (comment)

@cgodley
Contributor

cgodley commented Aug 31, 2022

@Birch-san I tried your k-diffusion fork but I'm getting very blurry images like this:

python scripts/orig_scripts/txt2img.py --n_samples 1 --n_iter 1 --skip_grid --prompt "an atronaut riding a horse" --seed 42 --scale 7.5 --n_iter 1 --ddim_steps 50 --outdir /Users/Shared/txt2img --klms
image

Same thing using DDIM:
python scripts/orig_scripts/txt2img.py --n_samples 1 --n_iter 1 --skip_grid --prompt "an atronaut riding a horse" --seed 42 --scale 7.5 --n_iter 1 --ddim_steps 50 --outdir /Users/Shared/txt2img
image

I'm using lstein/stable-diffusion@fix-crash-on-mps (fd2200e0a8ec4d628da43c4dd79e1ae73f01b14a) and I've checked out Birch-san/k-diffusion@mps (6e5c8a77edc62e75414ad850cb0a6f7ddceea0d4) in the src/k-diffusion directory.

Edit: I'm using MacBook Pro M1 14" 16GB, macOS 12.5

@Birch-san

Birch-san commented Aug 31, 2022

@cgodley thanks for trying it out… hm, I haven't tried it in the context of the lstein fork, but certainly when using the txt2img_fork.py from my own stable-diffusion branch: 50 steps of k_lms sampler using the default noise schedule works just fine:

00494_an astronaut riding a horse_k_lms50

I'm using the same commit 6e5c8a7 from my k-diffusion fork that you mentioned.

@junukwon7

junukwon7 commented Sep 1, 2022

@cgodley Thanks. #256 and #268 work well for me.
