Apple MPS Support #179

Closed · wants to merge 7 commits

Conversation

magnusviri
Contributor

These are all of the changes from my repo that get this working on Apple Macs with MPS (Apple Silicon and Intel). I am aware of at least one problem with the scripts that are new in lstein, but I'm hoping I can get those fixed in time. The original txt2img and img2img scripts work.

@cvakiitho

cvakiitho commented Aug 29, 2022

I can confirm txt2img works on my M1. For some reason pip fails installing from requirements with No matching distribution found for opencv-python==4.1.2.30, but it works anyway.
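
(opencv-python 4.1.2.30 predates Apple Silicon wheels, so pip can't resolve it on arm64; bumping the pin to a release that ships arm64 wheels is one workaround. A hedged sketch of the requirements.txt change, with the exact version being illustrative:)

# requirements.txt: newer opencv-python releases publish arm64 macOS wheels
opencv-python==4.6.0.66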

dream.py fails with cuda calls, as expected.

Anyway, thanks.

@magnusviri
Contributor Author

If merging to main is too disruptive, could you make a branch for apple-mps-support? I can resubmit the pull request to that branch.

Someone said this works on Intel using MPS; that's why I dropped "silicon" from the other branch name. It's basically Apple MPS support, not Apple Silicon support.

@magnusviri
Contributor Author

I'll work on dream.py maybe later today, like when I'm done with work in 8 hours or something. Probably later so I can eat lol.

@lstein
Collaborator

lstein commented Aug 29, 2022

I do not have an MPS system to test on. I'm reviewing the code now, so could I get confirmation from someone (other than the pull author) that it is working on the appropriate Apple silicon? I will do some timing tests to confirm that there is no performance hit on CUDA systems.

@lstein
Collaborator

lstein commented Aug 29, 2022

The PR merges cleanly and does not have a measurable effect on either model loading time or image generation time when running on CUDA Linux. I did five timings and could not detect a statistically significant difference. If there is an effect it is no more than a fraction of a second.

@Oceanswave
Contributor

Oceanswave commented Aug 29, 2022

Testing now

./scripts/orig_scripts/txt2img.py works beautifully, nice!

dream.py still needs some work. OP indicated that he's going to work on it. So we probably should either update the README.md to make that just a little more obvious to eliminate confusion or wait until OP gets a chance.

@lstein
Collaborator

lstein commented Aug 29, 2022

dream.py still needs some work. OP indicated that he's going to work on it. So we probably should either update the README.md to make that just a little more obvious to eliminate confusion or wait until OP gets a chance.

Sorry, don't recognize the "OP" acronym. Who is this? I would much prefer to wait until the whole package (dream.py, webUI, and the method calls) all work on MPS before rolling this out. There will be jubilation in the streets when SD comes to the M1 chips.

Up above, @Birch-san requested that we turn this into a public feature branch, and I'm happy to do so as a marker for what is coming. Any objections?

@Oceanswave
Contributor

Sorry, don't recognize the "OP" acronym. Who is this? I would much prefer to wait until the whole package (dream.py, webUI, and the method calls) all work on MPS before rolling this out. There will be jubilation in the streets when SD comes to the M1 chips.

Apologies, meaning magnusviri

Up above, @Birch-san requested that we turn this into a public feature branch, and I'm happy to do so as a marker for what is coming. Any objections?

sounds like a plan

@lstein
Collaborator

lstein commented Aug 29, 2022

New public branch magnusviri-apple-mps-support is now available for people to work on.

@michaelrhanson

Just a data point: starting with commit 8b9f2c7, I was able to compile and run txt2img on an MB Pro (M1 chip, 16 GB) running macOS 12.5.1. From a fairly stock configuration with Xcode tools installed, I had to install Rust and Anaconda, update pip, tweak the PYTORCH_ENABLE_MPS_FALLBACK env variable, and symlink openai/clip-vit-large-patch14. With default settings, txt2img.py completed 2 iterations in 43:33.

@magnusviri
Contributor Author

I was actually hoping for a feature branch. However, this has now exceeded my GitHub experience; I've never worked on a group project like this before, and I'm not sure how to proceed at all now. Now that the lstein branch is the upstream branch, I want to work on getting things like dream.py working. But first I want to get other Mac users to switch from the main repo to this repo, so that hopefully all Mac users are working on the same thing.

@sulkaharo

Regarding ln -s /path/to/ckpt/sd-v1-1.ckpt models/ldm/stable-diffusion-v1/model.ckpt - where's the package to load these models?

@sulkaharo

Right, found the model. Dream.py now throws the following when run:

Traceback (most recent call last):
  File "/Users/sulka/dev/stable-diffusion/scripts/dream.py", line 488, in <module>
    main()
  File "/Users/sulka/dev/stable-diffusion/scripts/dream.py", line 84, in main
    t2i.load_model()
  File "/Users/sulka/dev/stable-diffusion/ldm/simplet2i.py", line 548, in load_model
    model = self._load_model_from_config(config, self.weights)
  File "/Users/sulka/dev/stable-diffusion/ldm/simplet2i.py", line 600, in _load_model_from_config
    model = instantiate_from_config(config.model)
  File "/Users/sulka/dev/stable-diffusion/ldm/util.py", line 89, in instantiate_from_config
    return get_obj_from_str(config['target'])(
  File "/Users/sulka/dev/stable-diffusion/ldm/models/diffusion/ddpm.py", line 657, in __init__
    self.instantiate_cond_stage(cond_stage_config)
  File "/Users/sulka/dev/stable-diffusion/ldm/models/diffusion/ddpm.py", line 768, in instantiate_cond_stage
    model = instantiate_from_config(config)
  File "/Users/sulka/dev/stable-diffusion/ldm/util.py", line 89, in instantiate_from_config
    return get_obj_from_str(config['target'])(
  File "/Users/sulka/dev/stable-diffusion/ldm/modules/encoders/modules.py", line 253, in __init__
    self.tokenizer = CLIPTokenizer.from_pretrained(
  File "/Users/sulka/opt/anaconda3/envs/ldm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1768, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for 'openai/clip-vit-large-patch14'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'openai/clip-vit-large-patch14' is the correct path to a directory containing all relevant files for a CLIPTokenizer tokenizer.
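
(The clip-vit-large-patch14 symlink mentioned above is the usual workaround for this tokenizer lookup; a hedged sketch, where the download location is an assumption:)

# assuming the tokenizer files were fetched from Hugging Face into ~/clip-vit-large-patch14,
# give transformers the local directory it is looking for:
mkdir -p openai
ln -s ~/clip-vit-large-patch14 openai/clip-vit-large-patch14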

@cvakiitho

cvakiitho commented Aug 30, 2022

Just a data point: starting with commit 8b9f2c7, I was able to compile and run txt2img on an MB Pro (M1 chip, 16 GB) running macOS 12.5.1. From a fairly stock configuration with Xcode tools installed, I had to install Rust and Anaconda, update pip, tweak the PYTORCH_ENABLE_MPS_FALLBACK env variable, and symlink openai/clip-vit-large-patch14. With default settings, txt2img.py completed 2 iterations in 43:33.

As in 43 minutes? On my M1 Max, txt2img with --n_iter 2 takes ~3 minutes.

Right, found the model. Dream.py now throws the following when run:

For me, I had to use sd-v1-4.ckpt directly and just copy it to models/ldm/stable-diffusion-v1/model.ckpt
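
(In shell terms, assuming the weights were downloaded to ~/Downloads; the path is an assumption:)

mkdir -p models/ldm/stable-diffusion-v1
cp ~/Downloads/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt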

@Vargol
Contributor

Vargol commented Aug 30, 2022

Could some kind person check whether the seed issue is still occurring: when not using fixed_code but using a seed,
the images are consistent between runs, but are not consistent between users.

[EDIT: Actually I see this repo doesn't have the fix for seed in, just the fix for fixed_code,
see https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1230075917
basically it's the same fix as the fixed_code change: use the CPU randn calls for MPS
in the diffusion scripts.]

Not sure if this is an MPS thing or a general thing, but only MPS users have complained so far.

There's a workaround: move the seed_everything call after the model load,
see CompVis/stable-diffusion#25 (comment)
and the follow-on to that message.

I'm just a DBA with no PyTorch skills, so I've no idea why the model load would affect the random number generator, or why the effect would differ between users.

I assume hoping for randn to come up with the same sequence of random values on different architectures would be a bit of a pipe dream?
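
(For reference, the CPU-randn workaround referred to above amounts to drawing the seeded noise on the CPU and moving it to MPS, so the same seed produces the same latents as on other backends. A minimal sketch for an MPS machine; the latent shape and seed are illustrative:)

import torch

torch.manual_seed(42)
shape = (1, 4, 64, 64)  # latent shape for a 512x512 image
# sample on the CPU generator, then move the tensor to the MPS device
x = torch.randn(shape, device='cpu').to('mps')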

@sclausen

I get an error with the tokenizer when trying to run python scripts/dream.py. I don't know why the tokenizer isn't compiled for the right architecture 🤷‍♂️ Has anyone encountered such an error, or the expertise to help me figure out what I may have gotten wrong?

python scripts/dream.py
* Initializing, be patient...

Traceback (most recent call last):
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 872, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 26, in <module>
    from .tokenization_utils_base import (
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 73, in <module>
    from tokenizers import AddedToken
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/tokenizers/__init__.py", line 79, in <module>
    from .tokenizers import (
ImportError: dlopen(/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so, 0x0002): tried: '/usr/local/opt/openssl/lib/tokenizers.cpython-310-darwin.so' (no such file), '/tokenizers.cpython-310-darwin.so' (no such file), '/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 872, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/models/__init__.py", line 19, in <module>
    from . import (
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/models/layoutlm/__init__.py", line 28, in <module>
    from .configuration_layoutlm import LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP, LayoutLMConfig
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/models/layoutlm/configuration_layoutlm.py", line 19, in <module>
    from transformers import PretrainedConfig, PreTrainedTokenizer, TensorType
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 862, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 874, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.tokenization_utils because of the following error (look up to see its traceback):
dlopen(/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so, 0x0002): tried: '/usr/local/opt/openssl/lib/tokenizers.cpython-310-darwin.so' (no such file), '/tokenizers.cpython-310-darwin.so' (no such file), '/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/username/git/lstein/stable-diffusion/scripts/dream.py", line 469, in <module>
    main()
  File "/Users/username/git/lstein/stable-diffusion/scripts/dream.py", line 34, in main
    from pytorch_lightning import logging
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 16, in <module>
    from torchmetrics import Accuracy as _Accuracy
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/torchmetrics/__init__.py", line 14, in <module>
    from torchmetrics import functional  # noqa: E402
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/torchmetrics/functional/__init__.py", line 68, in <module>
    from torchmetrics.functional.text.bert import bert_score
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/torchmetrics/functional/text/bert.py", line 28, in <module>
    from transformers import AutoModel, AutoTokenizer
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 862, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 874, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.auto because of the following error (look up to see its traceback):
Failed to import transformers.tokenization_utils because of the following error (look up to see its traceback):
dlopen(/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so, 0x0002): tried: '/usr/local/opt/openssl/lib/tokenizers.cpython-310-darwin.so' (no such file), '/tokenizers.cpython-310-darwin.so' (no such file), '/Users/username/miniforge3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))
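
(The dlopen message shows an x86_64 tokenizers wheel loaded into an arm64 Python. One hedged way out, assuming conda-forge has a build for your Python version, is to swap in a native build:)

# drop the x86_64 wheel and install an osx-arm64 build instead
pip uninstall tokenizers
conda install -c conda-forge tokenizers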

@HenkPoley

HenkPoley commented Aug 30, 2022

I see this error on an M1 16GB when running a prompt under python ./scripts/dream.py (Edit: might be expected: #179 (comment)):

>> Setting Sampler to k_heun
Traceback (most recent call last):
  File "/Users/henk/stab_diff/stable-diffusion-lstein-magnusviri-apple-mps-support/./scripts/dream.py", line 488, in <module>
    main()
  File "/Users/henk/stab_diff/stable-diffusion-lstein-magnusviri-apple-mps-support/./scripts/dream.py", line 98, in main
    main_loop(t2i, opt.outdir, cmd_parser, infile)
  File "/Users/henk/stab_diff/stable-diffusion-lstein-magnusviri-apple-mps-support/./scripts/dream.py", line 184, in main_loop
    image_list  = t2i.prompt2image(image_callback=callback, **vars(opt))
  File "/Users/henk/stab_diff/stable-diffusion-lstein-magnusviri-apple-mps-support/ldm/simplet2i.py", line 287, in prompt2image
    torch.cuda.torch.cuda.reset_peak_memory_stats()
  File "/Users/henk/miniconda3/envs/ldm-2/lib/python3.10/site-packages/torch/cuda/memory.py", line 256, in reset_peak_memory_stats
    return torch._C._cuda_resetPeakMemoryStats(device)
AttributeError: module 'torch._C' has no attribute '_cuda_resetPeakMemoryStats'

This error has nothing to do with the 'k_heun' sampler.
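
(The crash comes from calling a CUDA-only API unconditionally; a hedged sketch of the guard, not necessarily how the fix later landed:)

import torch

# on builds without CUDA the underlying C binding is missing,
# so only reset CUDA memory stats when CUDA is actually available
if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()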

@bcondron-square

@lstein I believe the new module (ldm.dream.devices) is missing from the repo at present

@beaugunderson

I do not have an MPS system to test on. I'm reviewing the code now, so could I get confirmation from someone (other than the pull author) that it is working on the appropriate Apple silicon?

With the addition of my suggested fix I was able to reproduce this example from magnusviri's original fork:

python scripts/orig_scripts/txt2img.py \
  --prompt "Anubis riding a motorbike in Grand Theft Auto cover, palm trees, cover art by Stephen Bliss, artstation, high quality" \
  --ddim_steps=50 \
  --n_samples=1 \
  --n_rows=1 \
  --n_iter=1 \
  --seed 1805504473 \
  --fixed_code

grid-0007

@danwigrizer

Wow. Lots of code changes! I couldn't stand all the repeated code for choosing the torch device so I folded everything into a shared function called choose_torch_device() and stuck it into a small module named ldm.dream.devices.

@lstein I don't believe the ldm.dream.devices module is available; it doesn't appear the module was committed. Correct me if I'm wrong.

@mc0

mc0 commented Aug 31, 2022

It looks like that file (ldm/dream/devices.py) should be:

import torch

def choose_torch_device():
    # prefer CUDA, then Apple's MPS backend; fall back to CPU
    if torch.cuda.is_available():
        return 'cuda'
    elif torch.backends.mps.is_available():
        return 'mps'
    return 'cpu'
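
(Call sites would then collapse to something like the following; a sketch, assuming the module lands at that path:)

import torch
from ldm.dream.devices import choose_torch_device  # assumes ldm/dream/devices.py exists

device = torch.device(choose_torch_device())
x = torch.ones(3, device=device)  # models and tensors are placed the same way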

@lstein
Collaborator

lstein commented Aug 31, 2022

Wow. Lots of code changes! I couldn't stand all the repeated code for choosing the torch device so I folded everything into a shared function called choose_torch_device() and stuck it into a small module named ldm.dream.devices.

@lstein I don't believe the ldm.dream.devices module is available; it doesn't appear the module was committed. Correct me if I'm wrong.

My bad. Forgot to push.

Make --fixed-code work again

Co-authored-by: Beau Gunderson <[email protected]>
@mc0

mc0 commented Aug 31, 2022

I'm having issues getting this branch working (whereas magnusviri's fork was working).

/ldm/modules/embedding_manager.py:152: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/runner/miniforge3/conda-bld/pytorch-recipe_1660136240338/work/aten/src/ATen/mps/MPSFallback.mm:11.)
  placeholder_idx = torch.where(
loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<2x1280xf32>' and 'tensor<*xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).

My bad. Forgot to push.

Maybe it just didn't get added with git add?

Collaborator

@lstein lstein left a comment


I've checked that dream.py works with both txt2img and img2img functionality, and that images produced by previous versions are the same now when using the original seed. Not surprisingly, images produced on Macs are different, even when the same seed is used.

@lstein
Collaborator

lstein commented Aug 31, 2022

@Birch-san I tried replacing my fork of k-diffusion with yours, and unfortunately the result was that dream.py crashed with:

Traceback (most recent call last):
  File "scripts/dream.py", line 488, in <module>
    main()
  File "scripts/dream.py", line 35, in main
    from ldm.simplet2i import T2I
  File "/u/lstein/projects/SD/stable-diffusion/ldm/simplet2i.py", line 28, in <module>
    from ldm.models.diffusion.ksampler import KSampler
  File "/u/lstein/projects/SD/stable-diffusion/ldm/models/diffusion/ksampler.py", line 2, in <module>
    import k_diffusion as K
  File "/u/lstein/projects/SD/stable-diffusion/src/k-diffusion/k_diffusion/__init__.py", line 1, in <module>
    from . import augmentation, config, evaluation, external, gns, layers, models, sampling, utils
  File "/u/lstein/projects/SD/stable-diffusion/src/k-diffusion/k_diffusion/external.py", line 6, in <module>
    from . import sampling, utils
  File "/u/lstein/projects/SD/stable-diffusion/src/k-diffusion/k_diffusion/sampling.py", line 10, in <module>
    from typing import Optional, Callable, TypeAlias
ImportError: cannot import name 'TypeAlias' from 'typing' (/u/lstein/.conda/envs/ldm/lib/python3.8/typing.py)

What version of typing is required?
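
(typing.TypeAlias only landed in the stdlib in Python 3.10, so on the 3.8 environment above it has to come from typing_extensions. A hedged sketch of a guarded import, not necessarily how the fork resolves it:)

try:
    from typing import Optional, Callable, TypeAlias  # Python >= 3.10
except ImportError:
    from typing import Optional, Callable
    from typing_extensions import TypeAlias  # backport for Python 3.8/3.9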

@lstein
Collaborator

lstein commented Aug 31, 2022

I did a squash merge from the magnusviri-apple-mps-support branch. Everything went in cleanly. I will leave the branch up so that the commit chain can be followed.

This was a bear! Thank you everyone for the thorough conversations. Let me know if any assistance is required.

@lstein lstein closed this Aug 31, 2022
@sclausen

@hemmer Thanks for the suggestion, but I can't install the osx-arm64 version, since it requires python <3.10.0a0.

$ conda install -c conda-forge tokenizers
[…]
UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - tokenizers -> python[version='3.8.*|3.9.*']
  - tokenizers -> python[version='>=3.8,<3.9.0a0|>=3.8,<3.9.0a0|>=3.9,<3.10.0a0',build=*_cpython]

Your python: python=3.10

[…]

@hemmer

hemmer commented Aug 31, 2022

@sclausen I forgot to mention that I also downgraded to Python 3.9 without issue (in the yml)
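
(The yml change amounts to pinning Python below 3.10 so conda can resolve an osx-arm64 tokenizers build; a hedged excerpt, where the file name and surrounding entries are assumptions:)

# environment-mac.yaml (hypothetical excerpt)
dependencies:
  - python=3.9
  - pip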

@sclausen

@hemmer I hadn't tried that 🤦‍♂️ Thank you very much! What I ask myself is: why does it seem that I'm the only one (or one of few) who had this issue? Shouldn't this be a problem for every M1 user?

@cvakiitho

cvakiitho commented Aug 31, 2022

btw there is another fork, running on the GPU on M1, which is a lot faster for me:
https://www.reddit.com/r/StableDiffusion/comments/wx0tkn/stablediffusion_runs_on_m1_chips/

Actually, maybe it's the export PYTORCH_ENABLE_MPS_FALLBACK=1 that I had set that made it run slower?
... I need to do some more testing

EDIT2:

Yep, it's the PYTORCH_ENABLE_MPS_FALLBACK=1 - if you had to turn it on, turn it off and run:
conda install pytorch -c pytorch-nightly
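
(In shell terms, assuming the variable was exported in the current session or in your profile:)

# stop forcing CPU fallback for unsupported MPS ops, then pick up a
# nightly PyTorch where more ops run natively on MPS
unset PYTORCH_ENABLE_MPS_FALLBACK
conda install pytorch -c pytorch-nightly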

@magnusviri
Contributor Author

You guys moved so fast my head is still spinning; I'm trying to figure out everything that's just happened. I just replied here that we need to make one change regarding the MPS support. I had made the change in the repo on my hard drive but haven't had time to push it, and I'm not sure what the state of the code is anymore. I've never worked on a project where multiple people contribute, so I'm having a hard time understanding what all of the GitHub messages mean. I can't even keep up with all the messages!

@cvakiitho

@hemmer I hadn't tried that 🤦‍♂️ Thank you very much! What I ask myself is: why does it seem that I'm the only one (or one of few) who had this issue? Shouldn't this be a problem for every M1 user?

I believe everyone is using txt2img and img2img, as they're the only scripts currently supported on M1, as written in the docs in this PR

@magnusviri
Contributor Author

This works! I've only tested txt2img.py so far.

One problem: it's really slow, at 18.24 s/it. Before, I was getting somewhere between 4-8 s/it (it kept changing as we made changes). Even with PYTORCH_ENABLE_MPS_FALLBACK=1 it wasn't this slow before; it was closer to 8-9 s/it. I will test more tomorrow.

@sulkaharo

@magnusviri I'm running a clean installation of the main branch from this repo using a MacBook Pro M1 Max with 24 GPU cores and getting an average of 2.23 s/it

@cvakiitho

cvakiitho commented Aug 31, 2022

I was under 1.5 s/it with the same M1 Max on the forked branch. Let me try a fresh main and report back.

Edit:
Yes, I have the same speed on the main branch.

@magnusviri
Contributor Author

Ugh! I forgot to set n_samples to 1! Yeah, on my M1 MacBook Air I'm now getting 4.39 s/it. This might even be faster than before. False alarm!

@junukwon7

I got the same error. I was able to make it stop using the diff (below) and the following command:

python3 scripts/dream.py --full_precision

However, all of my outputs are blank (black squares).

diff --git a/ldm/simplet2i.py b/ldm/simplet2i.py
index 7ec2757..024636d 100644
--- a/ldm/simplet2i.py
+++ b/ldm/simplet2i.py
@@ -284,7 +284,7 @@ class T2I:
             self._set_sampler()
 
         tic = time.time()
-        torch.cuda.torch.cuda.reset_peak_memory_stats()
+        # torch.cuda.torch.cuda.reset_peak_memory_stats()
         results = list()
 
         try:
@@ -316,7 +316,7 @@ class T2I:
                     callback=step_callback,
                 )
 
-            with scope(self.device.type), self.model.ema_scope():
+            with nullcontext(self.device.type), self.model.ema_scope():
                 for n in trange(iterations, desc='Generating'):
                     seed_everything(seed)
                     iter_images = next(images_iterator)
@@ -367,6 +367,8 @@ class T2I:
                 'Partial results will be returned; if --grid was requested, nothing will be returned.'
             )
         except RuntimeError as e:
+            import traceback
+            traceback.print_exc()
             print(str(e))
             print('Are you sure your system has an adequate NVIDIA GPU?')
 
@@ -457,7 +459,7 @@ class T2I:
 
         init_image = self._load_img(init_img).to(self.device)
         init_image = repeat(init_image, '1 ... -> b ...', b=batch_size)
-        with precision_scope(self.device.type):
+        with nullcontext(self.device.type):
             init_latent = self.model.get_first_stage_encoding(
                 self.model.encode_first_stage(init_image)
             )  # move to latent space

@cgodley Thanks for your work. It works for me; however, it returns blanks as you mentioned.

Did anyone figure out how to make dream.py functional on MPS? It seems like the conversations here cover both the original scripts and dream.py.

@cgodley
Contributor

cgodley commented Aug 31, 2022

@junukwon7 Yes python3 scripts/dream.py --full_precision worked for me with my patch (above) but you can't use all samplers yet.

  • k_lms gives me a blank square
  • ddim worked for me
  • Not sure about the other samplers

I think the reason for k-diffusion producing black squares is this issue:
pytorch/pytorch#84364

Edit: I was using https://github.com/magnusviri/stable-diffusion-lstein @ 84c10346fb777c827df58b264a726572225a45c6
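
(Black squares usually mean NaNs came out of the sampler; a quick hedged check before saving, with the function and variable names being illustrative:)

import torch

def is_black_square(x_samples: torch.Tensor) -> bool:
    # NaNs from a broken MPS op propagate through the decoder and
    # typically serialize as an all-black image
    return bool(torch.isnan(x_samples).any())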

@Birch-san

@cgodley yes, I have a workaround to fix the k-diffusion samplers on MPS in my k-diffusion branch.

I recommended it as a change on this review a few days ago but it wasn't incorporated.

#179 (comment)

@cgodley
Contributor

cgodley commented Aug 31, 2022

@Birch-san I tried your k-diffusion fork but I'm getting very blurry images like this:

python scripts/orig_scripts/txt2img.py --n_samples 1 --n_iter 1 --skip_grid --prompt "an atronaut riding a horse" --seed 42 --scale 7.5 --n_iter 1 --ddim_steps 50 --outdir /Users/Shared/txt2img --klms
image

Same thing using DDIM:
python scripts/orig_scripts/txt2img.py --n_samples 1 --n_iter 1 --skip_grid --prompt "an atronaut riding a horse" --seed 42 --scale 7.5 --n_iter 1 --ddim_steps 50 --outdir /Users/Shared/txt2img
image

I'm using lstein/stable-diffusion@fix-crash-on-mps (fd2200e0a8ec4d628da43c4dd79e1ae73f01b14a) and I've checked out Birch-san/k-diffusion@mps (6e5c8a77edc62e75414ad850cb0a6f7ddceea0d4) in the src/k-diffusion directory.

Edit: I'm using MacBook Pro M1 14" 16GB, macOS 12.5

@Birch-san

Birch-san commented Aug 31, 2022

@cgodley thanks for trying it out… hm, I haven't tried it in the context of the lstein fork, but certainly when using the txt2img_fork.py from my own stable-diffusion branch: 50 steps of k_lms sampler using the default noise schedule works just fine:

00494_an astronaut riding a horse_k_lms50

I'm using the same commit 6e5c8a7 from my k-diffusion fork that you mentioned.

@junukwon7

junukwon7 commented Sep 1, 2022

@cgodley Thanks. #256 and #268 work well for me.
