Terminated due to memory issue on iPhone 14 Pro Max #5
Thanks for testing this! It looks like the app is hitting the default iOS memory limit. Here are the instructions for adding the capability manually: https://developer.apple.com/documentation/xcode/adding-capabilities-to-your-app#Add-a-capability. The specific capability to add is "Increased Memory Limit". After that capability is added, Maple Diffusion should no longer hit the memory limit. If you see any errors, please let me know - I'm curious to know how fast Maple Diffusion runs on the new phones :)
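For reference, that capability ends up as a key in the app's `.entitlements` file. A minimal sketch (the key name is Apple's documented `com.apple.developer.kernel.increased-memory-limit`; the surrounding plist boilerplate is standard):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Raises the per-process memory limit on supported devices -->
    <key>com.apple.developer.kernel.increased-memory-limit</key>
    <true/>
</dict>
</plist>
```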
I just ran it on my M1 iPad with no issues! So cool. iPadOS 16 is not released yet, so I lowered the deployment target to iPadOS 15.6. Works no problem. About 1.56 steps/sec. So we know it's working. Unfortunately it's looking like the iPhone 14 Pro Max is hitting a memory limit. Any ideas at all to make this work using less memory?
Cool, great to see it works on iPad! I don't know of any easy way to get SD to run in < 3GB of memory with MPSGraph, unfortunately - I exhausted all of my tricks getting it below 4GB 😅... but if I can find a way to lower it further I'll definitely update the repo.
Interestingly enough, the M1 iPad does not need the "increase memory" capability. Hope this is just a bug with the iPhone 14 Pro Max that will be fixed with iOS 16.1, or that you're able to find one last bit of magic to split up the memory or lower it somehow! What you've done is incredible!
Thanks! It looks like the iPad just has a higher base memory limit (5GB instead of 3GB). If you are able to get the "increase memory" entitlement working on iPad, you may even be able to turn off the memory-saving flag for extra speed.
Gotcha! Though it looks like the performance is actually not better with the flag changed (the progress bar is confusingly printing seconds / step, not steps / second - lower is better!)... maybe leave the flag as it was. (FWIW, repeated generations can get slower and slower if the GPU just starts getting too hot - it's possible that affected your measurements.) I'll be sure to let you know if I have ideas for getting this working on the 14 Pro Max - thanks again for your help testing this out!
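The units mix-up above is easy to sanity-check: a progress-bar reading of 1.56 means very different things depending on the unit. A tiny illustrative conversion (the 1.56 figure is from the iPad report earlier in this thread):

```python
def to_steps_per_sec(seconds_per_step: float) -> float:
    """Convert a seconds-per-step reading to steps-per-second (higher is better)."""
    return 1.0 / seconds_per_step

# If the bar is printing seconds/step, 1.56 actually means ~0.64 steps/sec -
# noticeably slower than a literal reading of 1.56 steps/sec would suggest.
print(to_steps_per_sec(1.56))
```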
Hope to hear from you soon!
Hey! Thought I would chime in and confirm that I'm also running into the same memory-termination issue. Slightly curious, I gave running it a try anyway, with the same result. Anyone have any suggestions?
Hi! This guy on Twitter has also gotten Stable Diffusion working on iOS, but it's slower than yours. He says he got most of the app running on the neural engine. Unfortunately he does not detail how. I hope maybe that will help spring up an idea for you!
There is a blog post about transformer optimizations Apple applied: https://machinelearning.apple.com/research/neural-engine-transformers These are mostly about speed, but it also shows a way to reduce intermediate tensor usage by using explicit multi-head attention. At FP16, the savings on the attention intermediates are significant. (This optimization is pretty low on my list, since I am looking at a broader optimization, much like xformers + bitsandbytes, for the multi-head attention.)
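To make the intermediate-tensor savings concrete, here's a rough back-of-envelope sketch. The dimensions are assumptions for illustration (SD 1.x self-attention at 512×512: a 64×64 latent flattened to 4096 tokens, 8 heads, fp16), not measured from either implementation:

```python
def attn_scores_bytes(seq_len: int, n_heads: int, bytes_per_el: int = 2) -> int:
    """Size of the attention score matrix with all heads materialized at once."""
    return n_heads * seq_len * seq_len * bytes_per_el

def per_head_peak_bytes(seq_len: int, bytes_per_el: int = 2) -> int:
    """Peak size if heads are computed one at a time and freed between heads."""
    return seq_len * seq_len * bytes_per_el

full = attn_scores_bytes(4096, 8)   # all 8 heads at once
split = per_head_peak_bytes(4096)   # explicit per-head attention
print(full // 2**20, "MiB vs", split // 2**20, "MiB peak")
```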
@saiedg Matt is using CoreML (see this other thread) - his CoreML-based implementation seems to be moderately slower, but able to run on the neural engine, and more amenable to swapping parts of the UNet out to storage without paying a huge recompilation cost (so he can run a UNet step in under 3GB and ~5 seconds wall clock). MPSGraph recompilation was unusably slow when I tried swapping portions of the UNet to storage, iirc. Anyway, there are a few possible solutions... but none of those seem easy 😅

@liuliu Yup! I believe I already implemented the split-across-heads-to-save-memory trick (though my implementation might have bugs). The other big missing optimization I'm aware of is Flash Attention, but I don't see any easy way to bring that to MPSGraph.
Yeah, I don't know how to print the memory allocation graph from MPSGraph to know what's going on there; otherwise we could dig in to see where the extra 3+ GiB of memory comes from (the model itself (UNet) in fp16 is about 1.65G).
Maybe try the Extended Virtual Addressing entitlement (`com.apple.developer.kernel.extended-virtual-addressing`) in addition to the Increased Memory Limit one.
You need to enable "Extended Virtual Address Space" manually in the App ID configuration in https://developer.apple.com/account/resources/identifiers/. |
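If both capabilities are in play, the entitlements file would carry both keys. A sketch, using Apple's documented key names (`com.apple.developer.kernel.extended-virtual-addressing` is the key behind the "Extended Virtual Address Space" capability):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Raises the per-process memory limit -->
    <key>com.apple.developer.kernel.increased-memory-limit</key>
    <true/>
    <!-- Enlarges the process's virtual address space -->
    <key>com.apple.developer.kernel.extended-virtual-addressing</key>
    <true/>
</dict>
</plist>
```

Both capabilities must also be enabled on the App ID in the developer portal, as noted above.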
I believe that @saiedg had that in their entitlements and still ran into the same issue (or at least that is what I've gathered from screenshots.) I'll give it a shot myself tonight though, since adding more virtual address space shouldn't hurt. I'll let you know how it goes! |
Just to give you some updates on my end: I switched softmax from MPSGraph to MPSMatrixSoftMax and some GEMMs from MPSGraph to MPSMatrixMultiplication. This helps because MPSGraph doesn't do in-place softmax (0.5G), and it seems that when I copy data out of MPSGraph, there is extra scratch space for GEMM (another 0.5G for the dot product of q, k). Combining these two, I was able to run the model in around 2GiB without a perf penalty (thus, 1.6 it/s on M1 and ~2 it/s on iPhone 14 Pro).
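The in-place-softmax saving can be illustrated with a plain-Python sketch (this is just the idea of overwriting the score buffer instead of allocating a second score-sized output; it is not the MPS API):

```python
import math

def softmax_inplace(scores):
    """Numerically stable softmax that overwrites its input buffer,
    avoiding a second allocation the size of the score matrix."""
    m = max(scores)                      # subtract the max for stability
    total = 0.0
    for i, s in enumerate(scores):
        scores[i] = math.exp(s - m)
        total += scores[i]
    for i in range(len(scores)):
        scores[i] /= total
    return scores

buf = [1.0, 2.0, 3.0]
softmax_inplace(buf)   # buf is normalized in place; no second buffer allocated
print(buf)
```

For a 4096×4096 fp16 score matrix per attention layer, skipping that second buffer is exactly the kind of saving that adds up to the ~0.5G mentioned above.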
@liuliu that's great! Well done! Can you upload it?? |
Hi, these are not done with maple-diffusion but against my own implementation, which is different enough that making similar changes in maple-diffusion would be difficult. (maple-diffusion uses MPSGraph as a complete solution and generates the full graph, while I use MPSGraph more like how PyTorch does, as individual ops.) The comment here is more a potential direction for @madebyollin, to see whether some of the learnings can be applicable here.
Update: Looks like it's working now on iOS 16.1 stable! I think that once someone else can confirm this, we can close this issue!
I can confirm that it's fixed in 16.1. I had a user with exactly the same issue on an iPhone 14 Pro on 16.0, and it was solved after upgrading to 16.1!
Perfect, thanks for confirming @HelixNGC7293 and @hubin858130! @madebyollin I think this case is more or less resolved, seeing as an iOS update solved it. |
Cool - thanks to everyone for testing and verifying this (and to whoever at 🍎 fixed the low limit)! I'll mark it closed, I guess :) |
I am running Xcode on an Intel Mac running macOS 12.6 and trying to install the app on my iPhone 14 Pro Max. After downloading a Stable Diffusion model checkpoint, downloading maple-diffusion.git, and running the code to convert the weights to fp16 binary blobs, I'm getting this memory-terminated error on my iPhone 14 Pro Max running iOS 16.0.3. Any ideas?
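For context, the fp16-blob conversion step amounts to something like the following sketch. This is not maple-diffusion's actual script - the function names and file layout here are hypothetical; it just shows the idea of packing float weights as raw little-endian fp16 bytes:

```python
import struct

def write_fp16_blob(weights, path):
    """Pack a flat list of floats as little-endian fp16 ('e' struct format)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(weights)}e", *weights))

def read_fp16_blob(path):
    """Read a raw fp16 blob back into a list of Python floats."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"<{len(data) // 2}e", data))
```

At 2 bytes per weight, this is what gets the ~860M-parameter UNet down to the ~1.65G discussed earlier in the thread.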