Terminated due to memory issue on iPhone 14 Pro Max #5
Thanks for testing this! It looks like the app is hitting the default iOS memory limit. Here are the instructions for adding the capability manually: https://developer.apple.com/documentation/xcode/adding-capabilities-to-your-app#Add-a-capability. The specific capability to add is "Increased Memory Limit". After that capability is added, Maple Diffusion should no longer hit the memory limit. If you see any errors, please let me know - I'm curious to know how fast Maple Diffusion runs on the new phones :)
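For reference, that capability ends up as a key in the app's `.entitlements` file. A minimal sketch (the key name is Apple's documented `com.apple.developer.kernel.increased-memory-limit`; the surrounding plist boilerplate is standard):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Raises the per-process memory limit on supported devices -->
    <key>com.apple.developer.kernel.increased-memory-limit</key>
    <true/>
</dict>
</plist>
```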
I just ran it on my M1 iPad with no issues! So cool. iPadOS 16 is not released yet, so I lowered the deployment target to iPadOS 15.6. Works no problem. About 1.56 steps/sec. So we know it's working. Unfortunately it's looking like the iPhone 14 Pro Max is hitting a memory limit. Any ideas at all to make this work using less memory?
Cool, great to see it works on iPad! I don't know of any easy way to get SD to run in < 3GB of memory with MPSGraph, unfortunately - I exhausted all of my tricks getting it below 4GB 😅... but if I can find a way to lower it further I'll definitely update the repo.
Interestingly enough, the M1 iPad does not need the "increase memory" capability. Hope this is just a bug with the iPhone 14 Pro Max that will be fixed with iOS 16.1, or that you're able to find one last bit of magic to split up the memory or lower it somehow! What you've done is incredible!
Thanks! It looks like the iPad just has a higher base memory limit (5GB instead of 3GB). If you are able to get the "increase memory" entitlement working on iPad, you may even be able to turn off the memory-saving flag for extra speed.
Gotcha! Though it looks like the performance is actually not better with the flag changed (the progress bar is confusingly printing seconds / step, not steps / second - lower is better!)... maybe leave the flag as it was. (FWIW, repeated generations can get slower and slower if the GPU just starts getting too hot - it's possible that affected your measurements.) I'll be sure to let you know if I have ideas for getting this working on the 14 Pro Max - thanks again for your help testing this out!
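The units mix-up above is easy to sanity-check: a progress-bar reading of 1.56 means very different things depending on the unit. A tiny illustrative conversion (the 1.56 figure is from the iPad report earlier in this thread):

```python
def to_steps_per_sec(seconds_per_step: float) -> float:
    """Convert a seconds-per-step reading to steps-per-second (higher is better)."""
    return 1.0 / seconds_per_step

# If the bar is printing seconds/step, 1.56 actually means ~0.64 steps/sec -
# noticeably slower than a literal reading of 1.56 steps/sec would suggest.
print(to_steps_per_sec(1.56))
```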
Hope to hear from you soon!
Hey! Thought I would chime in and confirm that I'm also running into the same memory-termination issue. Slightly curious, I gave running it a try anyway, with the same result. Anyone have any suggestions?
Hi! This guy on Twitter has also gotten Stable Diffusion working on iOS, but it's slower than yours. He says he got most of the app running on the neural engine. Unfortunately he does not detail how. I hope maybe that will help spring up an idea for you!
There is a blog post about transformer optimizations Apple applied: https://machinelearning.apple.com/research/neural-engine-transformers These are mostly about speed, but it also shows a way to reduce intermediate tensor usage by using explicit multi-head attention. At FP16, the savings on the attention intermediates are significant. (This optimization is pretty low on my list, since I am looking at a broader optimization, much like xformers + bitsandbytes, for the multi-head attention.)
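To make the intermediate-tensor savings concrete, here's a rough back-of-envelope sketch. The dimensions are assumptions for illustration (SD 1.x self-attention at 512×512: a 64×64 latent flattened to 4096 tokens, 8 heads, fp16), not measured from either implementation:

```python
def attn_scores_bytes(seq_len: int, n_heads: int, bytes_per_el: int = 2) -> int:
    """Size of the attention score matrix with all heads materialized at once."""
    return n_heads * seq_len * seq_len * bytes_per_el

def per_head_peak_bytes(seq_len: int, bytes_per_el: int = 2) -> int:
    """Peak size if heads are computed one at a time and freed between heads."""
    return seq_len * seq_len * bytes_per_el

full = attn_scores_bytes(4096, 8)   # all 8 heads at once
split = per_head_peak_bytes(4096)   # explicit per-head attention
print(full // 2**20, "MiB vs", split // 2**20, "MiB peak")
```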
@saiedg Matt is using CoreML (see this other thread) - his CoreML-based implementation seems to be moderately slower, but able to run on the neural engine, and more amenable to swapping parts of the UNet out to storage without paying a huge recompilation cost (so he can run a UNet step in under 3GB and ~5 seconds wall clock). MPSGraph recompilation was unusably slow when I tried swapping portions of the UNet to storage, iirc. Anyway, there are a few possible solutions... but none of those seem easy 😅

@liuliu Yup! I believe I already implemented the split-across-heads-to-save-memory trick (though my implementation might have bugs). The other big missing optimization I'm aware of is Flash Attention, but I don't see any easy way to bring that to MPSGraph.
Yeah, I don't know how to print the memory allocation graph from MPSGraph to know what's going on there; otherwise we could dig in to see where the extra 3+ GiB of memory comes from (the model itself (UNet) in fp16 is about 1.65G).
Maybe try the Extended Virtual Addressing entitlement (`com.apple.developer.kernel.extended-virtual-addressing`) in addition to the Increased Memory Limit one.
You need to enable "Extended Virtual Address Space" manually in the App ID configuration in https://developer.apple.com/account/resources/identifiers/. |
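If both capabilities are in play, the entitlements file would carry both keys. A sketch, using Apple's documented key names (`com.apple.developer.kernel.extended-virtual-addressing` is the key behind the "Extended Virtual Address Space" capability):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Raises the per-process memory limit -->
    <key>com.apple.developer.kernel.increased-memory-limit</key>
    <true/>
    <!-- Enlarges the process's virtual address space -->
    <key>com.apple.developer.kernel.extended-virtual-addressing</key>
    <true/>
</dict>
</plist>
```

Both capabilities must also be enabled on the App ID in the developer portal, as noted above.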
I believe that @saiedg had that in their entitlements and still ran into the same issue (or at least that is what I've gathered from screenshots.) I'll give it a shot myself tonight though, since adding more virtual address space shouldn't hurt. I'll let you know how it goes! |
Just to give you some updates on my end: I switched softmax from MPSGraph to MPSMatrixSoftMax and some GEMMs from MPSGraph to MPSMatrixMultiplication. This helps because MPSGraph doesn't do in-place softmax (0.5G), and it seems that when I copy data out of MPSGraph, there is extra scratch space for GEMM (another 0.5G for the dot product of q, k). Combining these two, I was able to run the model in around 2GiB without a perf penalty (thus, 1.6 it/s on M1 and ~2 it/s on iPhone 14 Pro).
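The in-place-softmax saving can be illustrated with a plain-Python sketch (this is just the idea of overwriting the score buffer instead of allocating a second score-sized output; it is not the MPS API):

```python
import math

def softmax_inplace(scores):
    """Numerically stable softmax that overwrites its input buffer,
    avoiding a second allocation the size of the score matrix."""
    m = max(scores)                      # subtract the max for stability
    total = 0.0
    for i, s in enumerate(scores):
        scores[i] = math.exp(s - m)
        total += scores[i]
    for i in range(len(scores)):
        scores[i] /= total
    return scores

buf = [1.0, 2.0, 3.0]
softmax_inplace(buf)   # buf is normalized in place; no second buffer allocated
print(buf)
```

For a 4096×4096 fp16 score matrix per attention layer, skipping that second buffer is exactly the kind of saving that adds up to the ~0.5G mentioned above.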
@liuliu that's great! Well done! Can you upload it?? |
Hi, these are not done with maple-diffusion but against my own implementation, which is different enough that making similar changes in maple-diffusion would be difficult. (maple-diffusion uses MPSGraph as a complete solution and generates the full graph, while I use MPSGraph more like how PyTorch does, as individual ops.) The comment here is more a potential direction for @madebyollin, to see whether some of the learnings can be applicable here.
Update: Looks like it's working now on iOS 16.1 stable! I think that once someone else can confirm this, we can close this issue!
I can confirm that it's fixed in 16.1. I had a user with exactly the same issue on an iPhone 14 Pro on 16.0, and it was solved after upgrading to 16.1!
Perfect, thanks for confirming @HelixNGC7293 and @hubin858130! @madebyollin I think this case is more or less resolved, seeing as an iOS update solved it. |
Cool - thanks to everyone for testing and verifying this (and to whoever at 🍎 fixed the low limit)! I'll mark it closed, I guess :) |
I am running Xcode on an Intel Mac running macOS 12.6 and trying to install the app on my iPhone 14 Pro Max. After downloading a Stable Diffusion model checkpoint, downloading maple-diffusion.git, and running the code to convert the weights to fp16 binary blobs, I'm getting this memory-terminated error on my iPhone 14 Pro Max running iOS 16.0.3. Any ideas?
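For context, the fp16-blob conversion step amounts to something like the following sketch. This is not maple-diffusion's actual script - the function names and file layout here are hypothetical; it just shows the idea of packing float weights as raw little-endian fp16 bytes:

```python
import struct

def write_fp16_blob(weights, path):
    """Pack a flat list of floats as little-endian fp16 ('e' struct format)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(weights)}e", *weights))

def read_fp16_blob(path):
    """Read a raw fp16 blob back into a list of Python floats."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"<{len(data) // 2}e", data))
```

At 2 bytes per weight, this is what gets the ~860M-parameter UNet down to the ~1.65G discussed earlier in the thread.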