Containers for Apple Silicon Macs work with GPU-accelerated Vulkan #8042
Update: Standard #4167 benchmark results for an M2 Max (38 GPU cores, 96 GB RAM) running macOS, with a Fedora container in Podman 4.9 and 8 CPUs + 32 GB allocated to the VM. This setup DOES support GPU acceleration via the Vulkan driver. It is faster than pure virtualization (8 CPUs, see below), but much slower than native macOS (see #4167). I'm not sure what caused the extremely slow Vulkan F16 TG result. PP is supposed to depend largely on compute performance, TG largely on memory bandwidth (with a bit of compute for quantization).
Device reported inside the container: Vulkan0: Virtio-GPU Venus (Apple M2 Max) | uma: 1 | fp16: 1 | warp size: 32
For comparison: M2 Max (38 GPU cores, 96 GB RAM) running macOS, with Ubuntu 24.04 in Parallels 19.4.0 and 8 CPUs + 32 GB allocated to the VM (pure CPU execution).
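A quick way to confirm that a run inside the container is really going through the virtualized GPU is to check the `Vulkan0:` device line quoted above. The snippet below is only a minimal sketch: it assumes the device line keeps the `name | key: value | ...` layout shown in this post, and `parse_device_line` is a hypothetical helper, not part of llama.cpp.

```python
import re

# Device line copied verbatim from the post above.
DEVICE_LINE = "Vulkan0: Virtio-GPU Venus (Apple M2 Max) | uma: 1 | fp16: 1 | warp size: 32"


def parse_device_line(line):
    """Split a 'VulkanN: name | key: value | ...' line into a dict.

    Hypothetical helper; assumes the layout of the line quoted in this thread.
    """
    head, *fields = [part.strip() for part in line.split("|")]
    match = re.match(r"Vulkan(\d+):\s*(.+)", head)
    if match is None:
        raise ValueError(f"not a Vulkan device line: {line!r}")
    info = {"index": int(match.group(1)), "name": match.group(2)}
    for field in fields:
        key, _, value = field.partition(":")
        info[key.strip()] = int(value.strip())
    return info


info = parse_device_line(DEVICE_LINE)
# "Venus" in the device name indicates the virtio-gpu Vulkan passthrough driver.
is_venus = "Venus" in info["name"]
print(info)
print("GPU passthrough via Venus:", is_venus)
```

If the name showed a software rasterizer instead (for example llvmpipe), the container would most likely not be using the host GPU at all, which would be worth ruling out before comparing numbers.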
Interesting, I just read your article but haven't tried the setup myself yet. Do you have any updates after two weeks? I'm looking for a containerized Ollama solution myself that could take advantage of Apple Silicon GPUs.
I just came across a very interesting post by Sergio López on how to enable GPU acceleration for containers on Apple Silicon Macs (https://sinrega.org/2024-03-06-enabling-containers-gpu-macos/), and was able to reproduce the acceleration results. Basically, it routes Vulkan API calls out of the container to a Vulkan-to-Metal layer on the host via the virtual machine monitor.
With Phi-3 on my M2 Max, I got roughly 78 tokens/s token generation (TG) natively (-ngl 99), 63 in a container (-ngl 99), 34 natively (-ngl 0), and 20 in a container (-ngl 0). The PP numbers, however, look very strange (this needs investigation and better benchmarking).
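To put the TG figures above in relation to each other, here is a small sketch that computes the container overhead and the GPU speedup. It uses only the four approximate numbers quoted in this post; the dictionary layout and variable names are made up for illustration.

```python
# Approximate Phi-3 token-generation rates (tokens/s) from the post above,
# keyed by (environment, -ngl value).
tg = {
    ("native", 99): 78,
    ("container", 99): 63,
    ("native", 0): 34,
    ("container", 0): 20,
}

# Fraction of native throughput that the container reaches.
container_vs_native_gpu = tg[("container", 99)] / tg[("native", 99)]
container_vs_native_cpu = tg[("container", 0)] / tg[("native", 0)]

# Speedup from offloading all layers to the GPU (-ngl 99 vs. -ngl 0).
gpu_speedup_native = tg[("native", 99)] / tg[("native", 0)]
gpu_speedup_container = tg[("container", 99)] / tg[("container", 0)]

print(f"container vs. native, GPU: {container_vs_native_gpu:.0%}")
print(f"container vs. native, CPU: {container_vs_native_cpu:.0%}")
print(f"GPU speedup, native:       {gpu_speedup_native:.2f}x")
print(f"GPU speedup, container:    {gpu_speedup_container:.2f}x")
```

With these numbers, the container reaches about 81% of native TG with full GPU offload but only about 59% on the CPU, so in this quick test the GPU path loses less to virtualization than the CPU path does.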
I wrote a quick-and-dirty Medium article about the details (https://medium.com/@andreask_75652/gpu-accelerated-containers-for-m1-m2-m3-macs-237556e5fe0b), but I need to analyze this more once I have more time. I also plan to produce real benchmark numbers comparable to #4167.
To me, the lack of GPU-accelerated containers has always been a strong drawback of Macs. With this, there might be a way to offer easy, safe, and fast installation on Macs as well.
Ideas/Feedback very welcome.