Now run grok-1 with less than 420 GB VRAM #42
Comments
@trholding and will it work with one GPU?
The model must fit in GPU memory. If the model is too large, as most cutting-edge models are, the work is split across multiple GPUs, with each one holding part of the model. So a large model like this would need multiple GPUs.
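To illustrate the splitting described above, here is a minimal sketch in JAX (the framework this repo uses) that shards one weight matrix across every visible device. The shapes and the `model` axis name are illustrative, not grok-1's actual layout:

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis spanning every visible accelerator.
n = len(jax.devices())
mesh = Mesh(mesh_utils.create_device_mesh((n,)), axis_names=("model",))

# Stand-in weight matrix (real grok-1 layers are far larger).
w = jnp.zeros((4096, 4096 * n), dtype=jnp.bfloat16)

# Split the output dimension across the mesh: each device holds 1/n of w.
w = jax.device_put(w, NamedSharding(mesh, P(None, "model")))

# Compiled ops on the sharded array run as tensor-parallel collectives.
x = jnp.ones((1, 4096), jnp.bfloat16)
y = jax.jit(lambda a, b: a @ b)(x, w)
print(y.sharding)  # the result stays sharded across the devices
```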
A 4-bit quantized model would likely be at least 96 GB, so it might fit on four 24 GB cards.
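A back-of-the-envelope check, using grok-1's announced 314B parameter count. Real quantization formats add per-block scales and keep some tensors at higher precision, so treat this as a floor:

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough lower bound: parameters times bits, ignoring scales and overhead."""
    return n_params * bits_per_weight / 8 / 1e9

GROK1_PARAMS = 314e9  # grok-1's announced parameter count

for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: ~{quantized_size_gb(GROK1_PARAMS, bits):,.0f} GB")
# 16-bit: ~628 GB | 8-bit: ~314 GB | 4-bit: ~157 GB | 2-bit: ~79 GB
```

By this floor, a plain 4-bit quantization lands closer to 157 GB than 96 GB, and the ~90 GB shard total mentioned below would correspond to roughly 2.3 bits per weight.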
It can technically overflow into system RAM when running in OpenCL/CLBlast mode (slower, but it works).
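To make "overflow into system RAM" concrete: llama.cpp exposes `-ngl` / `--n-gpu-layers`, which keeps that many transformer layers on the GPU and runs the rest from CPU RAM. A hedged sketch of picking that number; the 64-layer depth matches what has been reported for grok-1, but treat all figures as placeholders:

```python
def gpu_layers(vram_gb: float, model_gb: float, n_layers: int,
               reserve_gb: float = 2.0) -> int:
    """How many roughly equal-sized layers fit in VRAM, keeping a
    reserve for the KV cache and scratch buffers."""
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb / per_layer_gb))

# Placeholder numbers: a ~90 GB quantized model, 64 layers, one 24 GB card.
print(gpu_layers(vram_gb=24, model_gb=90, n_layers=64))  # -> 15
```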
I would rather have Elon give us GPUs
No way. Not this model, even highly quantized. Unless it's a GH200 data-center edition, which does have 96 GB of VRAM integrated with 480 GB of CPU RAM. Then MAYBE.
Someone may have figured it out: adding up the safetensor shards from the mentioned Hugging Face eastwind repo comes to about 90.2 GB on disk. Not sure what would be needed to load it for inference; it will likely need more for overhead. I can't speak to memory usage or quality, since this is still beyond my hardware's capacity.
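If you want to reproduce that 90.2 GB figure, summing the shard files on disk takes a few lines of Python; the directory name below is a stand-in for wherever you cloned the checkpoint:

```python
from pathlib import Path

# Stand-in path: point this at your local clone of the quantized checkpoint.
ckpt = Path("./grok-1-quantized")

total_bytes = sum(f.stat().st_size for f in ckpt.glob("*.safetensors"))
print(f"{total_bytes / 1e9:.1f} GB on disk")
```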
Surprise! You can't run it on your average desktop or laptop! Run grok-1 with less than 420 GB VRAM
Run grok-1 on a Mac Studio with an M2 Ultra and 192 GB of unified RAM. See: llama.cpp / grok-1 support
As @ibab_ml put it on X: "You need a beefy machine to run grok-1"
Grok-1 is a true mystical creature. Rumor has it that it lives in the cores of 8 GPUs and that the model must fit in VRAM.
This implies that you need a very beefy machine. A very, very beefy machine. So beefy...
How do you know if your machine is beefy or not?
Your machine is not beefy if it is not big; the bigger the better, size matters! It has to make the sound of a jet engine when it thinks, and it has to be hot to the touch most of the time.
It must also smell like burnt plastic at times. The more big iron and the heavier, the beefier! If you didn't pay a heavy price for it, such as $100k++, an arm and a leg, then it is not beefy.
What are some of the working setups?

- llama.cpp:
  - Mac
  - AMD
- This repo:
  - Intel + Nvidia: #168 (comment)
  - AMD: #130 (comment)
- Other / Container / Cloud: #6 (comment)
What can you do about it?
Try llama.cpp's grok-1 support (see the sketch below).
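If you go the llama.cpp route, its Python bindings (`llama-cpp-python`) expose the same layer-offload knob. This sketch assumes you already have a quantized GGUF conversion of grok-1 on disk (the file name below is hypothetical), which the linked llama.cpp support produces:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical path to a quantized GGUF conversion of grok-1.
llm = Llama(
    model_path="./grok-1-q4_0.gguf",
    n_gpu_layers=-1,  # -1 offloads every layer; lower it to spill into system RAM
    n_ctx=4096,
)

out = llm("The answer to life, the universe, and everything is", max_tokens=8)
print(out["choices"][0]["text"])
```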
What are the other options?
What is the Answer to the Ultimate Question of Life, the Universe, and Everything?
#42
Ref:

- #168 (comment)
- #130 (comment)
- #130 (comment)
- #125 (comment)
- #6 (comment)
- ggerganov/llama.cpp#6204 (comment)
- ggerganov/llama.cpp#6204 (comment)
- ggerganov/llama.cpp#6204 (comment)
See: Discussion
Note: this issue has been edited entirely to elevate issue #42 to serve a much better cause. @xSetech, wouldn't you be tempted to pin this?
Edit: corrected llama.cpp inaccuracies