🤖 Now run grok-1 with less than 🔲 420 G VRAM ⚡ #42

Closed
trholding opened this issue Mar 18, 2024 · 7 comments

@trholding

trholding commented Mar 18, 2024

Surprise! You can't run it on your average desktop or laptop!

Run grok-1 with less than 🔲 420 G VRAM

Run grok-1 on a Mac Studio with an M2 Ultra and 192 GB of unified RAM. See: llama.cpp grok-1 support (@ibab_ml on X).

You need a beefy machine to run grok-1

Grok-1 is a true mystical creature. Rumor has it that it lives in the cores of 8 GPUs and that the model must fit in VRAM.

This implies that you need a very beefy machine. A very, very beefy machine. So beefy...

How do you know if your machine is beefy or not?

Your machine is not beefy if it is not big: the bigger, the better; size matters! It has to sound like a jet engine when it thinks, and it should be hot to the touch most of the time.

It must also smell like burnt plastic at times. The more big iron, the heavier, and the heavier, the beefier! If you didn't pay a heavy price for it, say $100k+ plus an arm and a leg, then it is not beefy.

What are some of the working setups?

llama.cpp:
Mac
  • Mac Studio with an M2 Ultra
  • 192 GB of unified RAM
AMD
  • Threadripper 3955WX
  • 256 GB RAM
  • 0.5 tokens per second
This repo:
Intel + Nvidia
  • GPU: 8 x A100 80 GB
  • Total VRAM: 640 GB
  • CPU: 2 x Xeon 8480+
  • RAM: 1.5 TB

#168 (comment)

AMD
  • GPU: 8 x Instinct MI300X 190 GB
  • Total VRAM: 1520 GB

#130 (comment)

Other / Container / Cloud
  • GPU: 8 x A100 80 GB
  • Total VRAM: 640 GB
  • K8s cluster

#6 (comment)

What can you do about it?

Try the llama.cpp grok-1 support (ggerganov/llama.cpp#6204; see the refs below).

What are the other options?

  • Rent a GPU cloud instance with sufficient resources
  • Subscribe to Grok on X (twitter.com)
  • Study the blade, save up money
  • Get someone to cosplay as grok

What is the Answer to the Ultimate Question of Life, the Universe, and Everything?

#42

Ref:
#168 (comment)
#130 (comment)
#130 (comment)
#125 (comment)
#6 (comment)
ggerganov/llama.cpp#6204 (comment)
ggerganov/llama.cpp#6204 (comment)
ggerganov/llama.cpp#6204 (comment)

See: Discussion
Note: This issue has been edited entirely to elevate issue 42 to serve a much better cause. @xSetech, wouldn't you be tempted to pin this?
Edit: Corrected llama.cpp inaccuracies

@yarodevuci

@trholding and will it work with one GPU?

@trholding
Author

The model must fit in GPU memory. If a model is too large, as most cutting-edge models are, it is split into parts and the work is distributed across multiple GPUs. So a large model like this needs multiple GPUs.
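
For illustration only (not something this repo provides): libraries such as Hugging Face transformers with accelerate can shard a checkpoint across every visible GPU automatically. The repo id and per-device memory cap below are placeholders, and grok-1 itself would need a transformers-compatible checkpoint for a sketch like this to apply:

```python
# Hypothetical sketch: shard a large checkpoint across several GPUs with
# Hugging Face transformers + accelerate. The repo id and memory caps are
# placeholders, not a tested grok-1 recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/grok-1-hf"  # placeholder, not an official checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights halve the footprint vs fp32
    device_map="auto",           # let accelerate split layers across all visible GPUs
    max_memory={i: "75GiB" for i in range(torch.cuda.device_count())},
)

prompt = "The answer to life, the universe and everything is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```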

@gardner

gardner commented Mar 18, 2024

A 4-bit quantized model would likely be at least 96 GB, so it might fit on four 24 GB cards.

@akumaburn

akumaburn commented Mar 18, 2024

The model must fit in GPU memory. If a model is too large, as most cutting-edge models are, it is split into parts and the work is distributed across multiple GPUs. So a large model like this needs multiple GPUs.

They can technically overflow into system RAM if running in OpenCL/CLBlast mode (slower, but it works).
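
As a rough sketch of that partial-offload idea, the llama-cpp-python bindings for llama.cpp expose an n_gpu_layers knob; the GGUF path and layer count below are placeholders, and whether the spillover path goes through CLBlast, CUDA, or plain CPU depends on how the library was built:

```python
# Hypothetical sketch: offload only part of the model to VRAM and leave the
# rest in system RAM, via the llama-cpp-python bindings for llama.cpp.
# The GGUF path and layer count are placeholders, not a tested grok-1 setup.
from llama_cpp import Llama

llm = Llama(
    model_path="./grok-1-q4.gguf",  # placeholder path to a quantized GGUF file
    n_gpu_layers=20,  # layers kept in VRAM; the remaining layers run from system RAM
    n_ctx=2048,       # modest context length keeps the KV cache small
)

out = llm("What do you need to run grok-1?", max_tokens=64)
print(out["choices"][0]["text"])
```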

@trholding changed the title from "GGML and llama.cpp support" to "🤖 Min HW Specs: 🔲 420 G+ VRAM • 🧠 8 GPUs • 💾 1337 G~ SSD ⚡" Mar 19, 2024
@AdaptiveStep

I would rather have Elon give us GPUs.

@surak

surak commented Mar 22, 2024

@trholding and will it work with one GPU?

No way. Not this model, even highly quantized. Unless it's a GH200 data center edition, which does have 96 GB of VRAM integrated with 480 GB of CPU RAM. Then MAYBE.

@trholding changed the title from "🤖 Min HW Specs: 🔲 420 G+ VRAM • 🧠 8 GPUs • 💾 1337 G~ SSD ⚡" to "🤖 Now run grok-1 with less than 🔲 420 G+ VRAM ⚡" Mar 23, 2024
@trholding changed the title from "🤖 Now run grok-1 with less than 🔲 420 G+ VRAM ⚡" to "🤖 Now run grok-1 with less than 🔲 420 G VRAM ⚡" Mar 23, 2024
@davidearlyoung

A 4-bit quantized model would likely be at least 96 GB, so it might fit on four 24 GB cards.

Someone may have figured it out:
https://huggingface.co/eastwind/grok-1-hf-4bit/tree/main

Looks to be about 90.2 GB on disk when adding up the safetensors shards from the mentioned Hugging Face eastwind repo. Not sure what would be needed to load it for inference; it will likely need more for overhead. I can't speak to memory usage or quality, since this is still beyond my hardware's capacity.
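
For anyone who wants to repeat that back-of-the-envelope check, a tiny sketch like the one below just sums the downloaded .safetensors shard sizes (the directory path is a placeholder):

```python
# Hypothetical sketch: sum the sizes of downloaded .safetensors shards to
# estimate a checkpoint's on-disk footprint. The directory path is a placeholder.
from pathlib import Path

shard_dir = Path("./grok-1-hf-4bit")  # placeholder: local clone of the HF repo
shards = sorted(shard_dir.glob("*.safetensors"))

total_bytes = sum(p.stat().st_size for p in shards)
print(f"{len(shards)} shards, {total_bytes / 1e9:.1f} GB on disk")
```

Whatever the on-disk number comes out to, loading for inference needs more on top of it for activations, the KV cache, and framework overhead.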
