killed by os when running mac m3max and 128G Mem #277

Open
yangjiandan opened this issue Mar 22, 2024 · 5 comments


yangjiandan commented Mar 22, 2024

dan@MacBook-Pro grok-1 % python3.11 run.py
INFO:jax._src.xla_bridge:Unable to initialize backend 'cuda':
INFO:jax._src.xla_bridge:Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
INFO:jax._src.xla_bridge:Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: dlopen(libtpu.so, 0x0001): tried: 'libtpu.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibtpu.so' (no such file), '/opt/homebrew/lib/libtpu.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/lib/libtpu.so' (no such file), '/usr/lib/libtpu.so' (no such file, not in dyld cache), 'libtpu.so' (no such file), '/usr/local/lib/libtpu.so' (no such file), '/usr/lib/libtpu.so' (no such file, not in dyld cache)
INFO:rank:Initializing mesh for self.local_mesh_config=(1, 1) self.between_hosts_config=(1, 1)...
INFO:rank:Detected 1 devices in mesh
INFO:rank:partition rules: <bound method LanguageModelConfig.partition_rules of LanguageModelConfig(model=TransformerConfig(emb_size=6144, key_size=128, num_q_heads=48, num_kv_heads=8, num_layers=64, vocab_size=131072, widening_factor=8, attn_output_multiplier=0.08838834764831845, name=None, num_experts=8, capacity_factor=1.0, num_selected_experts=2, init_scale=1.0, shard_activations=True, data_axis='data', model_axis='model'), vocab_size=131072, pad_token=0, eos_token=2, sequence_len=8192, model_size=6144, embedding_init_scale=1.0, embedding_multiplier_scale=78.38367176906169, output_multiplier_scale=0.5773502691896257, name=None, fprop_dtype=<class 'jax.numpy.bfloat16'>, model_type=None, init_scale_override=None, shard_embeddings=True)>
INFO:rank:(1, 256, 6144)
INFO:rank:(1, 256, 131072)
INFO:rank:State sharding type: <class 'model.TrainingState'>
INFO:rank:(1, 256, 6144)
INFO:rank:(1, 256, 131072)
INFO:rank:Loading checkpoint at ./checkpoints/ckpt-0
zsh: killed python3.11 run.py
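For context, a quick way to confirm which backend JAX fell back to and how much RAM is free before the checkpoint load starts (a minimal sketch, not from this thread; assumes jax is installed and pulls in the third-party psutil package):

# check_env.py - sanity checks before attempting to load the Grok-1 checkpoint
import jax
import psutil  # third-party: pip install psutil

# With cuda/rocm/tpu all unavailable (as in the log above), this prints "cpu".
print("JAX backend:", jax.default_backend())
print("JAX devices:", jax.devices())

# Compare free system memory against a rough size for the released
# checkpoint (300 GB+, per the discussion below).
free_gb = psutil.virtual_memory().available / 1024**3
print(f"Available RAM: {free_gb:.1f} GB")
if free_gb < 300:
    print("Warning: less free RAM than the checkpoint size; the OS may kill the process.")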

@yangjiandan changed the title from "killed by os when running mac m3max and 128G" to "killed by os when running mac m3max and 128G Mem" on Mar 22, 2024
@yangjiandan (Author) commented:

(screenshot attached; no text recoverable)

@davidearlyoung commented:

Likely an OOM (out-of-memory) situation, which is expected for a model whose raw weights are larger than 128 GB (300 GB+ for the full model). With 128 GB of memory, you will likely need to wait before you can run the whole model on CPU. 4-bit quantization may be able to get the model down to about 96 GB, with some quality loss in the output due to quantization effects, though that weight size is still speculation. See #42.
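A back-of-the-envelope check on those numbers (a sketch; the ~314B parameter count is the commonly cited figure for Grok-1, not stated in this thread):

# rough_size.py - raw weight-memory estimate at a few quantization levels
PARAMS = 314e9  # Grok-1 is commonly cited as ~314B parameters

for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    gib = PARAMS * bits / 8 / 1024**3
    print(f"{label:>5}: ~{gib:.0f} GiB")

This gives roughly 585 GiB for bf16, 292 GiB for int8 (consistent with the 300 GB+ figure above), and 146 GiB for a uniform 4-bit quant; GGUF formats mix bit widths per tensor, which is why actual file sizes (like the speculated 96 GB above) can differ from the uniform estimate.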

@rankaiyx commented:

https://huggingface.co/Arki05/Grok-1-GGUF
Measured in practice, the Q3_XS quantized model needs only about 124 GB of RAM.
Inference speed is similar to that of the Q4-quantized miqu-70B.
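If going the GGUF route, one minimal way to drive it from Python is the llama-cpp-python bindings (a sketch; the model path, context size, and prompt are placeholders, not from this thread):

# run_gguf.py - minimal llama-cpp-python sketch for a GGUF quant of Grok-1
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="./grok-1-q3.gguf",  # hypothetical path to the downloaded quant
    n_ctx=2048,                     # context window; raise it if RAM allows
)
out = llm("The capital of France is", max_tokens=32)
print(out["choices"][0]["text"])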

@rankaiyx commented:

MiB Mem : 257617.7 total, 132751.1 free, 2548.8 used, 124396.7 buff/cache

llama_print_timings: load time = 87224.03 ms
llama_print_timings: sample time = 59.83 ms / 500 runs ( 0.12 ms per token, 8357.71 tokens per second)
llama_print_timings: prompt eval time = 7195.80 ms / 13 tokens ( 553.52 ms per token, 1.81 tokens per second)
llama_print_timings: eval time = 359604.96 ms / 499 runs ( 720.65 ms per token, 1.39 tokens per second)
llama_print_timings: total time = 367566.03 ms / 512 tokens

@yangjiandan (Author) commented:

@rankaiyx Thanks very much for your answer; I will try it later.
