Exact same crash with
I read https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md
In my case, I use a model (~8 GB) that is known to work, but for the test I keep only ~7 GB of VRAM free, so the model won't load without limiting it. With UVM it does load: VRAM fills up, nothing freezes, and about 1 GB of system RAM is used, but it crashes after about a second. I'm not sure what is going on there.
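(Not part of the original report: a minimal CUDA sketch for checking how much VRAM is actually free before loading, which is one way to confirm the ~7 GB headroom described above.)

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_bytes = 0, total_bytes = 0;
    // cudaMemGetInfo reports free and total device memory in bytes.
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("free VRAM: %.2f GiB / total %.2f GiB\n",
           free_bytes  / (1024.0 * 1024.0 * 1024.0),
           total_bytes / (1024.0 * 1024.0 * 1024.0));
    return 0;
}
```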
I wanted to test CUDA UVM support (#8035), mostly to see if it's viable on my 8 GB VRAM setup (offloading most layers to the GPU plus flash attention is already good enough in practice for me). However, llama.cpp hits an unexpected out-of-memory error and crashes when it starts to warm the model up.
Looking at the code in question, this seems especially odd to me, as there's no obvious dynamic allocation taking place there. I also do have the nvidia-uvm kernel module loaded.
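For context, the sketch below is my own illustration of the UVM behaviour being tested, not the llama.cpp code from #8035. With cudaMallocManaged, an allocation can exceed physical VRAM because the nvidia-uvm driver migrates pages on demand, so problems tend to surface when the GPU first touches the data (e.g. during warm-up) rather than at allocation time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(float *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

#define CHECK(call) do {                                                     \
    cudaError_t err_ = (call);                                               \
    if (err_ != cudaSuccess) {                                               \
        fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(err_)); \
        return 1;                                                            \
    }                                                                        \
} while (0)

int main() {
    // Deliberately oversubscribe: ask for more than the free VRAM on an
    // 8 GB card (10 GiB here is an arbitrary example figure).
    size_t n = (size_t)10 * 1024 * 1024 * 1024 / sizeof(float);
    float *data = nullptr;
    CHECK(cudaMallocManaged(&data, n * sizeof(float))); // usually succeeds even when oversubscribed
    touch<<<(unsigned)((n + 255) / 256), 256>>>(data, n); // first GPU touch: pages migrate here
    CHECK(cudaDeviceSynchronize()); // any runtime error from the kernel surfaces here
    CHECK(cudaFree(data));
    printf("oversubscribed managed allocation survived the first pass\n");
    return 0;
}
```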