segmentation fault Alpaca #317
same problem |
Probably it ran out of memory; I got that message when I tried to run it on a low-RAM device. Try it under gdb:
gdb ./main
(gdb) r -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f -ins
|
Thank you for your reply. I checked the RAM usage and it didn't exceed 5 GB. |
Can you run this:
Then try to reproduce the seg fault and provide the logs. |
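(Roughly what that looks like, for anyone following along; the model path and prompt file are just examples, and the commands after the crash are the usual gdb ones:)
gdb ./main
(gdb) run -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins
... reproduce the crash ...
(gdb) bt
(gdb) frame 1
(gdb) info locals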
Hi, not the original reporter, but I'm also hitting segfaults. Running 7B with the command line specified above. Machine spec: AMD 7 2700, 64 GB RAM, 10 GB free disk space. The seg fault happens every time on all models. Branch a791a68. I removed -O3 and re-ran to make sure nothing was optimized out. Backtrace:
Thread 1 "main" received signal SIGSEGV, Segmentation fault.
#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:468 |
This looks like a memory corruption issue; I don't know if it's related to your specific CPU or a bug in the current implementation! Also, can you set a breakpoint on line 4524 before running? |
Hi, the same issue here, although it may have produced more output than it usually does before it happened. I didn't run it with gdb, just from the command line, but I did get the line you asked about before running the non-debug build:
4524    switch (src0->type) {
I will run it with gdb if it helps. Just to confirm, the compiler environment variables were CFLAGS = -I. -DDEBUG -std=c11 -fPIC, and the commented-out stuff was:
ifneq (,$(findstring AVX2,$(AVX2_M)))
# CFLAGS += -mavx2
endif
else ifeq ($(UNAME_S),Linux)
AVX1_M := $(shell grep "avx " /proc/cpuinfo)
ifneq (,$(findstring avx,$(AVX1_M)))
# CFLAGS += -mavx
endif |
Also getting segfaults, and again, just like antimatter15#7, it's after a longer interaction. So it probably has something to do with context size as well. |
I also always get it with Alpaca and never with LLaMA models. Intel Mac, not running out of memory or swap. lldb backtrace for tag
|
Yeah, same here (running current master branch with all re-converted models). I added print debugging; it shows: Tensor type: -1086694353 |
Same here |
I can't; GDB doesn't work on Apple Silicon. |
You can use lldb instead of gdb on Macs. Also, if core dumps are enabled, you can work with that as I did above. |
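(For reference, the lldb equivalent of the gdb session above is roughly the following; the model path is just an example:)
lldb ./main
(lldb) run -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins
(lldb) bt
(lldb) frame select 1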
This is just an out-of-bounds write to memory_k/memory_v when n_past goes past the end, ya? It shows up if you add this assert to ggml_view_1d: |
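(The assert itself didn't survive the formatting above; a bounds check of that shape, as a sketch assuming ggml_view_1d's parameters are the source tensor a, the element count ne0, and the byte offset offset, would look something like this:)
#include <assert.h>
// sketch: the 1d view must end within the source tensor's allocation
assert(offset + ne0*ggml_element_size(a) <= ggml_nbytes(a));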
This looks very reasonable. The question is why we don't see a problem with llama but do with alpaca... |
Hi, I have a core dump with both. Also, something is causing the output to stop; you can see where there is a blank line and I have to hit enter for it to continue.
./main -m ./models/alpaca/ggml-alpaca-7b-q4.bin.tmp -t 8 -n 256 --temp 0.8 --top_k 60 --repeat_penalty 1.0 --color --ignore-eos -i -r "Brim:" -f query5
Gilf is narrating Brims adventure, Brim is looking for a lost token in a mansion.
Gilf: Hi
Gilf: Uh oh, there's a trap here.
./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 256 --temp 0.8 --top_k 60 --repeat_penalty 1.0 --color --ignore-eos -i -r "Brim:" -f query5
Gilf is narrating Brims adventure, Brim is looking for a lost token in a mansion.
Gilf: Hi
Gilf: Yes, it's an O and a 3.
Both produced core dumps after roughly the same amount of output. |
I got this:
lldb ./main
system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: prompt: ' Below is an instruction that describes a task. Write a response that appropriately completes the request.'
main: interactive mode on.
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
== Running in interactive mode. ==
Below is an instruction that describes a task. Write a response that appropriately completes the request.
main`ggml_init: |
Nah, it's reproducible with any model. The key difference is interactive mode, I think, which permits generating more tokens than the context size. We need some way of purging old data from the k/v cache (see the sketch below). |
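(A hedged sketch of what such a purge could look like, not the project's actual fix; the helper name and the history vector are assumptions:)
#include <vector>

// Sketch: before n_past reaches n_ctx, keep the first n_keep tokens and
// re-feed the most recent half of the remainder, so the k/v cache never
// overflows. Returns the tokens that must be re-evaluated after resetting
// n_past to n_keep. Assumes history holds at least n_left/2 tokens.
static std::vector<int> shrink_context(const std::vector<int> & history,
                                       int n_ctx, int n_keep, int & n_past) {
    std::vector<int> re_eval;
    if (n_past < n_ctx) {
        return re_eval; // still room in the window, nothing to purge
    }
    const int n_left = n_past - n_keep;
    n_past = n_keep;
    re_eval.assign(history.end() - n_left/2, history.end());
    return re_eval;
}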
This looks like a duplicate of #71? |
yes ! |
I have tried the alpaca.cpp project and it worked fine; it didn't close even after a really, really long conversation. I don't know what they did differently in alpaca.cpp, as it seems to be pretty much the same as llama.cpp, but it was running better for some reason. So I believe it's not hardware related. |
I have got the segmentation fault with LLaMA too |
I've captured this gdb session:
0x000055555556950d in ggml_element_size (tensor=0x7fffe778ab30) at ggml.c:2443
2443 return GGML_TYPE_SIZE[tensor->type];
(gdb) list
2438 float ggml_type_sizef(enum ggml_type type) {
2439 return ((float)(GGML_TYPE_SIZE[type]))/GGML_BLCK_SIZE[type];
2440 }
2441
2442 size_t ggml_element_size(const struct ggml_tensor * tensor) {
2443 return GGML_TYPE_SIZE[tensor->type];
2444 }
2445
2446 static inline bool ggml_is_scalar(const struct ggml_tensor * tensor) {
2447 static_assert(GGML_MAX_DIMS == 4, "GGML_MAX_DIMS is not 4 - update this function");
(gdb) p tensor
$1 = (const struct ggml_tensor *) 0x7fffe778ab30
(gdb) p tensor->type
$2 = 3176610589
(gdb) p sizeof(GGML_TYPE_SIZE)
$3 = 56
(gdb) backtrace
#0 0x000055555556950d in ggml_element_size (tensor=0x7fffe778ab30) at ggml.c:2443
#1 0x000055555557b8a2 in llama_eval_internal (lctx=..., tokens=<optimized out>, n_tokens=1, n_past=518,
n_threads=<optimized out>) at llama.cpp:686
#2 0x000055555557bf2d in llama_eval (ctx=<optimized out>, tokens=<optimized out>, n_tokens=<optimized out>,
n_past=<optimized out>, n_threads=<optimized out>) at llama.cpp:1445
#3 0x000055555555c93d in main (argc=<optimized out>, argv=<optimized out>) at main.cpp:323
(gdb) frame 1
#1 0x000055555557b8a2 in llama_eval_internal (lctx=..., tokens=<optimized out>, n_tokens=1, n_past=518,
n_threads=<optimized out>) at llama.cpp:686
686 struct ggml_tensor * v = ggml_view_1d(ctx0, model.memory_v, N*n_embd, (ggml_element_size(model.memory_v)*n_embd)*(il*n_ctx + n_past));
(gdb) list
681 struct ggml_tensor * Vcur = ggml_mul_mat(ctx0, model.layers[il].wv, cur);
682
683 // store key and value to memory
684 if (N >= 1) {
685 struct ggml_tensor * k = ggml_view_1d(ctx0, model.memory_k, N*n_embd, (ggml_element_size(model.memory_k)*n_embd)*(il*n_ctx + n_past));
686 struct ggml_tensor * v = ggml_view_1d(ctx0, model.memory_v, N*n_embd, (ggml_element_size(model.memory_v)*n_embd)*(il*n_ctx + n_past));
687
688 ggml_build_forward_expand(&gf, ggml_cpy(ctx0, Kcur, k));
689 ggml_build_forward_expand(&gf, ggml_cpy(ctx0, Vcur, v));
690 }
(gdb) p il
$4 = 0
(gdb) p n_tokens
$5 = 1
(gdb) p n_past
$6 = 518
(gdb) f 2
#2 0x000055555557bf2d in llama_eval (ctx=<optimized out>, tokens=<optimized out>, n_tokens=<optimized out>,
n_past=<optimized out>, n_threads=<optimized out>) at llama.cpp:1445
1445 if (!llama_eval_internal(*ctx, tokens, n_tokens, n_past, n_threads)) {
(gdb) list
1440 struct llama_context * ctx,
1441 const llama_token * tokens,
1442 int n_tokens,
1443 int n_past,
1444 int n_threads) {
1445 if (!llama_eval_internal(*ctx, tokens, n_tokens, n_past, n_threads)) {
1446 fprintf(stderr, "%s: failed to eval\n", __func__);
1447 return 1;
1448 }
1449
(gdb) f 3
#3 0x000055555555c93d in main (argc=<optimized out>, argv=<optimized out>) at main.cpp:323
323 if (llama_eval(ctx, embd.data(), embd.size(), n_past, params.n_threads)) {
(gdb) list
318 set_console_state(CONSOLE_STATE_PROMPT);
319
320 while (remaining_tokens > 0 || params.interactive) {
321 // predict
322 if (embd.size() > 0) {
323 if (llama_eval(ctx, embd.data(), embd.size(), n_past, params.n_threads)) {
324 fprintf(stderr, "%s : failed to eval\n", __func__);
325 return 1;
326 }
327 }
(gdb) |
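(Putting numbers on those frames, as a rough sketch assuming memory_k/memory_v each hold n_layer * n_ctx * n_embd elements, as they did in llama.cpp at the time:)
view offset (elements) = n_embd * (il*n_ctx + n_past) = n_embd * (il*512 + 518)
buffer size (elements) = n_embd * n_layer * n_ctx     = n_embd * n_layer * 512
Since 518 > 512, each layer's view starts 6 rows inside the next layer's slice, and for the last layer it starts past the end of the buffer entirely, so the subsequent ggml_cpy writes over whatever is allocated next. That is consistent with tensor->type later reading back as garbage in ggml_element_size.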
Segmentation fault caused by unchecked NULL pointer when memory pool gets full? #373 (comment) |
Same as reported previously: something is corrupting the tensor:
(gdb) p tensor->type
$2 = 3176610589
(gdb) p sizeof(GGML_TYPE_SIZE)
$3 = 56 (which is 7 elements, because 56 / 8 == 7)
2442 size_t ggml_element_size(const struct ggml_tensor * tensor) {
2443 return GGML_TYPE_SIZE[tensor->type];
2444 } |
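(A quick way to catch this closer to the source, as a sketch: bounds-check the type before indexing. This assumes GGML_TYPE_COUNT, the last member of the ggml_type enum in ggml.h, is available here, and requires <assert.h>:)
size_t ggml_element_size(const struct ggml_tensor * tensor) {
    // fail fast if the tensor header has already been overwritten
    assert((unsigned) tensor->type < GGML_TYPE_COUNT);
    return GGML_TYPE_SIZE[tensor->type];
}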
Looks like it happens every time. I added this check:
diff --git a/main.cpp b/main.cpp
index fbb43a8..866da4d 100644
--- a/main.cpp
+++ b/main.cpp
@@ -327,6 +327,10 @@ int main(int argc, char ** argv) {
         }
         n_past += embd.size();
+        if (n_past > params.n_ctx) {
+            fprintf(stderr, "ERROR: segfault awaits.\nn_past should go past than n_ctx?\n");
+            exit(1);
+        }
         embd.clear();
         if ((int) embd_inp.size() <= input_consumed) {
|
Since the alpaca.cpp project currently does not exhibit this issue, and based on when these reports started appearing, the problem most likely traces back to the tokenizer change and new model format in #252. |
please try #438 and see if it fixes the problem. |
@Green-Sky, after some valid output, it prints a |
Man, >.< I want main.cpp cleaned up. You just can't reason about its behavior anymore. Way too cluttered, multiple state machines, etc. |
Also getting a segmentation fault here, while on alpaca.cpp I don't. My machine has about ~360 GB of RAM, so it's almost impossible to run out of memory. Checked with the 13B and 30B models. |
try using llama.cpp |
I believe this doesn't occur anymore. Closing |
I had a segmentation fault because the model was not completely downloaded: of 46 GB, only 20 were on disk, which caused this error. I deleted it and downloaded it again. |
Hello,
I've tried out the Alpaca model, but after a while an error comes up stating: "zsh: segmentation fault  ./main -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f -ins".
Thanks.
Code:
./main -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins
main: seed = 1679305614
llama_model_load: loading model from './models/alpaca/ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/alpaca/ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
main: prompt: ' Below is an instruction that describes a task. Write a response that appropriately completes the request.'
main: number of tokens in prompt = 21
1 -> ''
13866 -> ' Below'
338 -> ' is'
385 -> ' an'
15278 -> ' instruction'
393 -> ' that'
16612 -> ' describes'
263 -> ' a'
3414 -> ' task'
29889 -> '.'
14350 -> ' Write'
263 -> ' a'
2933 -> ' response'
393 -> ' that'
8210 -> ' appropriate'
368 -> 'ly'
4866 -> ' complete'
29879 -> 's'
278 -> ' the'
2009 -> ' request'
29889 -> '.'
main: interactive mode on.
main: reverse prompt: '### Instruction:
'
main: number of tokens in reverse prompt = 7
29937 -> '#'
2277 -> '##'
2799 -> ' Inst'
4080 -> 'ruction'
29901 -> ':'
13 -> '
'
13 -> '
'
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
== Running in interactive mode. ==
Below is an instruction that describes a task. Write a response that appropriately completes the request.