segmentation fault Alpaca #317

Closed
sussyboiiii opened this issue Mar 20, 2023 · 35 comments
Labels
hardware Hardware related

Comments

@sussyboiiii

Hello,
I've tried out the Alpaca model, but after a while it crashes with an error that I believe reads: "zsh: segmentation fault ./main -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f -ins".
Thanks.

Code:
./main -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins
main: seed = 1679305614
llama_model_load: loading model from './models/alpaca/ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/alpaca/ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |

main: prompt: ' Below is an instruction that describes a task. Write a response that appropriately completes the request.'
main: number of tokens in prompt = 21
1 -> ''
13866 -> ' Below'
338 -> ' is'
385 -> ' an'
15278 -> ' instruction'
393 -> ' that'
16612 -> ' describes'
263 -> ' a'
3414 -> ' task'
29889 -> '.'
14350 -> ' Write'
263 -> ' a'
2933 -> ' response'
393 -> ' that'
8210 -> ' appropriate'
368 -> 'ly'
4866 -> ' complete'
29879 -> 's'
278 -> ' the'
2009 -> ' request'
29889 -> '.'

main: interactive mode on.
main: reverse prompt: '### Instruction:

'
main: number of tokens in reverse prompt = 7
29937 -> '#'
2277 -> '##'
2799 -> ' Inst'
4080 -> 'ruction'
29901 -> ':'
13 -> '
'
13 -> '
'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

== Running in interactive mode. ==

  • Press Ctrl+C to interject at any time.
  • Press Return to return control to LLaMa.
  • If you want to submit another line, end your input in ''.

Below is an instruction that describes a task. Write a response that appropriately completes the request.

How are you?
I'm doing great! How about yourself?
Same, what is the capital of the USA?
The current and official Capital city for America is Washington D.C.. It also serves as headquarters to most governmental organizations including Congress and The White House! ???? #USACapital
Who is president?
The President of the United States of America, Donald Trump. He has held office since January 2017. His term ends in early 2025 and he will run for re-election next year as part of a Democratic sweep
How old is the earth
The Earth was formed about four billion years ago, but it wasn't inhabited by humans until around 2.5 million years ago with the appearance of Homo sapiens on the scene.[1]
How old is the universe?
The age and origin of our Universe are among some of its most fundamental mysteries, but astronomers have been able to calculate a minimum value for how long it has taken to form. Using measurements from supernovae explosions in other galaxies, which can be used as 'clocks', they estimate that the universe is between 13 and
What is the largest country?
The world's seven largest countries are (in descending order) China, India, United States of America, Brazil, Indonesia, Russia and Canada.
Which of them has the most people?
Indonesia is currently home to over 250 million inhabitants -- making it by far the largest country in population size!
"""
def get_country(world, year):
world = dict() # Dictionary for storing countries and their populations. We'll start outzsh: segmentation fault ./main -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f -ins

@KyL0N
Contributor

KyL0N commented Mar 20, 2023

same problem

@msyyces8x95

It probably ran out of memory; I got that message when I tried to run it on a low-RAM device.
Here is my stack trace, to be confirmed in your case (in the Makefile, change -DNDEBUG to -DDEBUG and add -g).
Then run:

gdb ./main
(gdb) r -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f -ins
Starting program: /home/pine/llama.cpp/main -m ./models/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
main: seed = 1679306208
llama_model_load: loading model from './models/ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB

Program received signal SIGSEGV, Segmentation fault.
ggml_new_tensor_impl (ctx=0xaaaaaaaff888 <g_state+8>, type=type@entry=GGML_TYPE_Q4_0, n_dims=n_dims@entry=2,
    ne=ne@entry=0xffffffffd528, data=data@entry=0x0) at ggml.c:2658
2658        if (obj_cur != NULL) {

@sussyboiiii
Author

Thank you for your reply.
Could you explain in a bit more detail what you want me to do? Regarding the low RAM you mentioned: I have 16GB, yet I was able to run the 65B model (it took 45GB of RAM). It was really slow, but because my Mac uses its SSD as extra RAM it didn't run out of memory. So shouldn't it just get a bit slower when it runs out of physical RAM, instead of crashing? (It always keeps 3GB of physical RAM unoccupied.)

@gjmulder added the "hardware" (Hardware related) label on Mar 20, 2023
@sussyboiiii
Author

I checked the RAM usage and it didn't exceed 5GB

@msyyces8x95

msyyces8x95 commented Mar 20, 2023

I checked the RAM usage and it didn't exceed 5GB

can you run this :

gdb ./main
(gdb) r -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins

Then try to reproduce the seg. fault and provide the logs.

@SavageShrimp

SavageShrimp commented Mar 20, 2023

Hi, not the original reporter but having an issue with SegFaults. Running 7B with command line specified above.

Machine spec: AMD 7 2700, 64GB RAM, 10GB free disk space. The segfault happens every time, on all models.

Branch a791a68.

I removed the -O3 and re-ran to make sure nothing was optimized out

Backtrace:

Thread 1 "main" received signal SIGSEGV, Segmentation fault.
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:468

#0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:468
#1 0x000055555557f5fc in ggml_compute_forward_dup_f32 (params=0x7ffffffe4870, src0=0x7ffec2235a50, dst=0x7ffec224dd50)
at ggml.c:4439
#2 0x000055555557fcb4 in ggml_compute_forward_dup (params=0x7ffffffe4870, src0=0x7ffec2235a50, dst=0x7ffec224dd50)
at ggml.c:4531
#3 0x00005555555866d7 in ggml_compute_forward_cpy (params=0x7ffffffe4870, src0=0x7ffec2235a50, dst=0x7ffec224dd50)
at ggml.c:6916
#4 0x000055555558cd4f in ggml_compute_forward (params=0x7ffffffe4870, tensor=0x7ffec224dd50) at ggml.c:8742
#5 0x000055555558f0b0 in ggml_graph_compute (ctx=0x5555556957e8 <g_state+104>, cgraph=0x7ffffffe4a00) at ggml.c:9646
#6 0x0000555555560836 in llama_eval (model=..., n_threads=16, n_past=512,
embd_inp=std::vector of length 6, capacity 8 = {...}, embd_w=std::vector of length 32000, capacity 32000 = {...},
mem_per_token=@0x7fffffffcae0: 14565444) at main.cpp:743
#7 0x0000555555561f9b in main (argc=7, argv=0x7fffffffe3c8) at main.cpp:963

@msyyces8x95

msyyces8x95 commented Mar 20, 2023

This looks like a memory corruption issue. I don't know whether it's related to your specific CPU or is a bug in the current implementation.
Can you recompile with AVX disabled and test again? (To disable it, comment out lines 78 and 82 in the Makefile.)

Also, can you set a breakpoint on line 4524 before running ((gdb) b 4524) and, once it is hit, run print src0->type?

@SavageShrimp

SavageShrimp commented Mar 21, 2023

Hi, the same issue, although it may have produced more output than it usually does until it happened. I didn't run it with gdb, just from the command line, but I did get the output you asked for before running in non-debug.

4524 switch (src0->type) {
(gdb) print src0->type
$1 = GGML_TYPE_F32
(gdb)

I will run it with gdb if it helps.

Just to confirm, compiler environment variables were

CFLAGS = -I. -DDEBUG -std=c11 -fPIC
CXXFLAGS = -I. -I./examples -DDEBUG -std=c++17 -fPIC

and commented out stuff ...

          ifneq (,$(findstring AVX2,$(AVX2_M)))
 #           CFLAGS += -mavx2
        endif
    else ifeq ($(UNAME_S),Linux)
        AVX1_M := $(shell grep "avx " /proc/cpuinfo)
        ifneq (,$(findstring avx,$(AVX1_M)))
 #           CFLAGS += -mavx
        endif 
 

@totoCZ

totoCZ commented Mar 21, 2023

I'm also getting segfaults, and just like in antimatter15#7 they occur after a longer interaction, so it probably has something to do with context size as well.

@aparashk

I also always get it with Alpaca and never with LLaMA models. Intel Mac, not running out of memory or swap.

lldb backtrace for tag master-8cf9f34:

❯ lldb -c /cores/core.96845
(lldb) target create --core "/cores/core.96845"
Core file '/cores/core.96845' (x86_64) was loaded.
warning: main was compiled with optimization - stepping may behave oddly; variables may not be available.
(lldb) bt
* thread #1
  * frame #0: 0x000000010ffce88d main`ggml_element_size(tensor=0x00007f93c2851b20) at ggml.c:2369:12 [opt]
    frame #1: 0x000000010ffc94c3 main`llama_eval(llama_model const&, int, int, std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::vector<float, std::__1::allocator<float> >&, unsigned long&) + 1555
    frame #2: 0x000000010ffcb53a main`main + 4362
    frame #3: 0x000000011bbea52e dyld`start + 462

@mattsta

mattsta commented Mar 21, 2023

Yeah, same here (running current master branch with all re-converted models).

I added print debugging to ggml_element_size and sometimes it receives a corrupted tensor->type value so the array access segfaults.

Tensor type: -1086694353

@niltonvasques

Same here

@sussyboiiii
Author

I checked the RAM usage and it didn't exceed 5GB

can you run this :

gdb ./main
(gdb) r -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins

Then try to reproduce the seg. fault and provide the logs.

I can't; GDB doesn't work on Apple Silicon.

@aparashk

You can use lldb instead of gdb on Macs. Also, if core dumps are enabled, you can work with that as I did above.

@eiz
Contributor

eiz commented Mar 21, 2023

this is just out of bounds write to memory_k/memory_v when n_past goes past the end, ya?

if you add this assert to ggml_view_1d GGML_ASSERT((ne0 * GGML_TYPE_SIZE[a->type])/GGML_BLCK_SIZE[a->type] <= ggml_nbytes(a) - offset); it will trigger before the crash (at least for me it always hits prior to the ggml_element_size crash)
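
The assert above compares the number of bytes the requested view covers with the bytes that remain after `offset`. A minimal standalone sketch of that comparison in C follows. The name `view_1d_fits` and its parameters are hypothetical and not part of ggml; the sketch also adds a guard, not present in the assert as quoted, so that `nbytes - offset` cannot wrap around when `offset` is already past the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of ggml: returns true when a 1-D view of
 * `ne0` elements starting `offset` bytes into a buffer of `nbytes` bytes
 * stays inside that buffer. `type_size` is the byte size of one block and
 * `blck_size` the number of elements per block (1 for unquantized types). */
bool view_1d_fits(size_t ne0, size_t type_size, size_t blck_size,
                  size_t nbytes, size_t offset) {
    assert(blck_size > 0);
    if (offset > nbytes) {
        return false; /* avoids unsigned wrap-around in nbytes - offset */
    }
    size_t view_bytes = (ne0 * type_size) / blck_size;
    return view_bytes <= nbytes - offset;
}
```

A check of this kind only reports that a view is out of range; it does not say why the offset grew that large.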

@aparashk

This looks very reasonable. The question is why we don't see a problem with llama but do with alpaca...

@SavageShrimp

SavageShrimp commented Mar 21, 2023

Hi, I have a core dump with both. Also, something is causing the output to stop: you can see where there is a blank line, and I have to hit Enter for it to continue.

./main -m ./models/alpaca/ggml-alpaca-7b-q4.bin.tmp -t 8 -n 256 --temp 0.8 --top_k 60 --repeat_penalty 1.0 --color --ignore-eos -i -r "Brim:" -f query5

Gilf is narrating Brims adventure, Brim is looking for a lost token in a mansion.

Gilf: Hi
Brim: Hi Gilf, good to chat with you.
Gilf: It's a good night for a chat.
Brim: Hi Gilf, thanks for coming, where should we look first?
Gilf: Good question!
Brim: Ok, but where do you think we should look.
Gilf: I think I have a better plan, how do you feel about a little mansion?
Brim: You are not much help today, maybe I should look first.
Gilf: Maybe you should, but first we should get in.
Brim: Here's a door, opens door
Gilf: closes door on Brim
Brim: Hmm. opens door and enters
Gilf: opens door and enters
Brim: Oh, look, a small chest!
Gilf: opens chest
Brim: What do you see?
Gilf: looks in the chest and sees a token
Brim: I took the token!
Gilf: walks over to Brim, takes token away and gives Brim a token
Brim: Ok let's go upsairs
Gilf: Sure, let's go.
Brim: I am going to enter this bedroom.

Gilf: Uh oh, there's a trap here.
Brim: brim gets hit by rock
Gilf: Oh, no! I am so sorry!
Brim: It's ok. Quick, climb into the cupboard.
Gilf: climbs into the cupboard
Brim: brim opens a secret door
Gilf: opens door and gets hit by rock
Brim: Oh no!, stop it. Go through the secret door Gilf.
Gilf: goes through the secret door
Brim: Wow, look here, so much treasure, see if you can find any diamonds.
Gilf: opens chest and finds a token
Brim: Bah, too many tokens, we already have one.
Gilf:Segmentation fault (core dumped)

./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 256 --temp 0.8 --top_k 60 --repeat_penalty 1.0 --color --ignore-eos -i -r "Brim:" -f query5

Gilf is narrating Brims adventure, Brim is looking for a lost token in a mansion.

Gilf: Hi
Brim: Hi Gilf, good to chat with you.
Gilf: It's a good night for a chat.
Brim: How do you think we are going to get in?
Gilf: Hmmm. Well, we have a key, but I'm not sure that will work.
Brim: Try the key Gilf.
Gilf: Hmmm, it's stuck.
Brim: Oh well, we tried. I am going to climb through a window.
Gilf: Don't do that!
Brim: I am in, quick climb through the window Gilf.
Gilf: Alright, I'm coming in.
Brim: Wow, look here, it's a picture of a rabbit.
Gilf: It is. It's a rabbit with a cape.
Brim: I wonder what that means, a rabbit with a cape?
Gilf: Hmmm, maybe we will find out.
Brim: Hey Gilf, do you know ascii characters?
Gilf: Yes, I know them.
Brim: Ok, write out 0x27[3m please.
Gilf: Done.
Brim: Ok, look closely at the rabbit, do you notice anything?
?

Gilf: Yes, it's an O and a 3.
Brim: It looks like a golden dice?
Gilf: That's what it looks like.
Brim: That must be a clue.
Gilf: Well, I think it means something important is nearby.
Brim: Ok, that might be true, search behind the picture.
Gilf: Ok, and... nothing.
Brim: good start.
Gilf: Nothing huh.
Brim: We enter the kitchen
Gilf: We go in.
Brim: Look for something orange, I'm sure it's in there.
Gilf: Ok, let's look around.
Brim: Have you found anything orange yet?
Gilf: Nope, nothing.
Brim: Strange, oh look, here's an orange towel.
Segmentation fault (core dumped)

Both produced core dumps after roughly the same amount of output.

@sussyboiiii
Author

sussyboiiii commented Mar 21, 2023

I checked the RAM usage and it didn't exceed 5GB

can you run this :

gdb ./main
(gdb) r -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins

Then try to reproduce the seg. fault and provide the logs.

I got this:

lldb ./main
(lldb) target create "./main"
Current executable set to '/Users/dennisruff/llama.cpp/main' (arm64).
(lldb) r -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins
Process 95762 launched: '/Users/dennisruff/llama.cpp/main' (arm64)
main: seed = 1679410592
llama_model_load: loading model from './models/alpaca/ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/alpaca/ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |

main: prompt: ' Below is an instruction that describes a task. Write a response that appropriately completes the request.'
main: number of tokens in prompt = 21
1 -> ''
13866 -> ' Below'
338 -> ' is'
385 -> ' an'
15278 -> ' instruction'
393 -> ' that'
16612 -> ' describes'
263 -> ' a'
3414 -> ' task'
29889 -> '.'
14350 -> ' Write'
263 -> ' a'
2933 -> ' response'
393 -> ' that'
8210 -> ' appropriate'
368 -> 'ly'
4866 -> ' complete'
29879 -> 's'
278 -> ' the'
2009 -> ' request'
29889 -> '.'

main: interactive mode on.
main: reverse prompt: '### Instruction:

'
main: number of tokens in reverse prompt = 7
29937 -> '#'
2277 -> '##'
2799 -> ' Inst'
4080 -> 'ruction'
29901 -> ':'
13 -> '
'
13 -> '
'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

== Running in interactive mode. ==

  • Press Ctrl+C to interject at any time.
  • Press Return to return control to LLaMa.
  • If you want to submit another line, end your input in ''.

Below is an instruction that describes a task. Write a response that appropriately completes the request.

Hello.
Hi!
What is the largest building in the world?
The Pentagon, Washington DC (280 feet high), USA 714 ft x 536 ft. The Taipei 101 Building ,Taiwan 98 floors and observation deck at height of 303m
What is the highest building in the world?
The tallest manmade structure on earth, as well as its highest inhabitable floor currently exists within Dubai. This high rise towering over everything else was built by Emaar Properties and completed in 2
Who made the Pentagon?
The Pentagon is a five-sided structure located southwest of Washington, D.C., USA. The design for this building started under President Roosevelt's Administration in 1942 and was completed by Harry S Truman during World War II as part of the war effort.
How old is the Earth
The age of our planet earth can be calculated using many different methods; one involves measuring layers in sedimentary rocks and estimating how long it would take for those to form given current rates. Another method measures radioactive decay, which allows scientists
how big is our planet
The Earth's radius at the center of its core ranges from 150 miles (243 kilometers) to over 8976.6 mi (1,443 km). The equatorial diameter is roughly 21,000 sq miles (36,000 square kilometres), with polar diameters of about 3,745 and 3,746 miles respectively
The Earth's average density has been estimated to be between two to three times that of water. The mass can vary depending on the source; estimates have
whats the richest man?
Process 95762 stopped

main`ggml_init:
0x100008834 <+0>: sub sp, sp, #0xb0
0x100008838 <+4>: stp d13, d12, [sp, #0x20]
Target 0: (main) stopped.

A screenshot because github put something in there:
[Screenshot 2023-03-21 at 16:02:46]

@eiz
Contributor

eiz commented Mar 21, 2023

This looks very reasonable. The question is why we don't see a problem with llama but do with alpaca...

nah it's reproducible with any model. the key difference is interactive mode I think, which permits generating more tokens than the context size. need some way of purging old data from the k/v cache
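
One way to read "purging old data" is a sliding window: once the token history would exceed the context size, keep only the most recent tokens and evaluate them again from position 0. The C sketch below shows just that bookkeeping, under stated assumptions: `trim_history`, `n_keep`, and the plain `int` token array are hypothetical, and this is not a description of how llama.cpp handles it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sliding-window helper (not llama.cpp code).
 * `tokens` holds `n` token ids in order. If `n` exceeds `n_ctx`, the most
 * recent `n_keep` tokens are moved to the front and the new length is
 * returned; the caller would then re-evaluate them starting at n_past = 0.
 * Otherwise the history is left untouched and `n` is returned. */
size_t trim_history(int *tokens, size_t n, size_t n_ctx, size_t n_keep) {
    assert(n_keep <= n_ctx);
    if (n <= n_ctx) {
        return n;
    }
    /* Here n > n_ctx >= n_keep, so n - n_keep cannot underflow. */
    memmove(tokens, tokens + (n - n_keep), n_keep * sizeof tokens[0]);
    return n_keep;
}
```

The trade-off in a scheme like this is that the model loses everything before the kept window, and re-evaluating the kept tokens costs extra time.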

@ggerganov
Owner

This looks like duplicate of #71 ?

@msyyces8x95

msyyces8x95 commented Mar 21, 2023

This looks like duplicate of #71 ?

yes !

@sussyboiiii
Author

sussyboiiii commented Mar 21, 2023

I have tried the alpaca.cpp project and it worked fine; it didn't crash even after a really long conversation. I don't know what they did differently in alpaca.cpp, as it seems to be pretty much the same as llama.cpp, but it ran better for some reason. So I believe it's not hardware related.

@sussyboiiii
Author

I have got the segmentation fault with Llama too

@lzace817

lzace817 commented Mar 22, 2023

I've captured this gdb session.
command: main -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins --top_k 10000 --temp 0.96 --n_predict 512 --repeat_penalty 1 -t 3
total memory: 16GB
commit: 56e659a

gdb:

0x000055555556950d in ggml_element_size (tensor=0x7fffe778ab30) at ggml.c:2443
2443	    return GGML_TYPE_SIZE[tensor->type];
(gdb) list
2438	float ggml_type_sizef(enum ggml_type type) {
2439	    return ((float)(GGML_TYPE_SIZE[type]))/GGML_BLCK_SIZE[type];
2440	}
2441	
2442	size_t ggml_element_size(const struct ggml_tensor * tensor) {
2443	    return GGML_TYPE_SIZE[tensor->type];
2444	}
2445	
2446	static inline bool ggml_is_scalar(const struct ggml_tensor * tensor) {
2447	    static_assert(GGML_MAX_DIMS == 4, "GGML_MAX_DIMS is not 4 - update this function");
(gdb) p tensor
$1 = (const struct ggml_tensor *) 0x7fffe778ab30
(gdb) p tensor->type
$2 = 3176610589
(gdb) p sizeof(GGML_TYPE_SIZE)
$3 = 56
(gdb) backtrace 
#0  0x000055555556950d in ggml_element_size (tensor=0x7fffe778ab30) at ggml.c:2443
#1  0x000055555557b8a2 in llama_eval_internal (lctx=..., tokens=<optimized out>, n_tokens=1, n_past=518, 
    n_threads=<optimized out>) at llama.cpp:686
#2  0x000055555557bf2d in llama_eval (ctx=<optimized out>, tokens=<optimized out>, n_tokens=<optimized out>, 
    n_past=<optimized out>, n_threads=<optimized out>) at llama.cpp:1445
#3  0x000055555555c93d in main (argc=<optimized out>, argv=<optimized out>) at main.cpp:323
(gdb) frame 1
#1  0x000055555557b8a2 in llama_eval_internal (lctx=..., tokens=<optimized out>, n_tokens=1, n_past=518, 
    n_threads=<optimized out>) at llama.cpp:686
686	                struct ggml_tensor * v = ggml_view_1d(ctx0, model.memory_v, N*n_embd, (ggml_element_size(model.memory_v)*n_embd)*(il*n_ctx + n_past));
(gdb) list
681	            struct ggml_tensor * Vcur = ggml_mul_mat(ctx0, model.layers[il].wv, cur);
682	
683	            // store key and value to memory
684	            if (N >= 1) {
685	                struct ggml_tensor * k = ggml_view_1d(ctx0, model.memory_k, N*n_embd, (ggml_element_size(model.memory_k)*n_embd)*(il*n_ctx + n_past));
686	                struct ggml_tensor * v = ggml_view_1d(ctx0, model.memory_v, N*n_embd, (ggml_element_size(model.memory_v)*n_embd)*(il*n_ctx + n_past));
687	
688	                ggml_build_forward_expand(&gf, ggml_cpy(ctx0, Kcur, k));
689	                ggml_build_forward_expand(&gf, ggml_cpy(ctx0, Vcur, v));
690	            }
(gdb) p il
$4 = 0
(gdb) p n_tokens
$5 = 1
(gdb) p n_past
$6 = 518
(gdb) f 2
#2  0x000055555557bf2d in llama_eval (ctx=<optimized out>, tokens=<optimized out>, n_tokens=<optimized out>, 
    n_past=<optimized out>, n_threads=<optimized out>) at llama.cpp:1445
1445	    if (!llama_eval_internal(*ctx, tokens, n_tokens, n_past, n_threads)) {
(gdb) list
1440	        struct llama_context * ctx,
1441	           const llama_token * tokens,
1442	                         int   n_tokens,
1443	                         int   n_past,
1444	                         int   n_threads) {
1445	    if (!llama_eval_internal(*ctx, tokens, n_tokens, n_past, n_threads)) {
1446	        fprintf(stderr, "%s: failed to eval\n", __func__);
1447	        return 1;
1448	    }
1449	
(gdb) f 3
#3  0x000055555555c93d in main (argc=<optimized out>, argv=<optimized out>) at main.cpp:323
323	            if (llama_eval(ctx, embd.data(), embd.size(), n_past, params.n_threads)) {
(gdb) list
318	    set_console_state(CONSOLE_STATE_PROMPT);
319	
320	    while (remaining_tokens > 0 || params.interactive) {
321	        // predict
322	        if (embd.size() > 0) {
323	            if (llama_eval(ctx, embd.data(), embd.size(), n_past, params.n_threads)) {
324	                fprintf(stderr, "%s : failed to eval\n", __func__);
325	                return 1;
326	            }
327	        }
(gdb) 
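
To make the numbers in this session concrete: frame 1 computes the write offset as `element_size * n_embd * (il * n_ctx + n_past)`. Assuming the values logged earlier in the thread (`n_ctx = 512`, `n_embd = 4096`, `n_layer = 32`) also apply here, a write at `n_past = 518` for layer 0 starts inside layer 1's region, and the same write for the last layer ends past the whole buffer. The C sketch below reproduces that arithmetic in elements rather than bytes. The function names are hypothetical, and the layout (one slot of `n_ctx` positions per layer) is inferred from the `ggml_view_1d` lines shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers illustrating the K/V cache indexing seen in frame 1.
 * Assumed layout: n_layer slots of n_ctx positions, n_embd elements each. */

/* Element index where the write for layer `il` at position `n_past` starts. */
size_t kv_write_start(size_t il, size_t n_past, size_t n_ctx, size_t n_embd) {
    return n_embd * (il * n_ctx + n_past);
}

/* True when writing `n_tokens` positions ends inside the whole buffer. */
bool kv_write_in_buffer(size_t il, size_t n_past, size_t n_tokens,
                        size_t n_layer, size_t n_ctx, size_t n_embd) {
    assert(il < n_layer);
    size_t end = kv_write_start(il, n_past, n_ctx, n_embd) + n_tokens * n_embd;
    return end <= n_layer * n_ctx * n_embd;
}
```

Under these assumptions, layer 0's slot ends at element 2,097,152 while the write at `n_past = 518` starts at element 2,121,728, so it would overwrite layer 1's entries without any error. For layer 31 the write ends beyond the buffer, which would be consistent with the corrupted tensor fields reported in this thread.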

@mqy
Contributor

mqy commented Mar 22, 2023

Segmentation fault caused by unchecked NULL pointer when memory pool gets full? #373 (comment)

@mattsta

mattsta commented Mar 22, 2023

Same as reported previously: something is corrupting tensor->type so that it is larger than the 7-element array it indexes into:

(gdb) p tensor->type
$2 = 3176610589
(gdb) p sizeof(GGML_TYPE_SIZE)
$3 = 56 (which is 7 elements because: (56 / 8) == 7 elements)

2442	size_t ggml_element_size(const struct ggml_tensor * tensor) {
2443	    return GGML_TYPE_SIZE[tensor->type];
2444	}
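
A defensive lookup would turn this silent out-of-bounds read into an explicit failure. The C sketch below is illustrative only: `TYPE_SIZE`, its seven placeholder values, and `type_size_checked` are assumptions made for the example, not ggml's actual table or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder table with 7 entries. Only the *length* mirrors what
 * sizeof(GGML_TYPE_SIZE) reports above; the values are made up. */
static const size_t TYPE_SIZE[7] = {1, 2, 4, 2, 4, 20, 24};

#define TYPE_COUNT (sizeof TYPE_SIZE / sizeof TYPE_SIZE[0])

/* Writes the size for `type` to *out and returns true, or returns false
 * without touching *out when `type` is not a valid index. */
bool type_size_checked(unsigned int type, size_t *out) {
    assert(out != NULL);
    if (type >= TYPE_COUNT) {
        return false; /* e.g. the corrupted value 3176610589 */
    }
    *out = TYPE_SIZE[type];
    return true;
}
```

A check like this only surfaces the symptom earlier; whatever overwrote the tensor's `type` field would still need to be found.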

@lzace817

It looks like this happens every time n_past goes over n_ctx. @mattsta, could you check whether you still segfault with this patch?

diff --git a/main.cpp b/main.cpp
index fbb43a8..866da4d 100644
--- a/main.cpp
+++ b/main.cpp
@@ -327,6 +327,10 @@ int main(int argc, char ** argv) {
         }
 
         n_past += embd.size();
+        if (n_past > params.n_ctx) {
+            fprintf(stderr, "ERROR: segfault awaits.\nn_past should go past than n_ctx?\n");
+            exit(1);
+        }
         embd.clear();
 
         if ((int) embd_inp.size() <= input_consumed) {
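
Judging from the diff context, the check in this patch runs after `llama_eval` has already been called for the current batch, so it reports the overflow once it has happened. A variant that tests before evaluating could look like the C sketch below. `eval_fits_context` is a hypothetical name, and its parameters simply mirror `n_past`, `embd.size()`, and `params.n_ctx` from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-check (not part of main.cpp): true when evaluating
 * `n_new` more tokens on top of `n_past` already-evaluated ones still
 * fits in a context of `n_ctx` positions. */
bool eval_fits_context(size_t n_past, size_t n_new, size_t n_ctx) {
    assert(n_ctx > 0); /* a zero-sized context is treated as a caller bug */
    if (n_past > n_ctx) {
        return false;  /* already past the end */
    }
    return n_new <= n_ctx - n_past; /* written this way to avoid overflow */
}
```

Called before each evaluation, a check like this would let the program stop or trim its history before the cache is written out of range.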

@anzz1
Contributor

anzz1 commented Mar 22, 2023

Since the alpaca.cpp project currently does not exhibit this issue, and based on when these reports started appearing, the problem most likely traces back to the tokenizer change and new model format in #252.

@Green-Sky
Collaborator

please try #438 and see if it fixes the problem.

@lzace817

@Green-Sky, after some valid output, it prints an "M" that doesn't appear to be part of the output, and then segfaults. I guess n_past should be <= n_ctx, but I don't know exactly what those are. I suppose it keeps pushing more and more data into the memory that stores the context until it overflows. I remember reading somewhere in the source that n_past is the past context, so it should fit inside the whole context.
In my experiments, it looks like interactive mode is not context aware. Shouldn't the context be restored to its default state for each interaction?

@Green-Sky
Collaborator

man, >.< i want the main.cpp cleaned up. you just can't reason about its behavior anymore. way too cluttered. multiple state machines etc....

@DanielWicz

DanielWicz commented Mar 25, 2023

Also getting segmentation fault, while on the alpca.cpp not. My machine has about ~360GB of RAM memory, so it's almost impossible to get out of RAM. Checked on 13B and 30B models

@Green-Sky
Collaborator

alpca.cpp

try using llama.cpp

@sussyboiiii
Author

I believe this doesn't occur anymore. Closing

@matveybuk

Also getting segmentation fault, while on the alpca.cpp not. My machine has about ~360GB of RAM memory, so it's almost impossible to get out of RAM. Checked on 13B and 30B models

I had a segmentation fault because the model was not completely downloaded: only 20GB of the 46GB were on disk, which caused this error. I deleted the file and downloaded it again.
