Prerequisites
I reviewed the Discussions and have a new bug to share.
Hi, thanks for the continued effort with llama.cpp. I cloned the repo, then built with make as usual.
Expected Behavior
./server runs inference without error messages. This issue was not present in #2009. Unfortunately, I'm seeing errors during inference with ./server as of #2116. I'll test other builds.
Current Behavior
Errors during ./server inference:
llama_eval_internal: first token must be BOS
llama_eval: failed to eval
The failure is abrupt and cuts off the response in the middle of a sentence. Here's an example:
~/ollama (master)> ./server -m ~/wizardlm-7b-v1.0-uncensored.ggmlv3.q4_0.bin -t 4 -b 10
{"timestamp":1688607679,"level":"INFO","function":"main","line":1085,"message":"build info","build":796,"commit":"31cfbb1"}
{"timestamp":1688607679,"level":"INFO","function":"main","line":1090,"message":"
system info","n_threads":4,"total_threads":8,"system_info":"AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 | "}
llama.cpp: loading model from /data/data/com.termux/files/home/wizardlm-7b-v1.0-uncensored.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: mem required = 5407.72 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size = 256.00 MB
llama server listening at http://127.0.0.1:8080
{"timestamp":1688607679,"level":"INFO","function":"main","line":1305,"message":"HTTP server listening","hostname":"127.0.0.1","port":8080}
{"timestamp":1688607685,"level":"INFO","function":"log_server_request","line":1058,"message":"request","remote_addr":"127.0.0.1","remote_port":37210,"status":200,"method":"GET","path":"/","params":{}}
{"timestamp":1688607685,"level":"INFO","function":"log_server_request","line":1058,"message":"request","remote_addr":"127.0.0.1","remote_port":37210,"status":200,"method":"GET","path":"/completion.js","params":{}}
{"timestamp":1688607685,"level":"INFO","function":"log_server_request","line":1058,"message":"request","remote_addr":"127.0.0.1","remote_port":37212,"status":200,"method":"GET","path":"/index.js","params":{}}
{"timestamp":1688607685,"level":"INFO","function":"log_server_request","line":1058,"message":"request","remote_addr":"127.0.0.1","remote_port":37210,"status":404,"method":"GET","path":"/favicon.ico","params":{}}
llama_print_timings: load time = 2102.17 ms
llama_print_timings: sample time = 3291.18 ms / 355 runs ( 9.27 ms per token, 107.86 tokens per second)
llama_print_timings: prompt eval time = 10480.78 ms / 49 tokens ( 213.89 ms per token, 4.68 tokens per second)
llama_print_timings: eval time = 124335.87 ms / 354 runs ( 351.23 ms per token, 2.85 tokens per second)
llama_print_timings: total time = 138282.27 ms
{"timestamp":1688607964,"level":"INFO","function":"log_server_request","line":1058,"message":"request","remote_addr":"127.0.0.1","remote_port":37214,"status":200,"method":"POST","path":"/completion","params":{}}
llama_eval_internal: first token must be BOS
llama_eval: failed to eval
{"timestamp":1688608023,"level":"ERROR","function":"nextToken","line":360,"message":"failed to eval","n_eval":10,"n_past":0,"n_threads":4,"embd":"
rare ingredients for potions, and even delved into dangerous dungeons filled with
treacherous monsters. Along the way, she made friends with other creatures who shared her passion for knowledge and
adventure, including dragons, unicorns, and even mermaids.\nAs time passed, Luna grew stronger both physically and mentally,
becoming an extraordinary creature capable of performing incredible feats. And yet,
despite all her newfound powers, she never forgot where she came from or the humble roots that first led her down this path.
For Luna always remained true to her llama nature, using her abilities only for good and spreading joy wherever she went.\n
User: Thanks. Describe Lunas appearance please.\n
llama: As a young llama, Luna was adorable with soft brown fur, long eyelashes, and a friendly smile. But as she embarked on her
journey towards greatness, her physical features began to change in mysterious ways. Her eyes
became more intense, glowing like crystals themselves, while her body developed powerful
muscles and a shimmering golden coat. She now stood taller than any ordinary ll"}
llama_print_timings: load time = 2102.17 ms
llama_print_timings: sample time = 936.31 ms / 93 runs ( 10.07 ms per token, 99.33 tokens per second)
llama_print_timings: prompt eval time = 4246.50 ms / 16 tokens ( 265.41 ms per token, 3.77 tokens per second)
llama_print_timings: eval time = 29930.84 ms / 92 runs ( 325.34 ms per token, 3.07 tokens per second)
llama_print_timings: total time = 35164.16 ms
{"timestamp":1688608023,"level":"INFO","function":"log_server_request","line":1058,"message":"request","remote_addr":"127.0.0.1","remote_port":37216,"status":200,"method":"POST","path":"/completion","params":{}}
^C
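For what it's worth, when I drive llama.cpp directly I can avoid the error by letting the tokenizer prepend BOS itself. A minimal sketch against the C API of this era (the add_bos flag on llama_tokenize; hedged, from memory rather than the exact headers at commit 31cfbb1):

```cpp
#include "llama.h"

#include <algorithm>
#include <string>
#include <vector>

// Hedged sketch: tokenize a prompt so the first token is BOS, which is
// what llama_eval_internal appears to require when n_past == 0.
static std::vector<llama_token> tokenize_with_bos(llama_context * ctx, const std::string & text) {
    // worst case: one token per byte, plus one slot for BOS
    std::vector<llama_token> tokens(text.size() + 1);
    const int n = llama_tokenize(ctx, text.c_str(), tokens.data(), (int) tokens.size(), /*add_bos=*/ true);
    tokens.resize(std::max(n, 0));
    return tokens;
}
```

If the server's own tokenization skips BOS on the follow-up request (note n_past is 0 and n_eval is 10, the batch size, in the error above), that would explain the failure, but I haven't confirmed that in the server code.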
Environment and Context
uname -a
Failure Information (for bugs)
llama_eval_internal: first token must be BOS
llama_eval: failed to eval
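For context, the message appears to come from a guard inside llama_eval_internal that rejects a fresh evaluation (n_past == 0) whose first token is not BOS. A paraphrased sketch of what such a check looks like (my reading, not copied verbatim from commit 31cfbb1):

```cpp
#include "llama.h"

#include <cstdio>

// Hedged paraphrase of the guard behind the message above: a fresh
// evaluation (n_past == 0) must start with the BOS token, otherwise
// the eval is refused.
static bool check_first_token(const llama_token * tokens, int n_past) {
    if (n_past == 0 && tokens[0] != llama_token_bos()) {
        fprintf(stderr, "llama_eval_internal: first token must be BOS\n");
        return false; // surfaces as "llama_eval: failed to eval"
    }
    return true;
}
```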
Steps to Reproduce
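From the session above, roughly:
1. Build at build 796 (commit 31cfbb1) with make.
2. ./server -m ~/wizardlm-7b-v1.0-uncensored.ggmlv3.q4_0.bin -t 4 -b 10
3. Open the web UI at http://127.0.0.1:8080 and run a chat; the first completion works.
4. Send a follow-up message in the same chat; the next POST to /completion fails with the BOS error and the response is cut off mid-sentence.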
Thank you!