server : fix kv cache management #3588
I checked whether the leading-colon issue could be related to #3550, but that doesn't appear to be the case. It is still easy to reproduce with this pull.
I don't see the issue anymore using this repro: #3575 (comment). I tried several hundred requests, with zero appearances of the leading colons.
…example
* 'master' of github.com:ggerganov/llama.cpp: (34 commits)
  examples: support LLaVA v1.5 (multimodal model) (ggerganov#3436)
  docs : fix typo GOMP_CPU_AFFINITY (ggerganov#3597)
  cmake : fix add_compile_options on macOS
  typo : it is `--n-gpu-layers` not `--gpu-layers` (ggerganov#3592)
  ci : check if there is enough VRAM (ggerganov#3596)
  server : add completion mode (no chat) (ggerganov#3582)
  prompts : add mnemonics.txt
  server : fix kv cache management (ggerganov#3588)
  main : fix session loading bug (ggerganov#3400)
  server : add parameter -tb N, --threads-batch N (ggerganov#3584)
  common : fix mirostat state when using multiple sequences (ggerganov#3543)
  batched : add bench tool (ggerganov#3545)
  examples : add batched.swift + improve CI for swift (ggerganov#3562)
  Add MPT model to supported models in README.md (ggerganov#3574)
  Minor improvements in GPT2 tokenizer (ggerganov#3567)
  readme : add bloom (ggerganov#3570)
  llm : add bloom models (ggerganov#3553)
  swift : improvements and fixes (ggerganov#3564)
  llm : add MPT support (ggerganov#3417)
  infill. : fix tokenization (ggerganov#3508)
  ...
I see the issue as of commit 1e0e873. I've seen it more often when regenerating outputs. See if you can reproduce it more frequently by submitting the same input over and over again. My general observation is that once you've triggered an appearance of a colon, further submissions of the same input without advancing the conversation will often (though not always) cause an additional colon to prefix the output.
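For anyone who wants to automate the "same input over and over" check, here is a minimal sketch. It assumes the server example is listening on `http://localhost:8080` and that `POST /completion` accepts a JSON body with `prompt` and `n_predict` and returns the generated text in a `content` field. The helper names `post_completion` and `count_leading_colons`, the URL, and the `n_predict` value are placeholders of mine, not something taken from this PR.

```python
import json
import urllib.request


def post_completion(prompt, url="http://localhost:8080/completion", n_predict=32):
    """Send one completion request and return the generated text.

    The URL and n_predict default are assumptions; adjust them to your setup.
    """
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["content"]


def count_leading_colons(prompt, attempts, send=post_completion):
    """Submit the same prompt `attempts` times and count the outputs
    whose first non-whitespace character is a colon."""
    hits = 0
    for _ in range(attempts):
        text = send(prompt)
        if text.lstrip().startswith(":"):
            hits += 1
    return hits
```

With a server running, something like `count_leading_colons("User: Hello\nAssistant:", attempts=200)` gives a rough count of affected outputs. The `send` parameter is there so the counting logic can be exercised without a live server.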
I'm also noticing that roleplaying models that are aware of emoji syntax like
It's like the model is internally perceiving a `:` character, which is in turn predisposing the output toward that emoji syntax.
You should open a new issue so this gets more attention.
I'll wait and see if d9cbf44 has improved the situation. If it remains unfixed, I'll open a new ticket.
That's a cherry-pick of this PR - are you testing on 1e0e873 (which includes it) or not?
Edit: Issue was upstream. I can no longer reproduce any of the bugs that I've mentioned. If you are still seeing the issues and using a third-party UI for llama.cpp, please bring the issue up with the developer and make sure they are aware of the need to tweak their KV cache management code.
Unfortunately, I'm starting to see the bad outputs (colons) again on master.
ref #3575
Try to fix the reported issue. Not tested