server : fix kv cache management #3588

Merged: ggerganov merged 1 commit into master from fix-server-kv-cache-manage on Oct 12, 2023


ggerganov
Owner

ref #3575

Try to fix the reported issue. Not tested

@KerfuffleV2
Collaborator

I checked to see if the colon thing could possibly be related to #3550 but that doesn't appear to be the case. Still easy to reproduce with this pull.

@spencekim

I don't see the issue anymore using this repro: #3575 (comment)

Tried several hundred reqs, with zero appearances of the leading colons.

@ggerganov ggerganov merged commit 57dd55e into master Oct 12, 2023
34 of 39 checks passed
@ggerganov ggerganov deleted the fix-server-kv-cache-manage branch October 12, 2023 06:29
joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 12, 2023
…example

@blightbow

I see the issue as of commit 1e0e873. I've seen it more often when regenerating outputs. See if you can reproduce it more frequently by submitting the same input over and over again. My general observation is that once you've triggered an appearance of a colon, further submissions of the same input without advancing the conversation will often (though not always) cause an additional colon to prefix the output.
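
For anyone who wants to try the "same input over and over" approach, here is a minimal Python sketch of such a loop. It is only an illustration: `send` is a hypothetical callable standing in for whatever client you use to reach the server, and `make_fake_send` just simulates the growing-colon behaviour described above so the helper can be run without a model.

```python
from typing import Callable, List


def count_leading_colons(send: Callable[[str], str], prompt: str, attempts: int) -> List[int]:
    """Submit the same prompt repeatedly and record how many colons prefix each reply."""
    counts = []
    for _ in range(attempts):
        reply = send(prompt)
        # Ignore leading whitespace, then count the run of ':' at the start.
        stripped = reply.lstrip()
        counts.append(len(stripped) - len(stripped.lstrip(":")))
    return counts


def make_fake_send() -> Callable[[str], str]:
    """Hypothetical stand-in for a real client: each call prepends one more colon."""
    calls = {"n": 0}

    def send(prompt: str) -> str:
        reply = ":" * calls["n"] + " This is an example."
        calls["n"] += 1
        return reply

    return send


if __name__ == "__main__":
    # With the simulated bug, the colon count grows on every resubmission.
    print(count_leading_colons(make_fake_send(), "Hello", 4))  # [0, 1, 2, 3]
```

Against a real server, a list of all zeros over a few hundred attempts would match the "no leading colons" result reported earlier in this thread, while any non-zero entry marks a bad output.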

@blightbow

blightbow commented Oct 16, 2023

I'm also noticing that roleplaying models that are aware of emoji syntax like :eyes: have an increased disposition toward beginning a response with a truncated one, like so:

eyes: This is an example.

It's like the model is internally perceiving a : character, which is in turn predisposing the output toward that emoji syntax.

@cebtenzzre
Copy link
Collaborator

> I'm also noticing that roleplaying models that are aware of emoji syntax like :eyes: have an increased disposition toward beginning a response with a truncated one, like so:

You should open a new issue so this gets more attention.

@blightbow

> You should open a new issue so this gets more attention.

I'll wait and see if d9cbf44 has improved the situation. If it remains unfixed, I'll open a new ticket.

@cebtenzzre
Collaborator

> I'll wait and see if d9cbf44 has improved the situation.

That's a cherry-pick of this PR - are you testing on 1e0e873 (which includes it) or not?

@blightbow

blightbow commented Oct 16, 2023

Sorry about that, I misread the activity. Yes, everything I have been discussing is current as of 1e0e873.

I will open a new ticket.

Edit: Issue was upstream. I can no longer reproduce any of the bugs that I've mentioned. If you are still seeing the issues and using a third-party UI for llama.cpp, please bring the issue up with the developer and make sure they are aware of the need to tweak their kv management code.

@spencekim

> I don't see the issue anymore using this repro: #3575 (comment)
>
> Tried several hundred reqs, with zero appearances of the leading colons.

Unfortunately, I'm starting to see the bad outputs (colons) again on master.
