
Sample interface, new samplers #1126

Merged (8 commits) on Apr 29, 2023

Conversation

@ivanstepanovftw (Collaborator) commented Apr 22, 2023

Ignore EOS should apply -inf to the EOS logit; new line penalization option; logit bias support (#1024). A sketch of the logit-bias / ignore-EOS idea follows the sampler list below.

New samplers:

  • locally typical sampling
  • tail free sampling
  • frequency and presence penalty
  • Mirostat & Mirostat v2
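
As a minimal sketch of how the logit-bias and ignore-EOS options described above could act on the raw logits (a hypothetical helper, not the PR's actual API):

    #include <cmath>
    #include <unordered_map>
    #include <vector>

    // Hypothetical helper, not the PR's API: add the per-token biases, then make
    // EOS unsampleable by forcing its logit to -inf when "ignore EOS" is set.
    void apply_logit_bias(std::vector<float> & logits,
                          const std::unordered_map<int, float> & logit_bias,
                          int eos_token, bool ignore_eos) {
        for (const auto & kv : logit_bias) {
            logits[kv.first] += kv.second;      // additive bias per token id
        }
        if (ignore_eos) {
            logits[eos_token] = -INFINITY;      // EOS can never win the sampling step
        }
    }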

🤖 Generated by Copilot at f571806

Summary


This pull request enhances the llama text generation library with new sampling techniques and features, such as logit bias, typicality filtering, frequency and presence penalties, mirostat, and newline penalty. It also updates the examples and the API to use the new sampling functions and structs, and to handle arrays of llama_token_data. It modifies the command line options and the usage message in ./examples/common.cpp to reflect the new parameters and defaults.

We're coding with the llama, the llama of the sea
We're sampling with the logits, the logits are the key
We're adding new features, new features to the gpt_params
We're heaving on the yardarm, on the yardarm, one, two, three

Walkthrough

  • Implement new sampling techniques and features for llama, such as tail free sampling, frequency and presence penalties, Mirostat sampling, logit bias, and newline penalty
  • Update the command line options and parameters in the common files to reflect the new sampling techniques and features, and add descriptions and references for them in the usage message
  • Update the message printed to the standard error stream in the main example to include the values of the new parameters
  • Modify the existing parameters and sampling logic in the main and save-load-state examples to use the new sampling techniques and features and the new llama API functions
  • Modify the llama_token_data struct to store the logit instead of the plog, and add a new struct and a new function for handling arrays of llama_token_data (the resulting types are sketched below)
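
For reference, the token-data types described in the last point look roughly like this (a paraphrased sketch of the llama.h additions, not a verbatim copy):

    #include <stddef.h>    /* size_t */
    #include <stdbool.h>   /* bool, if compiled as C */

    typedef int llama_token;                  /* an int typedef in llama.h at the time */

    typedef struct llama_token_data {
        llama_token id;                       /* token id */
        float       logit;                    /* raw logit (replaces the old plog) */
        float       p;                        /* probability, filled after softmax */
    } llama_token_data;

    typedef struct llama_token_data_array {
        llama_token_data * data;              /* candidate tokens */
        size_t             size;              /* number of candidates */
        bool               sorted;            /* sorted by logit, descending */
    } llama_token_data_array;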

@Piezoid (Contributor) commented Apr 22, 2023

Nice work!

I'll link the literature here. Feel free to add more up-to-date sources.

I like the idea of a modular interface for sampling. It lets each example and application combine these parts into its own kitchen-sink sampling pipeline that fits its needs. Going further with this, the llama.h interface could be stripped down to only provide access to the logits and the vocabulary, and the sampling code could move to a separate object file. This would emphasize and guarantee the extensibility of the samplers (a sketch of such a chain follows).
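
As an illustration of that modular idea only, with made-up names rather than a proposed API:

    #include <functional>
    #include <vector>

    // Made-up names for illustration: each stage rewrites a shared candidate
    // list in place (top-k, tail-free, penalties, ...), and the chain simply
    // runs the stages in order before the final pick.
    struct Candidate { int id; float logit; };

    using SamplerStage = std::function<void(std::vector<Candidate> &)>;

    void run_chain(std::vector<Candidate> & candidates,
                   const std::vector<SamplerStage> & stages) {
        for (const auto & stage : stages) {
            stage(candidates);
        }
    }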

I am hesitant about the current implementation of repetition penalization. As an illustration, I question whether the occurrence of past newlines and punctuation should guide the sampling of the following tokens. To address this, the repetitions could be weighed against a simple frequency model; however, I wasn't able to recover such frequencies from the tokenizer weights.
It is also possible to gather more information by measuring the length of the repetition that the next token would complete or interrupt. I have implemented this idea together with an exponential decay (sketched below).
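
A minimal sketch of the exponential-decay idea, assuming a made-up helper rather than the actual implementation:

    #include <cmath>
    #include <unordered_map>
    #include <vector>

    // Weight each past occurrence of a token by exp(-decay * age), so a token
    // seen 100 positions ago counts far less than one seen 2 positions ago.
    std::unordered_map<int, float> decayed_counts(const std::vector<int> & last_tokens,
                                                  float decay /* e.g. 0.05f */) {
        std::unordered_map<int, float> counts;
        const size_t n = last_tokens.size();
        for (size_t i = 0; i < n; ++i) {
            const float age = float(n - 1 - i);   // 0 for the most recent token
            counts[last_tokens[i]] += std::exp(-decay * age);
        }
        return counts;                            // use as a soft "frequency" for penalties
    }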

Concerning the application of the penalization, I'm not sure whether it is better to offset the logits or to scale them. Subtracting from the logit, as the "frequency and presence penalty" does, amounts to scaling the probabilities. Scaling the logits, as discussed in the CTRL paper, can be thought of as raising the probabilities to a power, but it depends on the logit = 0 point, which is not particularly meaningful.
Your current implementation applies both methods successively, which seems redundant (both are sketched below for comparison).
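
For comparison, the two penalization styles in sketch form (hypothetical helpers; count is how often the token appears in the recent window):

    // Additive ("frequency and presence" style): shifting the logit down by a
    // constant multiplies the token's unnormalized probability by a constant factor.
    float penalize_additive(float logit, int count, float alpha_freq, float alpha_presence) {
        return logit - count * alpha_freq - (count > 0 ? alpha_presence : 0.0f);
    }

    // Multiplicative (CTRL style): dividing positive logits and multiplying
    // negative ones raises the probability to a power, but the effect hinges
    // on where logit = 0 sits.
    float penalize_scaling(float logit, int count, float penalty /* > 1 */) {
        if (count == 0) return logit;
        return logit > 0.0f ? logit / penalty : logit * penalty;
    }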

I haven't found the time to read about Mirostat in detail. My limited understanding is that as the number of parameters goes up, the method becomes more challenging to apply in practice. It also seems difficult to control the changing target surprise mu through feedback, especially when working with an auto-regressive model (the update rule is sketched below). On the other hand, the promise of avoiding repetitions and boredom traps without looking at past tokens is very interesting.
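
For reference, the feedback being discussed is roughly the following update on the target surprise mu (a paraphrased sketch of the Mirostat v2 rule, not the PR's exact code):

    #include <cmath>

    // Paraphrased Mirostat v2 feedback step: elsewhere, candidates whose
    // surprise -log2(p) exceeds mu are discarded and one survivor is sampled;
    // afterwards mu is nudged toward the target surprise tau with rate eta.
    float mirostat_v2_update(float mu, float tau, float eta, float p_sampled) {
        const float observed_surprise = -std::log2(p_sampled);
        return mu - eta * (observed_surprise - tau);
    }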

I found that it is quite difficult to evaluate the sampling algorithms. We have good starting points with your analysis, the information-theoretic formalism of the locally typical sampling and Mirostat papers, and their evaluation methods. Doing such experiments takes time and effort. Also, large-scale human evaluations are next to impossible without a large community effort.

@ivanstepanovftw (Collaborator, Author)

The CTRL paper does not mention it, but the CTRL repository explicitly avoids penalizing newline tokens during sampling.

Commits in this pull request:

New samplers:
- locally typical sampling
- tail free sampling
- frequency and presence penalty
- mirostat

Ignore EOS fix: -inf should be used.

Use C++11, clarify llama API documentation, rename Mirostat parameters to --mirostat_lr and --mirostat_ent, add temperature sampling for Mirostat, simplify Mirostat sampling API parameters (removed N and *k)
@ivanstepanovftw (Collaborator, Author)

Rebased, added 2 commits since last review

@ggerganov (Member)

Mark it "ready for review" when you think it is good to merge.

@ivanstepanovftw force-pushed the sampling branch 2 times, most recently from a227f87 to 38e3148 on April 28, 2023 18:44
@ivanstepanovftw marked this pull request as ready for review on April 28, 2023 19:22
@ivanstepanovftw (Collaborator, Author)

Ready for review

@Green-Sky (Collaborator) left a review comment

Very cool. I always wanted a way to blacklist tokens, like backslash.

@ivanstepanovftw (Collaborator, Author) commented Apr 29, 2023

very cool. I always wanted a way to blacklist tokens, like backslash.

Oh, I got it, for \code{begin}!

@ivanstepanovftw deleted the sampling branch on April 29, 2023 20:27
@Green-Sky (Collaborator) commented Apr 29, 2023

Oh, I got it, for \code{begin}!

Yeah 😄 and \code{end}; the model often emits this before EOS or tries to dodge/end the conversation.
Already tested it, works great.

Edit: it's -l 29905-100000, if anyone is interested.

@ivanstepanovftw (Collaborator, Author)

You could write -l 29905-inf 😊
I used std::stof instead of a stringstream just to make "inf" work (sketched below).
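
A hedged sketch of that point, not the exact common.cpp parser: std::stof goes through strtof, which accepts "inf" and "-inf", while stream extraction of a float typically rejects them.

    #include <string>
    #include <unordered_map>

    // Illustrative parser for a "-l TOKEN(+|-)BIAS" argument such as
    // "29905-inf" or "13+2"; an infinite negative bias effectively bans the token.
    bool parse_logit_bias(const std::string & arg,
                          std::unordered_map<int, float> & bias) {
        const size_t split = arg.find_first_of("+-", 1);   // sign of the bias part
        if (split == std::string::npos) {
            return false;
        }
        try {
            const int   token = std::stoi(arg.substr(0, split));
            const float value = std::stof(arg.substr(split)); // handles "+2", "-inf", ...
            bias[token] = value;
            return true;
        } catch (...) {
            return false;
        }
    }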

@byroneverson

Any thoughts on removing the parameter defaults of the new sampling functions, to keep llama.h compatible with C/Objective-C? (Illustrated below.)
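
For context, default arguments are a C++-only feature, so a header meant to be consumed from C or Objective-C has to drop them; an illustration with made-up names, not quoted from llama.h:

    #include <stddef.h>

    /* Illustrative declarations only; the names are made up. */
    struct ctx;
    struct token_array;

    /* C++-only: a default argument will not compile under a C or Objective-C compiler:
       void sample_top_k(struct ctx * c, struct token_array * cands, int k, size_t min_keep = 1); */

    /* C-compatible: no defaults in the header; the suggested default (min_keep = 1)
       is documented or supplied by a thin C++ wrapper instead. */
    void sample_top_k(struct ctx * c, struct token_array * cands, int k, size_t min_keep);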

@DenisSergeevitch commented May 16, 2023

Edit: it's -l 29905-100000, if anyone is interested.

Could anyone please share how to get the token id, and could I pass multiple tokens at once with the --logit-bias flag?

@Green-Sky (Collaborator) commented May 16, 2023

@DenisSergeevitch you can supply --verbose-prompt

--verbose-prompt      print prompt before generation

e.g.:

$ bin/main --verbose-prompt -m ../models/open_llama_7b_preview_300bt/ggml-model-q4_0.bin -p "Test prompt"
 
...
 
main: prompt: ' Test prompt'
main: number of tokens in prompt = 3
     1 -> ''
  5073 -> ' Test'
  7593 -> ' prompt'
...

@ivanstepanovftw (Collaborator, Author)

pass multiple tokens at once

Yes, by passing multiple arguments, like ./main ... -l 2-inf -l 13+2 -l 228+5.

@DenisSergeevitch

pass multiple tokens at once

Yes, by passing multiple arguments, like ./main ... -l 2-inf -l 13+2 -l 228+5.

Thanks, I have built a small uncensoring method based on this flag; it works like a charm!

@KerfuffleV2 (Collaborator)

@ivanstepanovftw
I'm working on a Rust-based implementation of these samplers and using the code you wrote as a reference. I'm crediting the llama.cpp project, but I can mention you by name in the project README as well, since you wrote it (and I don't think it has really changed much since the initial commit). I didn't want to do something like that without asking first, though.

Also, if you're unhappy with the way I'm handling this (the credits or otherwise) please let me know and hopefully we can work something out!

Link: https://github.com/KerfuffleV2/llm-samplers/

@ivanstepanovftw (Collaborator, Author)

@KerfuffleV2 Sure you can! Glad that you support RWKV; it looks very promising.

@slaren mentioned this pull request on Sep 4, 2024