Sample interface, new samplers #1126
Conversation
Force-pushed from 21e7ab4 to 9c78250
Nice work! I'll link the literature here. Feel free to complete it with more up-to-date sources.
I like the idea of a modular interface for sampling. It enables each example and application to combine these parts into its own kitchen-sink sampling that fits its needs. Going further with this, the llama.h interface could be stripped down to only provide access to the logits and the vocabulary, with the sampling code moved to a separate object file. This would emphasize and guarantee the extensibility of the samplers.

I am hesitant about the current implementation of repetition penalization. Concerning the application of the penalization, I'm not sure whether it is better to offset the logits or to scale them. Subtracting from the logit, as done by the frequency and presence penalties, amounts to scaling the probabilities. Scaling the logits, which is discussed in the CTRL paper, can be thought of as raising the probabilities to a power, but it is dependent on the logit = 0 point, which is not particularly meaningful.

I haven't found the time to read about mirostat in detail. My limited knowledge tells me that as the number of parameters goes up, the method becomes more challenging to apply in practice. Additionally, it seems difficult to control the changing target surprise.

I found that it is quite difficult to evaluate the sampling algorithms. We have good starting points with your analysis, the information-theoretic formalism of the locally typical sampling and mirostat papers, and their evaluation methods. Doing such experiments takes time and effort. Also, large-scale human evaluations are next to impossible without a large community effort.
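A worked comparison of the two penalty styles (my own algebra, not from the PR): with softmax probabilities $p_i = e^{\ell_i} / \sum_j e^{\ell_j}$, subtracting a penalty $b$ from a logit gives $e^{\ell_i - b} = e^{-b} \cdot e^{\ell_i}$, so the unnormalized probability is scaled by a constant factor regardless of the logit's value. Dividing the logit by $\theta > 1$ instead gives $e^{\ell_i/\theta} = (e^{\ell_i})^{1/\theta}$, a power transform: it shrinks logits above 0 and inflates logits below 0, which is why the location of the logit = 0 point matters for the CTRL-style penalty.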
The CTRL paper does not mention it, but the CTRL repository in fact explicitly avoids penalizing newline tokens during sampling.
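A minimal sketch of how that exemption can be layered on top of this PR's samplers; `penalize_nl` is a hypothetical option name, and `llama_token_nl()` is the `'\n'` token accessor:

```cpp
// Remember the newline logit, apply the penalty, then restore it in the
// candidates array so '\n' is never penalized -- mirroring the CTRL repo.
const llama_token nl = llama_token_nl();
const float nl_logit = logits[nl];

llama_sample_repetition_penalty(ctx, &candidates_p,
        last_tokens.data(), last_tokens.size(), repeat_penalty);

if (!penalize_nl) {
    for (size_t i = 0; i < candidates_p.size; ++i) {
        if (candidates_p.data[i].id == nl) {
            candidates_p.data[i].logit = nl_logit;
            break;
        }
    }
}
```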
New samplers:
- locally typical sampling
- tail free sampling
- frequency and presence penalty
- mirostat

Ignore EOS fix: -inf should be used.
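A sketch of the ignore-EOS fix described above, assuming direct access to the raw logits (`llama_token_eos()` is the EOS token accessor in the llama.h of this era):

```cpp
#include <cmath>  // INFINITY

// Setting the EOS logit to -inf (rather than, say, zeroing its probability
// after softmax) guarantees no downstream sampler can ever select it.
if (params.ignore_eos) {
    logits[llama_token_eos()] = -INFINITY;
}
```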
Use C++11, clarify llama API documentation, rename Mirostat parameters to --mirostat_lr and --mirostat_ent, add temperature sampling for Mirostat, simplify Mirostat sampling API parameters (removed N and *k)
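For context on the renamed parameters (paraphrasing the Mirostat 2.0 algorithm, not code from this PR): `--mirostat_ent` is the target surprise $\tau$ and `--mirostat_lr` is the learning rate $\eta$ in the feedback update

$$\mu \leftarrow \mu - \eta \,(s(x) - \tau),$$

where $s(x) = -\log_2 p(x)$ is the surprise of the token $x$ just sampled, and $\mu$ (initialized to $2\tau$) caps the surprise of candidate tokens at each step.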
Force-pushed from 34459ca to 416f491
Rebased, added 2 commits since last review
Mark "ready for review" when you think it is ready for merge
Force-pushed from a227f87 to 38e3148
Force-pushed from ec822c5 to f571806
Ready for review
Very cool. I always wanted a way to blacklist tokens, like backslash.
Oh, I got it, for
yea 😄 and edit: its
You could write
Any thoughts on the removal of parameter defaults from the new sampling functions to keep llama.h compatible with C/Obj-C?
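For readers unfamiliar with the constraint (illustrative, not from the PR): default arguments are a C++-only feature, so a header shared with C or Objective-C must spell out every parameter at each call site.

```cpp
// llama.h must compile as both C and C++, so this is NOT allowed:
//   void llama_sample_top_k(struct llama_context * ctx,
//                           llama_token_data_array * candidates,
//                           int k, size_t min_keep = 1);  // error in C
// Instead the default is dropped and callers pass min_keep explicitly:
void llama_sample_top_k(struct llama_context * ctx,
                        llama_token_data_array * candidates,
                        int k, size_t min_keep);
```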
Could anyone please share how to get the token id, and could I pass multiple tokens at once with the --logit-bias flag?
@DenisSergeevitch you can supply
e.g.:
Yes, by passing multiple arguments, like
Thanks, I have built a small uncensoring method based on this flag; works like a charm!
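For reference, a sketch of how a logit-bias map of this kind can be applied before sampling; the container and field names are assumptions, and the token id shown is hypothetical:

```cpp
#include <cmath>          // INFINITY
#include <unordered_map>

// Maps token id -> bias; a bias of -INFINITY bans the token outright,
// which is how "blacklisting" a token like backslash can be done.
std::unordered_map<llama_token, float> logit_bias = {
    { 15043, -INFINITY },  // hypothetical token id
};

float * logits = llama_get_logits(ctx);
for (const auto & kv : logit_bias) {
    logits[kv.first] += kv.second;  // -INFINITY makes the token unsampleable
}
```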
@ivanstepanovftw Also, if you're unhappy with the way I'm handling this (the credits or otherwise), please let me know and hopefully we can work something out!
@KerfuffleV2 Sure you can! Glad that you support RWKV; it looks very promising.
Ignore EOS should apply -inf to the EOS logit; newline penalization option; logit bias support (#1024)
New samplers:
- locally typical sampling
- tail free sampling
- frequency and presence penalty
- mirostat
🤖 Generated by Copilot at f571806
Summary
This pull request enhances the llama text generation library with new sampling techniques and features, such as logit bias, typicality filtering, frequency and presence penalties, mirostat, and a newline penalty. It also updates the examples and the API to use the new sampling functions and structs, and to handle arrays of `llama_token_data`. It modifies the command line options and the usage message in `./examples/common.cpp` to reflect the new parameters and defaults.

Walkthrough

- Update the `common` files to reflect the new sampling techniques and features, and add descriptions and references for them in the usage message (link, link, link, link)
- Update the `main` example to include the values of the new parameters (link)
- Update the `main` and `save-load-state` examples to use the new sampling techniques and features, and the new llama API functions (link, link, link)
- Change the `llama_token_data` struct to store the logit instead of the plog, and add a new struct and a new function for handling arrays of `llama_token_data` (link)
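Since the walkthrough above is terse, here is a minimal end-to-end sketch of the new candidates-based API (signatures per the llama.h of this era; the parameter values are illustrative, not recommendations):

```cpp
#include <vector>
#include "llama.h"

// Build a llama_token_data_array over the whole vocabulary, then chain
// samplers that each filter or reweight the candidates in place.
llama_token sample_next(llama_context * ctx) {
    const int n_vocab = llama_n_vocab(ctx);
    float * logits = llama_get_logits(ctx);

    std::vector<llama_token_data> candidates;
    candidates.reserve(n_vocab);
    for (llama_token id = 0; id < n_vocab; ++id) {
        candidates.push_back(llama_token_data{ id, logits[id], 0.0f });
    }
    llama_token_data_array candidates_p = { candidates.data(), candidates.size(), false };

    llama_sample_top_k(ctx, &candidates_p, 40, 1);
    llama_sample_tail_free(ctx, &candidates_p, 1.0f, 1);
    llama_sample_typical(ctx, &candidates_p, 1.0f, 1);
    llama_sample_top_p(ctx, &candidates_p, 0.95f, 1);
    llama_sample_temperature(ctx, &candidates_p, 0.8f);

    return llama_sample_token(ctx, &candidates_p);
}
```

Because every sampler takes the same `llama_token_data_array *`, applications can reorder, drop, or insert stages freely, which is exactly the modularity discussed at the top of this thread.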