
Sample interface, new samplers #1126

Merged (8 commits) on Apr 29, 2023

Conversation

@ivanstepanovftw (Collaborator) commented Apr 22, 2023

Ignore EOS should apply -inf to the EOS logit; new line penalization option; logit bias support (#1024). A sketch of the logit-bias / ignore-EOS idea follows the sampler list below.

New samplers:

  • locally typical sampling
  • tail free sampling
  • frequency and presence penalty
  • Mirostat & Mirostat v2
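
As a minimal sketch of how the logit-bias and ignore-EOS options described above could act on the raw logits (a hypothetical helper, not the PR's actual API):

    #include <cmath>
    #include <unordered_map>
    #include <vector>

    // Hypothetical helper, not the PR's API: add the per-token biases, then make
    // EOS unsampleable by forcing its logit to -inf when "ignore EOS" is set.
    void apply_logit_bias(std::vector<float> & logits,
                          const std::unordered_map<int, float> & logit_bias,
                          int eos_token, bool ignore_eos) {
        for (const auto & kv : logit_bias) {
            logits[kv.first] += kv.second;      // additive bias per token id
        }
        if (ignore_eos) {
            logits[eos_token] = -INFINITY;      // EOS can never win the sampling step
        }
    }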

🤖 Generated by Copilot at f571806

Summary


This pull request enhances the llama text generation library with new sampling techniques and features, such as logit bias, typicality filtering, frequency and presence penalties, mirostat, and newline penalty. It also updates the examples and the API to use the new sampling functions and structs, and to handle arrays of llama_token_data. It modifies the command line options and the usage message in ./examples/common.cpp to reflect the new parameters and defaults.

We're coding with the llama, the llama of the sea
We're sampling with the logits, the logits are the key
We're adding new features, new features to the gpt_params
We're heaving on the yardarm, on the yardarm, one, two, three

Walkthrough

  • Implement new sampling techniques and features for llama, such as tail free sampling, frequency and presence penalties, Mirostat sampling, logit bias, and newline penalty
  • Update the command line options and parameters in the common files to reflect the new sampling techniques and features, and add descriptions and references for them in the usage message
  • Update the message printed to the standard error stream in the main example to include the values of the new parameters
  • Modify the existing parameters and sampling logic in the main and save-load-state examples to use the new sampling techniques and features and the new llama API functions
  • Modify the llama_token_data struct to store the logit instead of the plog, and add a new struct and a new function for handling arrays of llama_token_data (the resulting types are sketched below)
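
For reference, the token-data types described in the last point look roughly like this (a paraphrased sketch of the llama.h additions, not a verbatim copy):

    #include <stddef.h>    /* size_t */
    #include <stdbool.h>   /* bool, if compiled as C */

    typedef int llama_token;                  /* an int typedef in llama.h at the time */

    typedef struct llama_token_data {
        llama_token id;                       /* token id */
        float       logit;                    /* raw logit (replaces the old plog) */
        float       p;                        /* probability, filled after softmax */
    } llama_token_data;

    typedef struct llama_token_data_array {
        llama_token_data * data;              /* candidate tokens */
        size_t             size;              /* number of candidates */
        bool               sorted;            /* sorted by logit, descending */
    } llama_token_data_array;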

@Piezoid (Contributor) commented Apr 22, 2023

Nice work!

I'll link the literature here. Feel free to add more up-to-date sources.

I like the idea of a modular interface for sampling. It lets each example and application combine these parts into its own kitchen-sink sampling pipeline that fits its needs. Going further with this, the llama.h interface could be stripped down to only provide access to the logits and the vocabulary, and the sampling code could move to a separate object file. This would emphasize and guarantee the extensibility of the samplers (a sketch of such a chain follows).
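
As an illustration of that modular idea only, with made-up names rather than a proposed API:

    #include <functional>
    #include <vector>

    // Made-up names for illustration: each stage rewrites a shared candidate
    // list in place (top-k, tail-free, penalties, ...), and the chain simply
    // runs the stages in order before the final pick.
    struct Candidate { int id; float logit; };

    using SamplerStage = std::function<void(std::vector<Candidate> &)>;

    void run_chain(std::vector<Candidate> & candidates,
                   const std::vector<SamplerStage> & stages) {
        for (const auto & stage : stages) {
            stage(candidates);
        }
    }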

I am hesitant about the current implementation of repetition penalization. As an illustration, I question whether the occurrence of past newlines and punctuation should guide the sampling of the following tokens. To address this, the repetitions could be weighed against a simple frequency model; however, I wasn't able to recover such frequencies from the tokenizer weights.
It is also possible to gather more information by measuring the length of the repetition that the next token would complete or interrupt. I have implemented this idea together with an exponential decay (sketched below).
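
A minimal sketch of the exponential-decay idea, assuming a made-up helper rather than the actual implementation:

    #include <cmath>
    #include <unordered_map>
    #include <vector>

    // Weight each past occurrence of a token by exp(-decay * age), so a token
    // seen 100 positions ago counts far less than one seen 2 positions ago.
    std::unordered_map<int, float> decayed_counts(const std::vector<int> & last_tokens,
                                                  float decay /* e.g. 0.05f */) {
        std::unordered_map<int, float> counts;
        const size_t n = last_tokens.size();
        for (size_t i = 0; i < n; ++i) {
            const float age = float(n - 1 - i);   // 0 for the most recent token
            counts[last_tokens[i]] += std::exp(-decay * age);
        }
        return counts;                            // use as a soft "frequency" for penalties
    }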

Concerning the application of the penalization, I'm not sure whether it is better to offset the logits or to scale them. Subtracting from the logit, as the "frequency and presence penalty" does, amounts to scaling the probabilities. Scaling the logits, as discussed in the CTRL paper, can be thought of as raising the probabilities to a power, but it depends on the logit = 0 point, which is not particularly meaningful.
Your current implementation applies both methods successively, which seems redundant (both are sketched below for comparison).
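
For comparison, the two penalization styles in sketch form (hypothetical helpers; count is how often the token appears in the recent window):

    // Additive ("frequency and presence" style): shifting the logit down by a
    // constant multiplies the token's unnormalized probability by a constant factor.
    float penalize_additive(float logit, int count, float alpha_freq, float alpha_presence) {
        return logit - count * alpha_freq - (count > 0 ? alpha_presence : 0.0f);
    }

    // Multiplicative (CTRL style): dividing positive logits and multiplying
    // negative ones raises the probability to a power, but the effect hinges
    // on where logit = 0 sits.
    float penalize_scaling(float logit, int count, float penalty /* > 1 */) {
        if (count == 0) return logit;
        return logit > 0.0f ? logit / penalty : logit * penalty;
    }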

I haven't found the time to read about Mirostat in detail. My limited understanding is that as the number of parameters goes up, the method becomes more challenging to apply in practice. It also seems difficult to control the changing target surprise mu through feedback, especially when working with an auto-regressive model (the update rule is sketched below). On the other hand, the promise of avoiding repetitions and boredom traps without looking at past tokens is very interesting.
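
For reference, the feedback being discussed is roughly the following update on the target surprise mu (a paraphrased sketch of the Mirostat v2 rule, not the PR's exact code):

    #include <cmath>

    // Paraphrased Mirostat v2 feedback step: elsewhere, candidates whose
    // surprise -log2(p) exceeds mu are discarded and one survivor is sampled;
    // afterwards mu is nudged toward the target surprise tau with rate eta.
    float mirostat_v2_update(float mu, float tau, float eta, float p_sampled) {
        const float observed_surprise = -std::log2(p_sampled);
        return mu - eta * (observed_surprise - tau);
    }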

I found that it is quite difficult to evaluate the sampling algorithms. We have good starting points with your analysis, the information-theoretic formalism of the locally typical sampling and Mirostat papers, and their evaluation methods. Doing such experiments takes time and effort. Also, large-scale human evaluations are next to impossible without a large community effort.

@ivanstepanovftw (Collaborator, Author)

The CTRL paper does not mention it, but the CTRL repository explicitly avoids penalizing newline tokens during sampling.

Commits in this pull request:

New samplers:
- locally typical sampling
- tail free sampling
- frequency and presence penalty
- mirostat

Ignore EOS fix: -inf should be used.

Use C++11, clarify llama API documentation, rename Mirostat parameters to --mirostat_lr and --mirostat_ent, add temperature sampling for Mirostat, simplify Mirostat sampling API parameters (removed N and *k)
@ivanstepanovftw (Collaborator, Author)

Rebased, added 2 commits since last review

@ggerganov (Member)

Mark it "ready for review" when you think it is good to merge.

@ivanstepanovftw force-pushed the sampling branch 2 times, most recently from a227f87 to 38e3148 on April 28, 2023 18:44
@ivanstepanovftw marked this pull request as ready for review on April 28, 2023 19:22
@ivanstepanovftw (Collaborator, Author)

Ready for review

@Green-Sky (Collaborator) left a review comment

Very cool. I always wanted a way to blacklist tokens, like backslash.

@ivanstepanovftw (Collaborator, Author) commented Apr 29, 2023

very cool. I always wanted a way to blacklist tokens, like backslash.

Oh, I got it, for \code{begin}!

@ivanstepanovftw deleted the sampling branch on April 29, 2023 20:27
@Green-Sky (Collaborator) commented Apr 29, 2023

Oh, I got it, for \code{begin}!

Yeah 😄 and \code{end}; the model often emits this before EOS or tries to dodge/end the conversation.
Already tested it, works great.

Edit: it's -l 29905-100000, if anyone is interested.

@ivanstepanovftw (Collaborator, Author)

You could write -l 29905-inf 😊
I used std::stof instead of a stringstream just to make "inf" work (sketched below).
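
A hedged sketch of that point, not the exact common.cpp parser: std::stof goes through strtof, which accepts "inf" and "-inf", while stream extraction of a float typically rejects them.

    #include <string>
    #include <unordered_map>

    // Illustrative parser for a "-l TOKEN(+|-)BIAS" argument such as
    // "29905-inf" or "13+2"; an infinite negative bias effectively bans the token.
    bool parse_logit_bias(const std::string & arg,
                          std::unordered_map<int, float> & bias) {
        const size_t split = arg.find_first_of("+-", 1);   // sign of the bias part
        if (split == std::string::npos) {
            return false;
        }
        try {
            const int   token = std::stoi(arg.substr(0, split));
            const float value = std::stof(arg.substr(split)); // handles "+2", "-inf", ...
            bias[token] = value;
            return true;
        } catch (...) {
            return false;
        }
    }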

@byroneverson

Any thoughts on removing the parameter defaults of the new sampling functions, to keep llama.h compatible with C/Objective-C? (Illustrated below.)
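
For context, default arguments are a C++-only feature, so a header meant to be consumed from C or Objective-C has to drop them; an illustration with made-up names, not quoted from llama.h:

    #include <stddef.h>

    /* Illustrative declarations only; the names are made up. */
    struct ctx;
    struct token_array;

    /* C++-only: a default argument will not compile under a C or Objective-C compiler:
       void sample_top_k(struct ctx * c, struct token_array * cands, int k, size_t min_keep = 1); */

    /* C-compatible: no defaults in the header; the suggested default (min_keep = 1)
       is documented or supplied by a thin C++ wrapper instead. */
    void sample_top_k(struct ctx * c, struct token_array * cands, int k, size_t min_keep);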

@DenisSergeevitch commented May 16, 2023

Edit: it's -l 29905-100000, if anyone is interested.

Could anyone please share how to get the token id, and could I pass multiple tokens at once with the --logit-bias flag?

@Green-Sky (Collaborator) commented May 16, 2023

@DenisSergeevitch you can supply --verbose-prompt

--verbose-prompt      print prompt before generation

e.g.:

$ bin/main --verbose-prompt -m ../models/open_llama_7b_preview_300bt/ggml-model-q4_0.bin -p "Test prompt"
 
...
 
main: prompt: ' Test prompt'
main: number of tokens in prompt = 3
     1 -> ''
  5073 -> ' Test'
  7593 -> ' prompt'
...

@ivanstepanovftw (Collaborator, Author)

pass multiple tokens at once

Yes, by passing multiple arguments, like ./main ... -l 2-inf -l 13+2 -l 228+5.

@DenisSergeevitch

pass multiple tokens at once

Yes, by passing multiple arguments, like ./main ... -l 2-inf -l 13+2 -l 228+5.

Thanks, I have built a small uncensoring method based on this flag; it works like a charm!

@KerfuffleV2 (Collaborator)

@ivanstepanovftw
I'm working on a Rust-based implementation of these samplers and using the code you wrote as a reference. I'm crediting the llama.cpp project, but I can mention you by name in the project README as well, since you wrote it (and I don't think it has really changed much since the initial commit). I didn't want to do something like that without asking first, though.

Also, if you're unhappy with the way I'm handling this (the credits or otherwise) please let me know and hopefully we can work something out!

Link: https://github.com/KerfuffleV2/llm-samplers/

@ivanstepanovftw (Collaborator, Author)

@KerfuffleV2 Sure you can! Glad that you support RWKV; it looks very promising.

@slaren mentioned this pull request on Sep 4, 2024