sampling : add XTC sampler #9742
Changes from 45 commits
@@ -1059,6 +1059,101 @@ struct llama_sampler * llama_sampler_init_temp_ext(float temp, float delta, floa
    };
}

// xtc

struct llama_sampler_xtc {
    const float probability;
    const float threshold;
    const size_t min_keep;

    const uint32_t seed;
    uint32_t seed_cur;

    std::mt19937 rng;
};

static const char * llama_sampler_xtc_name(const struct llama_sampler * /*smpl*/) {
    return "xtc";
}

static void llama_sample_xtc_apply(struct llama_sampler * smpl, llama_token_data_array * cur_p) {
    auto * ctx = (llama_sampler_xtc *) smpl->ctx;

    if (ctx->probability <= 0.0f
        || ctx->threshold > 0.5f
        || cur_p->size < 2) {
        return;
    }

    std::uniform_real_distribution<float> distribution(0.0f, 1.0f);
    float chance = distribution(ctx->rng);
    if (chance > ctx->probability) return;
slaren marked this conversation as resolved.

    // in case it's not sorted/recalculated yet
    llama_sampler_softmax_impl(cur_p);

    int pos_last = 0;

    for (size_t i = 0; i < cur_p->size; ++i) {
        if (cur_p->data[i].p - ctx->threshold >= -1e-5) {

Why this epsilon instead of a regular comparison?

I added it after running tests: they were failing due to a precision problem.

Ugh, no. Never change correct code to satisfy tests. If the code is semantically correct and the tests don't pass, the tests need to be adapted to account for things such as floating-point shenanigans. Changing the code itself is always wrong, unless of course the code has a bug. The correct way to express this condition is:

    if (cur_p->data[i].p >= ctx->threshold) {

Nothing else will do. And the tests need to work with that, or the tests are wrong.

I'm not sure it should be considered a problem with the tests (the precision problem is a wider topic), but alright.

My basic point is that the tests serve the code, not the other way round. Code is only ever changed in response to failing tests if the code is found to have a bug. I had a similar problem with tests for an experimental sampler I wrote for llama.cpp a while ago, and I was able to work around it by using a different set of token probabilities in the tests, constructed specifically so that after probability renormalization the resulting values were exactly representable in floating point.
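The reviewer's suggestion (keep the plain `>=` comparison and choose test inputs that are exact after renormalization) can be illustrated with a minimal sketch. It is not the merged implementation: `TokenProb`, `renormalize` and `xtc_truncate` are hypothetical names, and the sketch works on an already sorted `std::vector` instead of `llama_token_data_array`. Weights in the ratio 4:2:1:1 renormalize to 0.5, 0.25, 0.125 and 0.125, which are exactly representable as `float`, so no epsilon is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llama_token_data: token id and probability only.
struct TokenProb {
    int   id;
    float p;
};

// Divide each weight by the total so the values sum to 1.
static void renormalize(std::vector<TokenProb> & toks) {
    float sum = 0.0f;
    for (const auto & t : toks) {
        sum += t.p;
    }
    assert(sum > 0.0f); // this sketch assumes at least one positive weight
    for (auto & t : toks) {
        t.p /= sum;
    }
}

// Sketch of the XTC truncation step with a plain `p >= threshold` check.
// Assumes `toks` is already sorted by descending probability. All tokens at
// or above the threshold except the last (least probable) one are removed,
// unless that would leave fewer than `min_keep` candidates.
static void xtc_truncate(std::vector<TokenProb> & toks, float threshold, size_t min_keep) {
    size_t pos_last = 0;
    for (size_t i = 0; i < toks.size(); ++i) {
        if (toks[i].p >= threshold) {
            pos_last = i;
        } else {
            break;
        }
    }
    if (pos_last > 0 && toks.size() - pos_last >= min_keep) {
        toks.erase(toks.begin(), toks.begin() + static_cast<std::ptrdiff_t>(pos_last));
    }
}
```

For example, with weights `{4, 2, 1, 1}`, a threshold of 0.25 and `min_keep = 1`, tokens 0 (p = 0.5) and 1 (p = 0.25) are both at or above the threshold, so only token 0 is removed and tokens 1, 2 and 3 remain.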
            pos_last = i;
        } else break;
    }

    if (cur_p->size - pos_last >= ctx->min_keep && pos_last > 0) {
        cur_p->data += pos_last;

This may break third-party code that expects this pointer to be unchanged (e.g. to free it after sampling). I don't think this is necessarily a problem, but we should make it clear that this pointer may be changed by the samplers, and applications should not rely on it being unchanged.
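The pointer concern can be shown with a small sketch. `TokenData`, `TokenArrayView` and `drop_front` are hypothetical stand-ins (not the llama.cpp API) for `llama_token_data`, `llama_token_data_array` and the pointer-advancing truncation in the diff. The caller keeps ownership of the buffer in its own `std::vector`, so it never needs to free anything through the possibly advanced `data` pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llama_token_data.
struct TokenData {
    int   id;
    float logit;
    float p;
};

// Hypothetical mirror of the view a sampler receives: a pointer into
// caller-owned storage plus a length (compare llama_token_data_array).
struct TokenArrayView {
    TokenData * data;
    size_t      size;
};

// Truncation in the style of the XTC diff: drop the first `n` candidates by
// advancing the pointer rather than moving or reallocating memory.
// Does nothing if that would leave no candidates.
static void drop_front(TokenArrayView & cur, size_t n) {
    assert(cur.data != nullptr);
    if (n >= cur.size) {
        return;
    }
    cur.data += n;
    cur.size -= n;
}
```

After `drop_front(cur, 2)`, `cur.data` points two elements into the vector's buffer; the application should release the memory through the `std::vector` it owns, never through `cur.data`.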
        cur_p->size = cur_p->size - pos_last;
MaggotHATE marked this conversation as resolved.

    }
}

static struct llama_sampler * llama_sampler_xtc_clone(const struct llama_sampler * smpl) {
    const auto * ctx = (const llama_sampler_xtc *) smpl->ctx;
    auto * result = llama_sampler_init_xtc(ctx->probability, ctx->threshold, ctx->min_keep, ctx->seed);

    // copy the state
    {
        auto * result_ctx = (llama_sampler_xtc *) result->ctx;

        result_ctx->rng = ctx->rng;
    }

    return result;
}

static void llama_sampler_xtc_free(struct llama_sampler * smpl) {
    delete (llama_sampler_xtc *) smpl->ctx;
}

static void llama_sampler_xtc_reset(struct llama_sampler * smpl) {

What is the purpose of this function?

AFAIK this is necessary to properly reset the seed and maintain repeatability, as recommended by @slaren earlier.
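To illustrate why `reset()` re-seeds the engine, here is a minimal sketch of the seed handling. `XtcRngState` and `resolve_seed` are hypothetical: `resolve_seed` only stands in for `get_rng_seed()`, and treating `0xFFFFFFFF` as "choose a random seed" is this sketch's own assumption. With a fixed seed, re-seeding replays the same sequence of coin flips, and copying the engine (as `clone()` does with `rng`) continues the same sequence.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Assumption for this sketch: this sentinel means "pick a random seed".
static const uint32_t kRandomSeed = 0xFFFFFFFFu;

// Hypothetical stand-in for get_rng_seed().
static uint32_t resolve_seed(uint32_t seed) {
    if (seed == kRandomSeed) {
        return std::random_device{}();
    }
    return seed;
}

struct XtcRngState {
    uint32_t     seed;     // seed requested by the user
    uint32_t     seed_cur; // seed actually in use
    std::mt19937 rng;

    explicit XtcRngState(uint32_t s) : seed(s), seed_cur(resolve_seed(s)), rng(seed_cur) {}

    // Mirrors reset() in the diff: re-resolve the seed and re-seed the engine.
    void reset() {
        seed_cur = resolve_seed(seed);
        rng.seed(seed_cur);
    }

    // The per-step draw XTC compares against `probability`.
    float draw() {
        std::uniform_real_distribution<float> dist(0.0f, 1.0f);
        return dist(rng);
    }
};
```

Note that with the random-seed sentinel, `reset()` picks a new seed, so repeatability only holds when an explicit seed is given.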
    auto * ctx = (llama_sampler_xtc *) smpl->ctx;
    ctx->seed_cur = get_rng_seed(ctx->seed);
    ctx->rng.seed(ctx->seed_cur);
}

static struct llama_sampler_i llama_sampler_xtc_i = {
    /* .name = */ llama_sampler_xtc_name,
    /* .accept = */ nullptr,
    /* .apply = */ llama_sample_xtc_apply,
    /* .reset = */ llama_sampler_xtc_reset,
    /* .clone = */ llama_sampler_xtc_clone,
    /* .free = */ llama_sampler_xtc_free,
};

struct llama_sampler * llama_sampler_init_xtc(float p, float t, size_t min_keep, uint32_t seed) {
    auto seed_cur = get_rng_seed(seed);
    return new llama_sampler {
        /* .iface = */ &llama_sampler_xtc_i,
        /* .ctx = */ new llama_sampler_xtc {
            /* .probability = */ p,
            /* .threshold = */ t,
            /* .min_keep = */ min_keep,
            /* .seed = */ seed,
            /* .seed_cur = */ seed_cur,
            /* .rng = */ std::mt19937(seed_cur),
        },
    };
}

// mirostat

struct llama_sampler_mirostat {
Is it still necessary to pass an explicit sampler chain via --sampling-seq in order to activate XTC? I thought that it is now in the sampler chain by default, and disabled by having xtc_probability set to 0.
@p-e-w While XTC is included in the sampler queue by default, it is placed after all other truncating samplers. As such, the recommended combination of samplers, as per your words in oobabooga/text-generation-webui#6335, requires passing the sampler chain explicitly.
I don't get it. It seems like the order is correct by default (XTC after truncation, which Min-P is). And all samplers are set to neutral (off) by default, right? So what does --sampling-seq mx do that wouldn't happen otherwise?
@p-e-w Not all samplers: Top K is 40 by default and Top P is 0.95. So either they need to be set to 0 and 1.0 respectively, or (which is easier and more logical) the sampling queue should be limited to only the samplers we need.
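To make the point about defaults concrete, here is a simplified sketch of top-k and top-p truncation in which `top_k = 0` and `top_p = 1.0` are no-ops. Both functions are hypothetical simplifications on a sorted, normalized `std::vector<float>`, not the llama.cpp implementations; they only illustrate that while the default Top K = 40 and Top P = 0.95 are active, a sampler placed after them (such as XTC here) sees an already truncated candidate list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical top-k: keeps the k most probable candidates.
// Assumes `probs` is sorted in descending order. k <= 0 disables it.
static void top_k(std::vector<float> & probs, int k) {
    if (k <= 0 || static_cast<size_t>(k) >= probs.size()) {
        return;
    }
    probs.resize(static_cast<size_t>(k));
}

// Simplified, hypothetical top-p: keeps the smallest prefix whose cumulative
// probability reaches p (but at least min_keep candidates).
// Assumes `probs` is sorted in descending order. p >= 1.0 disables it.
static void top_p(std::vector<float> & probs, float p, size_t min_keep = 1) {
    if (p >= 1.0f) {
        return;
    }
    float  cum  = 0.0f;
    size_t keep = probs.size();
    for (size_t i = 0; i < probs.size(); ++i) {
        cum += probs[i];
        if (cum >= p && i + 1 >= min_keep) {
            keep = i + 1;
            break;
        }
    }
    probs.resize(keep);
}
```

With 50 equally likely candidates, `top_k(probs, 40)` already drops 10 of them before any later sampler runs, whereas `top_k(probs, 0)` followed by `top_p(probs, 1.0f)` leaves all 50 untouched.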
😮 I had no idea! So llama.cpp users are getting crap samplers from the Stone Age without even realizing it. That's terrible. I would have expected a clean slate that samples from the raw model distribution unless parameters are set explicitly.
Anyway, this means your example command is correct, of course.
If I remember correctly, this topic has already been brought up some time ago in other PRs. However, since in most cases llama.cpp is used as a library through another app or as a server, this issue mostly affects llama-cli users. You can look at index-new.html (the new server UI): it has different "default" values with top_k and top_p turned off, and I assume any other frontend will send a payload with all parameters set as needed. But yes, this is an issue.
Llama-cli user here, via scripts. Thanks for bringing this up at any rate: I improved my (custom) benchmark scores just by adjusting settings to disable Top K and Top P.
I've seen the default params adjusted a few times for llama-cli; it feels like this would be a good change.