Compute perplexity fails with too many tokens exception #385
Comments
Will take a look later if it is still not fixed.
FYI: I just completed a 222-chunk run with the 30B q4 model by taking the first 1404 lines of the wikitext.
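(For scale: assuming the default 512-token evaluation window per chunk, 222 chunks works out to roughly 222 × 512 = 113,664 tokens.)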
edit: proper fix here #390. For a quick fix you can do:

```diff
diff --git a/utils.cpp b/utils.cpp
index 1679ae1..af822cc 100644
--- a/utils.cpp
+++ b/utils.cpp
@@ -148,6 +148,12 @@ std::string gpt_random_prompt(std::mt19937 & rng) {
 std::vector<llama_token> llama_tokenize(struct llama_context * ctx, const std::string & text, bool add_bos) {
     std::vector<llama_token> res(8096);
     int n = llama_tokenize(ctx, text.c_str(), res.data(), res.size(), add_bos);
+    if (n < 0) {
+        res.resize(-n);
+        n = llama_tokenize(ctx, text.c_str(), res.data(), res.size(), add_bos);
+
+        assert(n >= 0);
+    }
     res.resize(n);
     return res;
```

However, this is not a good solution, since it invokes the tokenizer twice for a large file, and it produces a warning/error.
Can you provide the output too?
Is this what you need? 30B_int4.txt

Memory usage for 30B / q4 as reported by:

Currently running 65B / q4, and seeing 79.2 GB constant memory usage. It is taking approx. twice as long as 30B, so full results tomorrow. 🐌 🐌
Poor man's Weights & Biases using:
Would be good to test the perplexity with the GPTQ quantization and compare with the usual RTN quantization. |
@BadisG see #129, which is becoming a catch-all issue for model quality. |
Linked commit: Update llama.py: Added how many input tokens in ValueError exception
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
It is supposed to compute perplexity like the original PR: #270
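For reference, the perplexity computed there is the standard exponentiated average negative log-likelihood over the tokenized text (the textbook definition, not quoted from #270):

$$\mathrm{PPL} = \exp\left(-\frac{1}{N} \sum_{i=1}^{N} \ln p(x_i \mid x_{<i})\right)$$

A lower value means the model finds the text less surprising.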
Current Behavior
However, it fails with the following exception:
Environment and Context
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
Failure Logs