Add BPE pre-tokenization for Command-R. #7033
Conversation
I tried to run the convert-hf-to-gguf-update.py script and needed the following change so that the actual tokenizer.json gets downloaded instead of the git-LFS pointer file:

diff --git a/convert-hf-to-gguf-update.py b/convert-hf-to-gguf-update.py
index f4774003..4ad4d867 100644
--- a/convert-hf-to-gguf-update.py
+++ b/convert-hf-to-gguf-update.py
@@ -95,6 +95,14 @@ for model in models:
     save_path = f"models/tokenizers/{name}/tokenizer.json"
     download_file_with_auth(url, token, save_path)

+    # if downloaded file is less than 1KB, we likely need to download an LFS instead
+    if os.path.getsize(save_path) < 1024:
+        # remove the file
+        os.remove(save_path)
+        url = f"{repo}/resolve/main/tokenizer.json"
+        save_path = f"models/tokenizers/{name}/tokenizer.json"
+        download_file_with_auth(url, token, save_path)
+
     if tokt == TOKENIZER_TYPE.SPM:
         url = f"{repo}/resolve/main/tokenizer.model"
         save_path = f"models/tokenizers/{name}/tokenizer.model"
Let's goooooO!
I've had an a-ha moment, and these look good. I've run it and got the same.
Optional: add in command-r-plus with nearly identical footprint, since the pre-tokenizers are the same (couple ways to make that happen).
One note: tokenizer.config specifies that the Digits pre-tokenizer is used with individual_digits=True. There isn't a regex pattern in llama.cpp that explicitly matches individual digits. Even though the tests pass, would that be a possibility to add in for completeness' sake?
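To make that concrete, here is a rough illustration of what an individual-digit split produces. The pattern below is a simplified assumption for the sake of the example, not the regex llama.cpp actually uses.

import regex  # third-party module with Unicode property-class support

# every digit becomes its own piece; letter runs, whitespace runs, and
# punctuation runs stay together (roughly Digits with individual_digits=True)
pattern = regex.compile(r"\p{N}|\p{L}+|\s+|[^\s\p{L}\p{N}]+")

print(pattern.findall("Price: 12345 tokens"))
# ['Price', ':', ' ', '1', '2', '3', '4', '5', ' ', 'tokens']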
I get why it's changed, but does it mean the old quants are broken even when using the older versions of llama.cpp? I'm not sure I can face downloading and quantizing it myself.
My understanding of the overall issue (which admittedly is not as nuanced as others') would suggest that, for the most part, yes, you should re-download, re-convert, and re-quantize your models. The longer version is that newer models are more likely to use newer pre-tokenizer splits. Luckily, command-r and command-r-plus appear to be nearly identical to the default pre-tokenizer's splits, so for these models it's probably okay?
I just remembered that the imat quants will definitely be incorrect for the new version of llama.cpp. I have a non-imat q6k that I was hoping to use until some kind person makes a new q8 for huggingface ;)
Yeah, exactly... lots of compounding issues. I've ended up moving some of these over to Ollama as well, so now I have to untangle what came from me and what I pulled from them. But I wouldn't be surprised if the majority of Ollama models have this same issue and will need to be rebuilt.
Actually, yeah. Let's just approve this to get it in there, and I'll add in command-r-plus separately.
This PR works for both Command-R versions, since they have the same hash.
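For context, the hash referred to here is the pre-tokenizer fingerprint that convert-hf-to-gguf-update.py derives by tokenizing a fixed probe string and hashing the resulting token ids. A simplified sketch (the function name is assumed and the actual probe text is elided):

from hashlib import sha256
from transformers import AutoTokenizer

chktxt = "..."  # the fixed probe text used by the update script (elided here)

def pretokenizer_hash(model_id: str) -> str:
    # encode the probe text and hash the token ids; two models whose
    # tokenizers pre-tokenize identically produce the same digest
    tok = AutoTokenizer.from_pretrained(model_id)
    chktok = tok.encode(chktxt)
    return sha256(str(chktok).encode()).hexdigest()

# command-r and command-r-plus produce the same digest, which is why one
# pre-tokenizer entry can cover both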
Thanks, I added this update to the PR.
had to run this several times (it'll create the folder and then error out it seems)
i think cmd-r was already done at this point but i just wanted to point this out.
Blocked, but cmd-r is done anyway.
Got to convert Command R 35B into an f16 GGUF. I'm currently quantizing them and I'll test out Q5_K_M soon.
@drummerv I am not getting any errors running convert-hf-to-gguf-update.py.
Superseded by PR #7063.
I read #6920 and 120cf37 and am attempting to add Command-R support.
Closes #7030 and #7040.
EDIT: I also tested Command-R+ successfully using this PR.