gguf : add 64-bit support (GGUF v2) #2821
We should add types |
Need some help with the Python code. In the meantime, I will now add V1 backward comp in |
We should change to uint64_t on all lengths / sizes / counts just to be safe and future-proof, not only change tensor dimensions. |
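A minimal sketch of what 64-bit counts and lengths look like on disk, assuming a little-endian layout of a 4-byte `GGUF` magic, a `uint32` version, then `uint64` tensor and key/value counts, and strings stored as a `uint64` byte length followed by UTF-8 data. The helper names `pack_header_v2` and `pack_string_v2` are hypothetical and are not taken from `gguf.py`.

```python
import struct

GGUF_MAGIC = b"GGUF"  # assumed 4-byte magic at the start of the file


def pack_header_v2(tensor_count: int, kv_count: int) -> bytes:
    """Pack an assumed v2 header: magic, uint32 version, two uint64 counts."""
    # "<" selects little-endian with no padding; I = uint32, Q = uint64.
    return GGUF_MAGIC + struct.pack("<IQQ", 2, tensor_count, kv_count)


def pack_string_v2(s: str) -> bytes:
    """Pack a string as a uint64 byte length followed by its UTF-8 bytes."""
    data = s.encode("utf-8")
    return struct.pack("<Q", len(data)) + data


header = pack_header_v2(tensor_count=3, kv_count=5)
name = pack_string_v2("general.name")

# A 32-bit field cannot represent a count or size of 2**32 or more,
# which is the limit the 64-bit fields are meant to remove.
try:
    struct.pack("<I", 2**32)
    fits_in_uint32 = True
except struct.error:
    fits_in_uint32 = False
```

The same value that `struct` rejects for a `uint32` field packs without error as a `uint64`, which illustrates the "safe and future-proof" argument for widening every length, size and count rather than only the tensor dimensions.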
I tested loading a couple GGUF v1 models, the backward compatibility seems to work fine. |
Similarly, no issues loading various v1 models. |
Both versions work well.
We can actually use |
Looks good, is the plan to update the metadata values for the lengths/etc before merge? |
@klosax Ah, that's useful. For a 7b q4_0 model, I use I don't need |
I don't think those parameters are needed. Maybe we should have a new parameter |
Lines 4743 to 4746 in 730d9c6
That logic is actually kind of wrong because the k-quants stuff can choose a different type than It probably will work for the non-k-quants types but pretty sure k-quants won't work. (There were also some changes to the decisions k-quants makes for LLaMA2 70B models so in that particular case it wouldn't pass through all the tensors even if the other issues were dealt with.) |
Thanks. I used |
Thanks everyone for testing. We should merge this - anything else we want to try before this? |
* gguf : bump version to 2
* gguf : add support for 64-bit (no backwards comp yet)
* gguf : v1 backwards comp
* gguf.py : bump GGUF version
* gguf.py : uint64_t on all lengths, sizes and counts, enums still uint32_t
* gguf.py : string lengths uint32_t
* gguf : update all counts to 64-bit
* gguf.py : string len uint64_t and n_dims uint32_t
* gguf : fix typo
* llama.cpp : print gguf version

---------

Co-authored-by: klosax <[email protected]>
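The v1 backward compatibility listed in the commits above can be sketched as a reader that picks the field width from the version number. This is an illustration under assumptions, not the code from this PR: it assumes the header is a `GGUF` magic, a `uint32` version, then a tensor count and a key/value count that are `uint32` in v1 and `uint64` in v2, all little-endian. The function name `read_gguf_header` is hypothetical.

```python
import io
import struct


def read_gguf_header(f):
    """Read an assumed GGUF header, choosing count widths by version."""
    magic = f.read(4)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")

    (version,) = struct.unpack("<I", f.read(4))
    if version == 1:
        fmt = "<II"  # v1: 32-bit counts
    elif version == 2:
        fmt = "<QQ"  # v2: 64-bit counts
    else:
        raise ValueError(f"unsupported GGUF version: {version}")

    tensor_count, kv_count = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
    return version, tensor_count, kv_count


# Synthetic headers built in memory, one per version.
v1_bytes = b"GGUF" + struct.pack("<III", 1, 3, 5)
v2_bytes = b"GGUF" + struct.pack("<IQQ", 2, 3, 5)

v1_header = read_gguf_header(io.BytesIO(v1_bytes))
v2_header = read_gguf_header(io.BytesIO(v2_bytes))
```

Branching once on the version and widening everything to Python integers afterwards keeps the rest of a loader independent of the file version, which is one way to keep v1 files loadable after the format change.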
I am a long-term enthusiast for whisper.cpp which I use by default nowadays to transcribe my podcast Unmaking Sense.
did you press it more than once? It queues a stop and gives you the control, and then if pressed again, exits the program. try to play with it a bit more :)
did you use the prompt template? |
It seems that if you use Ctrl-C while the assistant is printing a reply, it behaves as expected and described, but if you press it afterwards, it aborts. Thanks for the hint.
I hadn't, but now I have. Thank you, again. Unfortunately it seems to lead to a collapse of the quality of the response to a point where it is worthless, but I therefore obviously need to investigate the process more. |
If you'd need to follow up, I'd suggest making an issue specifically to discuss your problem. This is a pull request that doesn't seem directly related. |
Adding 64-bit support as discussed: ggerganov/ggml#302 (comment)
Help with testing is appreciated. This should be backward compatible with v1.