Q4_1 inference appears broken for 13B parameters #152
I have been experimenting with q4_1 quantisation (since some preliminary results suggest it should perform better), and noticed that something in the pipeline for the 13B parameter model is broken (whether it is the quantization itself, the saving, or the loading). This results in all inferred tokens coming out as `#`. Meanwhile, 7B works well. I know we had a patch a while ago that first made the 13B+ models work for q4_0 - did whatever fixes it made not cover q4_1?
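For context, here is a minimal sketch of the two 4-bit block formats as ggml defined them around this time (field names follow the upstream `block_q4_0`/`block_q4_1` structs, and `QK` is the 32-value block size ggml used then; treat the exact definitions as illustrative). q4_0 reconstructs values as `d * (q - 8)` from a single scale, while q4_1 reconstructs them as `d * q + m` with an extra per-block offset, which is why it can track the original weights more closely:

```c
#include <stdint.h>

#define QK 32  // values per quantization block (ggml used 32 at the time)

// Q4_0: x[i] ~= d * (q[i] - 8); one fp32 scale per block
typedef struct {
    float   d;         // scale
    uint8_t qs[QK/2];  // 4-bit quants, two per byte
} block_q4_0;

// Q4_1: x[i] ~= d * q[i] + m; scale plus offset per block
typedef struct {
    float   d;         // scale
    float   m;         // offset (minimum of the block)
    uint8_t qs[QK/2];  // 4-bit quants, two per byte
} block_q4_1;

// Illustrative dequantization of one q4_1 block into QK floats.
static void dequantize_block_q4_1(const block_q4_1 *b, float *out) {
    for (int i = 0; i < QK/2; ++i) {
        const uint8_t v = b->qs[i];
        out[2*i + 0] = b->d * (float)(v & 0x0F) + b->m;  // low nibble
        out[2*i + 1] = b->d * (float)(v >> 4)   + b->m;  // high nibble
    }
}
```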
First - great work!

Most likely the cause is that when I changed the Q4_0 quantization layout, the same change did not get applied to Q4_1. This change was necessary to make the larger models work, because when we merge rows from different shards, the scaling factors have to be next to the integer quants. Originally, the scaling factors of a row ended up at the start of the memory buffer, before all the int quants, and this caused the merging of the shards to produce a corrupted layout.

In short: you need to rearrange the scaling and offset factors for a chunk to be located right before the int quants in the memory buffer in order to have correct merging of the shards.

P.S. Btw, it is strange that you don't see any asserts firing. Are you building with asserts enabled?
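A sketch of what that layout change buys (the merge helper below is hypothetical, not upstream code, and reuses the `block_q4_1` struct from the sketch above): once each chunk's scale and offset sit immediately before its quants, a quantized row is a flat array of self-contained blocks, so gluing together the row halves coming from two model shards is a plain concatenation.

```c
#include <string.h>

// Hypothetical helper: merge the two halves of one row that were stored
// in separate model shards. With the interleaved layout, each half is a
// contiguous run of self-contained blocks, so a plain copy is correct.
static void merge_row_halves_q4_1(block_q4_1 *dst,
                                  const block_q4_1 *shard0,
                                  const block_q4_1 *shard1,
                                  int blocks_per_half) {
    memcpy(dst,                   shard0, blocks_per_half * sizeof(block_q4_1));
    memcpy(dst + blocks_per_half, shard1, blocks_per_half * sizeof(block_q4_1));
}
// With the old layout (all d/m factors at the head of the row buffer,
// followed by all the nibbles), the same concatenation would drop shard1's
// factors into the middle of shard0's quants - exactly the kind of
// corruption that makes every inferred token come out as garbage.
```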
I didn't use that. When using the 7B model, it works fine.
Great, thank you!
* Add AVX2 version of ggml_vec_dot_q4_1
* Small optimisations to q4_1 dot product (@Const-me)
* Rearrange Q4_1 quantization to work for multipart models. (Fix ggerganov#152)
* Fix ggml_vec_mad_q4_1 too
* Fix non-vectorised q4_1 vec mul
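For reference, here is a plain scalar version of the q4_1 dot product that the AVX2 work above vectorizes (a sketch under the same assumed `block_q4_1`/`QK` definitions as earlier, not the upstream implementation). Because both operands carry an offset, each block pair expands into four terms:

```c
// x[i] = x.d*qx[i] + x.m and y[i] = y.d*qy[i] + y.m, so per block:
// sum_i x[i]*y[i] = dx*dy*S(qx*qy) + dx*my*S(qx) + mx*dy*S(qy) + QK*mx*my
static float vec_dot_q4_1_ref(int n, const block_q4_1 *x, const block_q4_1 *y) {
    const int nb = n / QK;  // number of blocks in each vector
    double sum = 0.0;
    for (int b = 0; b < nb; ++b) {
        int sxy = 0, sx = 0, sy = 0;  // integer partial sums for this block
        for (int i = 0; i < QK/2; ++i) {
            const int x0 = x[b].qs[i] & 0x0F, x1 = x[b].qs[i] >> 4;
            const int y0 = y[b].qs[i] & 0x0F, y1 = y[b].qs[i] >> 4;
            sxy += x0*y0 + x1*y1;
            sx  += x0 + x1;
            sy  += y0 + y1;
        }
        sum += (double)x[b].d * y[b].d * sxy
             + (double)x[b].d * y[b].m * sx
             + (double)x[b].m * y[b].d * sy
             + (double)x[b].m * y[b].m * QK;
    }
    return (float)sum;
}
```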