
Q4_1 inference appears broken for 13B parameters #152

Closed
blackhole89 opened this issue Mar 15, 2023 · 5 comments
Labels
bug Something isn't working model Model specific

Comments

@blackhole89
Contributor

I have been experimenting with q4_1 quantisation (since some preliminary results suggest it should perform better), and noticed that something in the pipeline for the 13B parameter model is broken (whether it is the quantization itself, the saving, or the loading). This results in all inferred tokens coming out as #. Meanwhile, 7B works well.

I know we had a patch a while ago that first made the 13B+ models work for q4_0 - did whatever fixes it made not cover q4_1?

blackhole89 added the bug label on Mar 15, 2023
@tisfeng

tisfeng commented Mar 15, 2023

Yes, I use the 13B model and it doesn't work properly for either chatting or tasks. It simply ends without any response, which is very confusing to me.

[screenshot]

@ggerganov
Owner

ggerganov commented Mar 15, 2023

First - great work!

Most likely the cause is that when I changed the Q4_0 scaling storage, I skipped the Q4_1 routines:

007a8f6

This change was necessary to make the larger models work: when we merge rows from different shards, the scaling factors have to sit next to the integer quants. Originally, the scaling factors of a row ended up at the start of the memory buffer, before all the int quants, and this caused the merging in main.cpp to fail.

In short: you need to rearrange the scaling and offset factors for a chunk to be located right before the int quants in the memory buffer in order to have correct merging of the shards.

P.S. By the way, it is strange that you don't see any asserts firing. Are you building with -DNDEBUG while developing? If you remove -DNDEBUG, you should see infs or nans at the very start of inference.

@tisfeng

tisfeng commented Mar 15, 2023

I didn't use -DNDEBUG. I am a newcomer to this; I just converted the 13B model using the latest code from the main branch, and no errors were reported during the conversion.

make -j && ./main -m ./models/13B/ggml-model-q4_0.bin -p "What is the best programming language in the world? Why?" -t 8 -n 512

When using the 7B model, it works fine.

[screenshot]

@blackhole89
Contributor Author

In 13B Q4_1 mode:

[screenshot]

It seems to work now, so I might make a PR against the main repository.

@tisfeng
Copy link

tisfeng commented Mar 16, 2023

Great, thank you!

ggerganov pushed a commit that referenced this issue Mar 17, 2023
* Add AVX2 version of ggml_vec_dot_q4_1

* Small optimisations to q4_1 dot product (@Const-me)

* Rearrange Q4_1 quantization to work for multipart models. (Fix #152)

* Fix ggml_vec_mad_q4_1 too

* Fix non-vectorised q4_1 vec mul
anzz1 referenced this issue in anzz1/alpaca.cpp Mar 17, 2023 (same commit message as above, with "Fix antimatter15#152")
mudler pushed a commit to go-skynet/llama that referenced this issue Mar 17, 2023 (same commit message as above, with "Fix ggerganov#152")