
gguf dequantize failed #31725

Closed
1 of 4 tasks
PenutChen opened this issue Jul 1, 2024 · 11 comments

@PenutChen
Contributor

System Info

transformers==4.42.3
torch==2.3.0

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The example usage from the docs:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

Expected behavior

Running the snippet produces the following error:

Converting and de-quantizing GGUF tensors...:   0%|                         | 0/201 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data2/Penut/LLM-Backend/hello.py", line 7, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3583, in from_pretrained
    state_dict = load_gguf_checkpoint(gguf_path, return_tensors=True)["tensors"]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 146, in load_gguf_checkpoint
    weights = load_dequant_gguf_tensor(shape=shape, ggml_type=tensor.tensor_type, data=tensor.data)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/integrations/ggml.py", line 499, in load_dequant_gguf_tensor
    values = dequantize_q6_k(data)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/data2/Penut/.miniconda/envs/Py311/lib/python3.11/site-packages/transformers/integrations/ggml.py", line 284, in dequantize_q6_k
    data_f16 = np.frombuffer(data, dtype=np.float16).reshape(num_blocks, block_size // 2)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot reshape array of size 26880000 into shape (152,105)
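
For reference, a quick arithmetic check of the numbers in the traceback, assuming the 210-byte Q6_K block size that transformers uses (105 float16 values per block, hence the 105 in the failing reshape): the buffer itself divides evenly into Q6_K blocks, so the suspicious value is the computed block count, not the data.

Q6_K_BLOCK_BYTES = 210                 # bytes per Q6_K block (105 float16 values)
n_f16 = 26_880_000                     # float16 elements seen by np.frombuffer
n_bytes = n_f16 * 2                    # 53,760,000 bytes in the raw tensor buffer
assert n_bytes % Q6_K_BLOCK_BYTES == 0
print(n_bytes // Q6_K_BLOCK_BYTES)     # 256000 blocks, not the 152 used in the reshape
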
@amyeroberts
Collaborator

cc @SunMarc

@PenutChen
Contributor Author

PenutChen commented Jul 2, 2024

A workaround is to replace num_blocks with -1 in this code, but I'm not sure whether this is the correct behavior.

# transformers/integrations/ggml.py

def dequantize_q6_k(data):
    block_size = GGML_BLOCK_SIZES["Q6_K"]
    num_blocks = len(data) // block_size

    # changed: reshape with -1 so NumPy infers the block count from the buffer itself
    data_f16 = np.frombuffer(data, dtype=np.float16).reshape(-1, block_size // 2)
    data_u8 = np.frombuffer(data, dtype=np.uint8).reshape(-1, block_size)
    data_i8 = np.frombuffer(data, dtype=np.int8).reshape(-1, block_size)

    scales = data_f16[:, -1].reshape(-1, 1).astype(np.float32)
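
My reading of why the -1 helps (an assumption, not confirmed above): np.frombuffer always consumes the entire underlying buffer, so letting NumPy infer the row count keeps the reshape consistent with the real byte length even when len(data) does not return the byte count, for example if the reader hands back a multi-dimensional array instead of a flat bytes object. A minimal illustration with a hypothetical stand-in for tensor.data:

import numpy as np

block_size = 210  # bytes per Q6_K block

# The same 420 bytes exposed either as a flat bytes object or as a 2-D uint8 array.
flat = bytes(420)
two_d = np.zeros((2, block_size), dtype=np.uint8)

for data in (flat, two_d):
    values = np.frombuffer(data, dtype=np.float16)  # always sees all 210 float16 values
    print(len(data), len(data) // block_size)       # 420 -> 2 blocks, but 2 -> 0 blocks
    values.reshape(-1, block_size // 2)             # inferring the row count works for both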

@SunMarc
Member

SunMarc commented Jul 2, 2024

Hey @PenutChen, thanks for opening the issue! I tried your snippet on the main branch of transformers and on v4.42.3, and everything looks fine. I suggest you clear your cache and try again. Also, which version of numpy are you using? Maybe this is an issue with the 2.0 version that was released recently.

@PenutChen
Contributor Author

@SunMarc Thanks for the reply! I upgraded the numpy version to 1.26.4, but I still get the same error. After checking all my dependencies, I found that my gguf was installed from the source of the llama.cpp repo. I changed the version to the PyPI one, and it works!
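
A quick way to double-check which gguf build is actually being imported (just a suggestion, any equivalent check works):

import importlib.metadata

print(importlib.metadata.version("gguf"))  # the PyPI release pinned later in this thread is 0.6.0

pip show gguf also prints the Version and Location fields, which can help spot an install that came from a local llama.cpp checkout.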

@SunMarc
Member

SunMarc commented Jul 2, 2024

Thanks for investigating! Hopefully the next release of gguf won't have the issue you experienced.

@PenutChen
Contributor Author

PenutChen commented Jul 3, 2024

The latest release of the gguf package is from Dec 13, 2023, but the gguf source in the llama.cpp repo still updates frequently, and there are incompatibilities between the two. For anyone experiencing this issue, try the following command:

pip install gguf==0.6.0 "numpy<2.0" --force-reinstall

@PenutChen
Contributor Author

Hi @SunMarc, just a reminder that gguf-py has been updated to 0.9.1 recently. There might be some issues with this version. If I find anything new, I will reopen this issue.

@SunMarc
Member

SunMarc commented Jul 12, 2024

Hi @PenutChen, thanks for the warning! It looks like we indeed have failing tests on our side, with the same error you experienced. I will reopen the issue =)

@SunMarc SunMarc reopened this Jul 12, 2024
@gelbartm

Downgrading to gguf==0.6.0 solved it for me. Thanks to @PenutChen for the hint.


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@PenutChen
Contributor Author

solved by #32298
