
Converting a StableLM fine-tuned model fails with "Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead." #4171

Closed
TheBloke opened this issue Nov 22, 2023 · 4 comments · Fixed by #4173

Comments

@TheBloke
Contributor

Prerequisites

Tested on latest commit, 8e672ef, and also on commits from yesterday.

Current Behavior

Trying to convert model https://huggingface.co/pansophic/rocket-3B

Results in:

 [pytorch2] tomj@MC:/workspace/git/gguf-llama (master ✘)✭ ᐅ python3 ./convert-hf-to-gguf.py /workspace/process/pansophic_rocket-3b/source --outtype f16 --outfile /workspace/process/pansophic_rocket-3b/gguf/rocket-3b.fp16.gguf
Loading model: source
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
gguf: Adding 50009 merge(s).
gguf: Setting special token type bos to 0
gguf: Setting special token type eos to 0
gguf: Setting special token type unk to 0
Exporting model to '/workspace/process/pansophic_rocket-3b/gguf/rocket-3b.fp16.gguf'
gguf: loading model part 'pytorch_model.bin'
Traceback (most recent call last):
  File "/workspace/git/gguf-llama/./convert-hf-to-gguf.py", line 897, in <module>
    model_instance.write()
  File "/workspace/git/gguf-llama/./convert-hf-to-gguf.py", line 126, in write
    self.write_tensors()
  File "/workspace/git/gguf-llama/./convert-hf-to-gguf.py", line 98, in write_tensors
    data = data_torch.squeeze().numpy()
RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

I noticed that the latest commits mentioned StableLM, so I tried rolling back to before them, but I still got the same error.

I have confirmed that the model loads OK via Transformers, so it appears to be valid.

Any thoughts, @Galunid?

Thanks in advance

Environment and Context

Ubuntu 22.04, Python 3.10

@TheBloke
Contributor Author

TheBloke commented Nov 22, 2023

Actually, maybe this is a trivial fix!

I followed the hint in the RuntimeError, and changed:

   data = data_torch.squeeze().numpy()

to:

   data = data_torch.squeeze().detach().numpy()

And it produced a valid FP16 GGUF which is producing output. Just validating that the output is definitely OK...

Yes it is. OK, this is a one-line fix I guess. Happy to PR it myself, if someone could confirm there are no potential risks attached to adding .detach() in all cases?
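
For anyone hitting the same error, here's a minimal standalone sketch (my own, not from the converter) of what's going on:

import torch

# Weights loaded from a checkpoint typically have requires_grad=True.
w = torch.nn.Parameter(torch.ones(1, 3))

# w.squeeze().numpy()  # RuntimeError: Can't call numpy() on Tensor that requires grad.

# .detach() returns a view that shares storage but is cut off from the
# autograd graph, so the numpy conversion succeeds. The order relative to
# .squeeze() shouldn't matter: both return views over the same data.
data = w.squeeze().detach().numpy()
print(data)  # [1. 1. 1.]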

@KerfuffleV2
Collaborator

https://pytorch.org/docs/stable/generated/torch.Tensor.detach.html

It doesn't sound like there would be a problem with always detaching.
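
A quick sanity check (hypothetical snippet, not from the repo) that detaching is cheap and side-effect-free here — it's a zero-copy view with gradient tracking switched off:

import torch

a = torch.ones(3, requires_grad=True)
b = a.detach()

print(b.requires_grad)                # False: gradient tracking is off
print(b.data_ptr() == a.data_ptr())   # True: same storage, nothing was copied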

@Galunid
Collaborator

Galunid commented Nov 22, 2023

I don't think there are any risks in adding .detach(). We just read the tensors, so it should be fine to detach them from the gradient graph.
I'm not sure whether it should be

data = data_torch.squeeze().detach().numpy()

or

data = data_torch.detach().squeeze().numpy()

This should work as a temporary fix, although the proper solution would be to use torch.no_grad(), since it should reduce memory requirements (at least that's what the docs say). A quick sketch after the diff shows the effect.

diff --git a/convert-hf-to-gguf.py b/convert-hf-to-gguf.py
index 1105670..20ad4ed 100755
--- a/convert-hf-to-gguf.py
+++ b/convert-hf-to-gguf.py
@@ -51,6 +51,7 @@ class Model:
     def set_vocab(self):
         self._set_vocab_gpt2()
 
+    @torch.no_grad()
     def get_tensors(self) -> Iterator[tuple[str, Tensor]]:
         for part_name in self.part_names:
             print(f"gguf: loading model part '{part_name}'")
@@ -81,6 +82,7 @@ class Model:
             self.gguf_writer.add_head_count(n_head)
         self.gguf_writer.add_parallel_residual(self.hparams.get("use_parallel_residual", True))
 
+    @torch.no_grad()
     def write_tensors(self):
         block_count = self.hparams.get("n_layers", self.hparams.get("num_hidden_layers", self.hparams.get("n_layer")))
         tensor_map = gguf.get_tensor_name_map(self.model_arch, block_count)
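
As a standalone illustration (mine, not from the converter) of why the decorator removes the need for an explicit .detach():

import torch

w = torch.nn.Parameter(torch.ones(1, 3))  # requires_grad=True, like a loaded weight

@torch.no_grad()
def to_numpy(t):
    # Ops run in no_grad mode produce outputs with requires_grad=False,
    # so the existing t.squeeze().numpy() call works without .detach().
    return t.squeeze().numpy()

print(to_numpy(w))  # [1. 1. 1.]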

@TheBloke
Contributor Author

Thanks both
