
Converting a StableLM fine-tuned model fails with "Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead." #4171

Closed
TheBloke opened this issue Nov 22, 2023 · 4 comments · Fixed by #4173

Comments

@TheBloke
Contributor

Prerequisites

Tested on latest commit, 8e672ef, and also on commits from yesterday.

Current Behavior

Trying to convert model https://huggingface.co/pansophic/rocket-3B

Results in:

 [pytorch2] tomj@MC:/workspace/git/gguf-llama (master ✘)✭ ᐅ python3 ./convert-hf-to-gguf.py /workspace/process/pansophic_rocket-3b/source --outtype f16 --outfile /workspace/process/pansophic_rocket-3b/gguf/rocket-3b.fp16.gguf
Loading model: source
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
gguf: Adding 50009 merge(s).
gguf: Setting special token type bos to 0
gguf: Setting special token type eos to 0
gguf: Setting special token type unk to 0
Exporting model to '/workspace/process/pansophic_rocket-3b/gguf/rocket-3b.fp16.gguf'
gguf: loading model part 'pytorch_model.bin'
Traceback (most recent call last):
  File "/workspace/git/gguf-llama/./convert-hf-to-gguf.py", line 897, in <module>
    model_instance.write()
  File "/workspace/git/gguf-llama/./convert-hf-to-gguf.py", line 126, in write
    self.write_tensors()
  File "/workspace/git/gguf-llama/./convert-hf-to-gguf.py", line 98, in write_tensors
    data = data_torch.squeeze().numpy()
RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

I noticed that the latest commits mentioned StableLM, so I tried rolling back to before them, but I still got the same error.

I have confirmed that the model loads OK via Transformers, so it appears to be valid.

Any thoughts, @Galunid?

Thanks in advance

Environment and Context

Ubuntu 22.04, Python 3.10

@TheBloke
Contributor Author

TheBloke commented Nov 22, 2023

Actually, maybe this is a trivial fix!

I followed the hint in the RuntimeError, and changed:

   data = data_torch.squeeze().numpy()

to:

   data = data_torch.squeeze().detach().numpy()

And it produced a valid FP16 GGUF which is producing output. Just validating that the output is definitely OK...

Yes it is. OK, this is a one-line fix I guess. Happy to PR it myself, if someone could confirm there are no potential risks attached to adding .detach() in all cases?
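
For anyone hitting the same error, here's a minimal standalone sketch (my own, not from the converter) of what's going on:

import torch

# Weights loaded from a checkpoint typically have requires_grad=True.
w = torch.nn.Parameter(torch.ones(1, 3))

# w.squeeze().numpy()  # RuntimeError: Can't call numpy() on Tensor that requires grad.

# .detach() returns a view that shares storage but is cut off from the
# autograd graph, so the numpy conversion succeeds. The order relative to
# .squeeze() shouldn't matter: both return views over the same data.
data = w.squeeze().detach().numpy()
print(data)  # [1. 1. 1.]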

@KerfuffleV2
Collaborator

https://pytorch.org/docs/stable/generated/torch.Tensor.detach.html

It doesn't sound like there would be a problem with always detaching.
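
A quick sanity check (hypothetical snippet, not from the repo) that detaching is cheap and side-effect-free here — it's a zero-copy view with gradient tracking switched off:

import torch

a = torch.ones(3, requires_grad=True)
b = a.detach()

print(b.requires_grad)                # False: gradient tracking is off
print(b.data_ptr() == a.data_ptr())   # True: same storage, nothing was copied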

@Galunid
Collaborator

Galunid commented Nov 22, 2023

I don't think there are any risks in adding .detach(). We just read the tensors, so it should be fine to detach them from the gradient graph.
I'm not sure whether it should be

data = data_torch.squeeze().detach().numpy()

or

data = data_torch.detach().squeeze().numpy()

This should work as a temporary fix, although the proper solution would be to use torch.no_grad(), since it should reduce memory requirements (at least that's what the docs say). A quick sketch after the diff shows the effect.

diff --git a/convert-hf-to-gguf.py b/convert-hf-to-gguf.py
index 1105670..20ad4ed 100755
--- a/convert-hf-to-gguf.py
+++ b/convert-hf-to-gguf.py
@@ -51,6 +51,7 @@ class Model:
     def set_vocab(self):
         self._set_vocab_gpt2()
 
+    @torch.no_grad()
     def get_tensors(self) -> Iterator[tuple[str, Tensor]]:
         for part_name in self.part_names:
             print(f"gguf: loading model part '{part_name}'")
@@ -81,6 +82,7 @@ class Model:
             self.gguf_writer.add_head_count(n_head)
         self.gguf_writer.add_parallel_residual(self.hparams.get("use_parallel_residual", True))
 
+    @torch.no_grad()
     def write_tensors(self):
         block_count = self.hparams.get("n_layers", self.hparams.get("num_hidden_layers", self.hparams.get("n_layer")))
         tensor_map = gguf.get_tensor_name_map(self.model_arch, block_count)
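
As a standalone illustration (mine, not from the converter) of why the decorator removes the need for an explicit .detach():

import torch

w = torch.nn.Parameter(torch.ones(1, 3))  # requires_grad=True, like a loaded weight

@torch.no_grad()
def to_numpy(t):
    # Ops run in no_grad mode produce outputs with requires_grad=False,
    # so the existing t.squeeze().numpy() call works without .detach().
    return t.squeeze().numpy()

print(to_numpy(w))  # [1. 1. 1.]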

@TheBloke
Contributor Author

Thanks both
