Converting a StableLM fine-tuned model fails with "Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead."
#4171
Comments
Actually, maybe this is a trivial fix! I followed the hint in the RuntimeError and changed:

`data = data_torch.squeeze().numpy()`

to:

`data = data_torch.squeeze().detach().numpy()`

and it produced a valid FP16 model which is producing output. Just validating the output is definitely OK... Yes, it is. OK, this is a one-line fix, I guess. Happy to PR it myself, if someone could confirm there are no potential risks attached to adding `.detach()`.
https://pytorch.org/docs/stable/generated/torch.Tensor.detach.html
It doesn't sound like there would be a problem with always detaching.
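For illustration, a minimal standalone sketch of the behavior the docs describe (not code from the issue):

```python
import torch

# A leaf tensor that still tracks gradients, as in a fine-tuned checkpoint.
t = torch.ones(3, requires_grad=True)

# Calling t.numpy() here raises:
#   RuntimeError: Can't call numpy() on Tensor that requires grad.
# detach() returns a tensor sharing the same storage but cut off from the
# autograd graph, so the conversion is safe for read-only export.
arr = t.detach().numpy()
print(arr)  # -> [1. 1. 1.]
```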
I don't think there are any risks in adding `data = data_torch.squeeze().detach().numpy()` or `data = data_torch.detach().squeeze().numpy()`. This should work as a temporary fix, although the proper solution would be to use `@torch.no_grad()`:

```diff
diff --git a/convert-hf-to-gguf.py b/convert-hf-to-gguf.py
index 1105670..20ad4ed 100755
--- a/convert-hf-to-gguf.py
+++ b/convert-hf-to-gguf.py
@@ -51,6 +51,7 @@ class Model:
     def set_vocab(self):
         self._set_vocab_gpt2()
 
+    @torch.no_grad()
     def get_tensors(self) -> Iterator[tuple[str, Tensor]]:
         for part_name in self.part_names:
             print(f"gguf: loading model part '{part_name}'")
@@ -81,6 +82,7 @@ class Model:
         self.gguf_writer.add_head_count(n_head)
         self.gguf_writer.add_parallel_residual(self.hparams.get("use_parallel_residual", True))
 
+    @torch.no_grad()
     def write_tensors(self):
         block_count = self.hparams.get("n_layers", self.hparams.get("num_hidden_layers", self.hparams.get("n_layer")))
         tensor_map = gguf.get_tensor_name_map(self.model_arch, block_count)
```
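To see why the decorator alone is sufficient, here is a standalone sketch (assumed names, not the converter itself): under `torch.no_grad()`, operations such as `squeeze()` produce results with `requires_grad=False`, so the subsequent `.numpy()` call no longer trips the check.

```python
import torch

@torch.no_grad()
def export_tensor(data_torch: torch.Tensor):
    # Inside no_grad, squeeze() yields a tensor with requires_grad=False,
    # so numpy() works without an explicit detach().
    return data_torch.squeeze().numpy()

weight = torch.ones(1, 3, requires_grad=True)  # mimics a fine-tuned parameter
print(export_tensor(weight))  # -> [1. 1. 1.]
```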
Thanks both
Prerequisites
Tested on latest commit, 8e672ef, and also on commits from yesterday.
Current Behavior
Trying to convert the model https://huggingface.co/pansophic/rocket-3B results in:

`RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.`
I noticed that the latest commits mentioned StableLM, so I tried rolling back to before them, but still got the same error.
I have confirmed that the model loads OK via Transformers, so it appears to be valid.
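For reference, a check along these lines would confirm that (hypothetical snippet; the exact commands used are not in the report, and `trust_remote_code=True` is assumed since StableLM-based checkpoints required custom model code at the time):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pansophic/rocket-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# If loading succeeds, the checkpoint itself is structurally valid.
print(type(model).__name__)
```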
Any thoughts @Galunid ?
Thanks in advance
Environment and Context
Ubuntu 22.04, Python 3.10