Updating 1.0.3 -> 1.0.4 breaks pytorch module hooks #5131
Comments
@ludwigwinkler I suspect using the boring model with an optimizer that overrides a hook will replicate this problem.
Hi, this is a boring model with an optimizer that uses forward and backward hooks to do stuff with the parameters. Let me know if something is unclear, needs more comments, or needs polishing. Thanks for all the effort you put into this project. =)
@tchaton Did you have time to look at this?
I had the same bug, but it turns out to be this: when it is destroyed, pytorch_lightning.core.optimizer.LightningOptimizer calls its wrapped optimizer's __del__. From pytorch lightning's standpoint this may be intended, but it is very horrifying that it automatically calls my optimizer's __del__.
@lizhitwo thank you for your insight! Where can I find the deletion call in the source code? Or should I override my custom optimizer's __del__ function?
There's no explicit call, but you can try it by constructing a lightning optimizer that wraps KFAC, then setting it to None while retaining KFAC, and then calling the garbage collector (gc.collect()). I think you should then see KFAC's __del__ called even though you still hold a KFAC object.
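The teardown behavior described above can be demonstrated with a small sketch. This is not PyTorch Lightning's actual source; `KFACLike` and `Wrapper` are hypothetical stand-ins showing how a wrapper whose destructor tears down the wrapped optimizer makes the inner optimizer's cleanup (e.g. hook removal) fire while the user still holds a reference to it:

```python
import gc

class KFACLike:
    """Toy optimizer whose __del__ removes its module hooks (hypothetical)."""
    def __init__(self):
        self.hooks_alive = True

    def __del__(self):
        self.hooks_alive = False

class Wrapper:
    """Hypothetical wrapper that destroys its inner optimizer on teardown,
    mimicking the behavior described in this thread (not PL's real code)."""
    def __init__(self, inner):
        self.inner = inner

    def __del__(self):
        # Explicitly tearing down the wrapped optimizer here is what makes
        # its cleanup run even though the user still holds a reference.
        self.inner.__del__()

opt = KFACLike()
wrapped = Wrapper(opt)
wrapped = None          # drop only the wrapper, keep the inner optimizer
gc.collect()
print(opt.hooks_alive)  # False: the inner optimizer was "cleaned up" anyway
```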
Dear @ludwigwinkler @lizhitwo, Great find! We hadn't thought about that. I couldn't reproduce the bug you shared, though: the notebook contained an error. The provided closure wasn't being called, so the hooks weren't triggered. I have made a PR here: #6305. Please check that it covers your use case. Sorry for the inconvenience. Best,
Sorry for the long delay. So I revisited my old code, but the deletion of the hooks still persists in version 1.2.6. :( You can check the colab boring model here: The optimizer's internal state won't be able to reference the results of the hooks, which are stored in the internal dictionary.
Hi ludwigwinkler, from the stack it seems that your code runs the optimizer step before the closure. To pl devs: is it also a bug that in this tutorial the closure is only evaluated in some iterations, but just ignored (no forward passes) in others?
Hi lizhitwo, do you mean that the forward pass should be encapsulated in a closure and subsequently passed to the optimizer, such that the forward pass itself is executed within the optimizer? Thanks for your effort to help me and everybody else out on this,
Yeah, outside pytorch lightning the closure is only useful in LBFGS. In pl, everything uses the closure; its purpose, I think, is to support all optimizers at once, including LBFGS.
That's already done automatically in pl. Inside your optimizer step, you can try to print the input closure variable, which will be a function instead of None. You can add printing to your training step and your optimizer step to observe this behavior. You should be able to see that the training step is not run if you don't call the closure, but it is run if you do call it.
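The contract being described can be sketched in a few lines. `ClosureStepOptimizer` is a hypothetical class (not a real PL or torch class) showing the pattern: the trainer hands `step()` a closure, and calling that closure is what runs the forward/backward pass, so any module hooks fire before the parameter update:

```python
class ClosureStepOptimizer:
    """Hypothetical optimizer illustrating the closure-in-step contract."""
    def __init__(self):
        self.seen_losses = []

    def step(self, closure=None):
        loss = None
        if closure is not None:
            # Calling the closure runs the training step (forward + backward)
            # inside step(), so forward/backward hooks fire before the update.
            loss = closure()
            self.seen_losses.append(loss)
        # ... a real optimizer would update parameters here, using any
        # hook-collected statistics stored in its state ...
        return loss

calls = []
def closure():
    calls.append("forward+backward")  # stands in for the real training step
    return 0.5

opt = ClosureStepOptimizer()
loss = opt.step(closure)
print(loss, calls)  # 0.5 ['forward+backward']
```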
Wow ... never would have thought that simply adding the closure() call would fix it. Wonderful! Thank you very much @lizhitwo for your help!
Glad to help someone who fell into the same pit as myself. Frankly, I think pytorch lightning should throw a warning if the closure is never called inside the optimizer step.
1.0.3 -> 1.0.4 breaks pytorch-hooks
What is your question?
I implemented a custom optimizer that stores the input and the incoming gradients for each layer and recomputes the gradients in a per-sample fashion (which I do because I want to use K-FAC for my optimization).
I'm using the pytorch hooks on modules (e.g. module123.register_forward_pre_hook(_save_input_function)), which store the inputs into the state dictionary of the optimizer.
I use module123.register_backward_hook(_save_gradient_function) in the same way for the backward gradients, and both are read when calling optimizer.step().
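A minimal sketch of this hook setup, under the assumption that a plain dict stands in for the optimizer's state and that `register_full_backward_hook` (the modern replacement for the deprecated `register_backward_hook`) is acceptable; the hook names mirror the ones mentioned above but the bodies are illustrative:

```python
import torch
import torch.nn as nn

# Hypothetical per-module state standing in for the optimizer's state dict.
state = {}

def _save_input_function(module, inputs):
    # Forward pre-hook: cache the layer input (e.g. for K-FAC's
    # activation covariance estimate).
    state.setdefault(module, {})["input"] = inputs[0].detach()

def _save_gradient_function(module, grad_input, grad_output):
    # Backward hook: cache the gradient flowing out of the layer.
    state.setdefault(module, {})["grad_output"] = grad_output[0].detach()

layer = nn.Linear(4, 3)
layer.register_forward_pre_hook(_save_input_function)
layer.register_full_backward_hook(_save_gradient_function)

x = torch.randn(2, 4)
layer(x).sum().backward()  # both hooks fire during this forward/backward
print(state[layer]["input"].shape, state[layer]["grad_output"].shape)
# torch.Size([2, 4]) torch.Size([2, 3])
```

An optimizer's `step()` would then read these cached tensors from `state` to build its per-sample or K-FAC update.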
As I updated from 1.0.1 to 1.1.0, it broke the hooks that are required by the KFAC optimizer.
I checked different versions and from 1.0.3 to 1.0.4 something changed such that the Trainer ignores/steps over the hooks in the forward and backward pass.
I checked the release notes but couldn't identify anything essential that would warrant these breaking changes.
Additionally, there were issues popping up with manually modifying gradients in other sampling-based types of optimizers.
These issues do not exist for the standard optimizers straight from PyTorch's optim module.
Code
It would be a huge codebase as the implementation of KFAC is a bit sophisticated.
But if nothing helps, I could write a minimal working example from scratch.