[1/2] Deprecate outputs in on_train_epoch_end hooks #7339
Conversation
Hello @ananthsub! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-05-05 12:25:47 UTC
Codecov Report
@@           Coverage Diff            @@
##           master   #7339    +/-   ##
=======================================
- Coverage      92%     87%      -4%
=======================================
  Files         200     200
  Lines       12953   12985     +32
=======================================
- Hits        11883   11360    -523
- Misses       1070    1625    +555
LGTM 😃 minor queries
# if the PL module doesn't have the hook then call the accelerator
# used to auto-reduce things for the user with Results obj
elif hasattr(self.trainer.accelerator, hook_name):
Not a huge fan of this. Better to use call_hook and maybe perform the signature analysis somewhere else.
From the comment, call_hook enforces that the accelerator, trainer, and module all take exactly the same arguments for the hook, which might not be the case here. This is the same pattern @kaushikb11 followed in #6120.
I'm not really a fan either, but call_hook calls across 3 distinct interfaces which aren't enforced to be compatible.
Maybe something we can look at for v1.4 is how to simplify/strengthen this? Perhaps the techniques @SkafteNicki used for metric collections could apply here, but that seems beyond the scope of this PR.
One thing I can do is add comments to Trainer.call_hook indicating that this override is applied in the training loop, so any changes to call_hook must also be applied here.
Thanks @ananthsub! I've strayed away from these hooks because of the caching logic and this is clearer
Co-authored-by: Ethan Harris <[email protected]>
Force-pushed from 31314ee to d18455c
Co-authored-by: Jirka Borovec <[email protected]>
For callback implementers, if they need the outputs in the callback, what do you suggest? Cache them through the batch_end callback methods?
Yes, there are at least these options to support this:
the content of
@ananthsub can you please add an example to the docs on how to properly use caching in this case?
On the docs site? In the callback/model hooks?
Wherever you feel is the better place :]
@ananthsub Yes, I also see it that way and I think it's the most straightforward solution. I just wanted to make sure we know what to recommend when someone asks for this, since we are removing a feature that was requested. For everything we deprecate we should have a solution for people who rely on it.
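As a rough illustration of the caching recommendation discussed above, a callback can accumulate per-batch outputs itself via on_train_batch_end and consume them in on_train_epoch_end. The sketch below is a plain Python class with simplified hook signatures for illustration, not the real Lightning Callback API:

```python
class CacheOutputsCallback:
    """Sketch: cache per-batch outputs yourself instead of relying on
    the deprecated `outputs` argument to on_train_epoch_end."""

    def __init__(self):
        self.outputs = []

    def on_train_epoch_start(self):
        # Reset the cache at the start of every epoch.
        self.outputs = []

    def on_train_batch_end(self, batch_output):
        # Store whatever the training step produced for this batch.
        self.outputs.append(batch_output)

    def on_train_epoch_end(self):
        # Post-process the cached outputs, e.g. average a loss.
        losses = [o["loss"] for o in self.outputs]
        return sum(losses) / len(losses)
```

The key point is that the callback, not the trainer, now decides what (if anything) is worth keeping for the whole epoch, which avoids the trainer caching outputs nobody uses.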
What does this PR do?
This addresses part of #6865.
Traditionally, the differentiator between LightningModule.training_epoch_end and the on_train_epoch_end hook was that training_epoch_end received all of that rank's batch outputs for the epoch for post-processing, while on_train_epoch_end took no arguments and didn't dictate whether the trainer should cache those outputs.
We deprecate outputs from on_train_epoch_end because:
This PR checks these conditions for deciding whether to store the per-batch results until the end of the epoch:
- the module overrides on_train_epoch_end and includes outputs in its signature (until v1.5)
The outputs were originally added here: #4369
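The override-plus-signature condition described above can be sketched with inspect. Here is_overridden is a hypothetical helper (Lightning ships a similar utility), and the classes are illustrative stand-ins rather than real LightningModules:

```python
import inspect


def is_overridden(method_name, instance, parent_cls):
    # Hypothetical helper: True if `instance` overrides `method_name`
    # relative to `parent_cls`.
    return getattr(type(instance), method_name) is not getattr(parent_cls, method_name)


def wants_outputs(instance, parent_cls, hook_name="on_train_epoch_end"):
    # Store per-batch results only if the user overrode the hook AND
    # kept the deprecated `outputs` parameter in its signature.
    fn = getattr(instance, hook_name)
    return (is_overridden(hook_name, instance, parent_cls)
            and "outputs" in inspect.signature(fn).parameters)


class Base:
    def on_train_epoch_end(self):
        pass


class LegacyModule(Base):
    def on_train_epoch_end(self, outputs):  # deprecated signature
        pass


class NewModule(Base):
    def on_train_epoch_end(self):  # new signature, no caching needed
        pass
```

Under this check, only modules like LegacyModule trigger the output caching (and a deprecation warning), while NewModule pays no memory cost.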
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:
Did you have fun?
Make sure you had fun coding 🙃