Refactor setup_training and remove test_mode #5388
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master   #5388    +/-   ##
=======================================
- Coverage      93%     93%     -0%
=======================================
  Files         135     135
  Lines       10015    9863    -152
=======================================
- Hits         9338    9173    -165
- Misses        677     690     +13
Hello @rohitgr7! Thanks for updating this PR.
Comment last updated at 2021-01-08 20:02:13 UTC
@awaelchli I see. Will remove the other changes in the accelerators except …
Great work! Could you check `self.logger_connector.set_stage`? It is duplicating the information. It would be better to have a higher-level State for this.

`self.logger_connector.set_stage` was introduced because validation can be performed during training and the store needs to change. It would also be good to have a `trainer.training = True` argument and not rely on the model for it.
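For illustration, such a higher-level state could look roughly like the sketch below. The `RunningStage` name, its values, and the property wiring are assumptions made for this sketch, not the API introduced by this PR:

```python
from enum import Enum


class RunningStage(Enum):
    # Stage names here are illustrative assumptions, not the actual Lightning API.
    TRAINING = "train"
    VALIDATING = "val"
    TESTING = "test"


class Trainer:
    def __init__(self):
        # Single source of truth for the current stage, instead of
        # duplicating it in the logger connector and on the model.
        self._running_stage = None

    @property
    def training(self) -> bool:
        return self._running_stage == RunningStage.TRAINING

    @training.setter
    def training(self, val: bool) -> None:
        # `trainer.training = True` switches the stage explicitly,
        # so loops do not have to rely on a flag stored on the model.
        if val:
            self._running_stage = RunningStage.TRAINING
        elif self.training:
            self._running_stage = None
```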
This reverts commit 5d9e95f.
@Borda need help with this
@SeanNaren @tchaton mind review?
@@ -412,6 +412,46 @@ def __init__(
        # Callback system
        self.on_init_end()

    def setup_trainer(self, model: LightningModule):
I think this refactor broke this functionality:

```python
trainer = Trainer(resume_from_checkpoint=ckpt)
trainer.test()
```

I don't see where `self.trainer.checkpoint_connector.restore_weights()` is called in the evaluation loop or in the trainer. Before, this happened as part of the `TrainLoop`'s `setup_training`. In `Trainer`'s `run_evaluation` I don't see any calls to the checkpoint connector either. Is that intentional?
This is done in `setup_training` here: https://github.com/PyTorchLightning/pytorch-lightning/pull/5388/files#diff-6b21474ed45079f01dfa45ee3b9d40d23efe693c9c005ce897e25384ef425349
As of now, restore is done while calling `.fit`. Previously it was happening during `.test()` too and was reloading the `resume_from_checkpoint` state and model weights, which was wrong. But for your use case, I opened an issue to optionally reload the callback states.
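In the meantime, one workaround is to restore the checkpoint explicitly before testing, so nothing depends on a restore happening inside the test path. `LightningModule.load_from_checkpoint` is the standard API; `LitModel` and `ckpt` below are placeholders for your own module class and checkpoint path:

```python
from pytorch_lightning import Trainer

# `LitModel` stands in for your own LightningModule subclass,
# `ckpt` for your checkpoint path.
model = LitModel.load_from_checkpoint(ckpt)

# .test() then runs on the explicitly restored weights, without
# relying on restore_weights() being called during evaluation.
trainer = Trainer()
trainer.test(model)
```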
* ref and fix call for on_pretrained_routine
* avoid failing tests
* unnecessary_call
* unnecessary call in accelerators
* tmpdir
* rm test_mode
* pep
* updates
* more ref
* Revert "more ref" (this reverts commit 5d9e95f)
* more refac

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
What does this PR do?
Clean up and split `setup_training` into `setup_trainer` & `setup_training` to avoid calling `on_pretrain_routine_*` while testing. Removed the redundant `test_mode`; use `trainer.testing` instead. Also a few minor updates to make sure tests pass locally. A minimal sketch of the split follows the checklist heading below.
Before submitting
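For illustration only, the split roughly separates shared wiring from fit-only setup. The sketch below is a simplified assumption of the shape described in this PR, not the actual Lightning source; `checkpoint_connector` and the `on_pretrain_routine_*` hooks are the pieces named in the discussion above:

```python
class Trainer:
    def setup_trainer(self, model):
        # Shared wiring used by both .fit() and .test():
        # attach the model and prepare trainer state (simplified).
        self.model = model

    def setup_training(self, model):
        # Fit-only setup: weight restoration and the pretrain routine
        # hooks stay here, so .test() no longer triggers them.
        self.checkpoint_connector.restore_weights()
        model.on_pretrain_routine_start()
        model.on_pretrain_routine_end()
```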
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:
Did you have fun?
Make sure you had fun coding 🙃