Refactor setup_training and remove test_mode #5388
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master   #5388    +/-   ##
=======================================
- Coverage      93%     93%     -0%
=======================================
  Files         135     135
  Lines       10015    9863    -152
=======================================
- Hits         9338    9173    -165
- Misses        677     690     +13
Hello @rohitgr7! Thanks for updating this PR.
Comment last updated at 2021-01-08 20:02:13 UTC
@awaelchli I see. Will remove the other changes in the accelerators except …
Great work! Could you check `self.logger_connector.set_stage`? It is duplicating the information. It would be better to have a higher-level State for this.

`self.logger_connector.set_stage` was introduced because validation can be performed during training and the store needs to change. It would also be good to have a `trainer.training = True` argument and not rely on the model for it.
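For illustration, such a higher-level state could look roughly like the sketch below. The `RunningStage` name, its values, and the property wiring are assumptions made for this sketch, not the API introduced by this PR:

```python
from enum import Enum


class RunningStage(Enum):
    # Stage names here are illustrative assumptions, not the actual Lightning API.
    TRAINING = "train"
    VALIDATING = "val"
    TESTING = "test"


class Trainer:
    def __init__(self):
        # Single source of truth for the current stage, instead of
        # duplicating it in the logger connector and on the model.
        self._running_stage = None

    @property
    def training(self) -> bool:
        return self._running_stage == RunningStage.TRAINING

    @training.setter
    def training(self, val: bool) -> None:
        # `trainer.training = True` switches the stage explicitly,
        # so loops do not have to rely on a flag stored on the model.
        if val:
            self._running_stage = RunningStage.TRAINING
        elif self.training:
            self._running_stage = None
```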
This reverts commit 5d9e95f.
@Borda need help with this
@SeanNaren @tchaton mind review?
@@ -412,6 +412,46 @@ def __init__(
        # Callback system
        self.on_init_end()

    def setup_trainer(self, model: LightningModule):
I think this refactor broke this functionality:

```python
trainer = Trainer(resume_from_checkpoint=ckpt)
trainer.test()
```

I don't see where `self.trainer.checkpoint_connector.restore_weights()` is called in the evaluation loop or in the trainer. Before, this happened as part of the `TrainLoop`'s `setup_training`. In `Trainer`'s `run_evaluation` I don't see any calls to the checkpoint connector either. Is that intentional?
This is done in `setup_training` here: https://github.com/PyTorchLightning/pytorch-lightning/pull/5388/files#diff-6b21474ed45079f01dfa45ee3b9d40d23efe693c9c005ce897e25384ef425349
As of now, restore is done while calling `.fit`. Previously it was happening during `.test()` too and was reloading the `resume_from_checkpoint` state and model weights, which was wrong. But for your use case, I opened an issue to optionally reload the callback states.
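In the meantime, one workaround is to restore the checkpoint explicitly before testing, so nothing depends on a restore happening inside the test path. `LightningModule.load_from_checkpoint` is the standard API; `LitModel` and `ckpt` below are placeholders for your own module class and checkpoint path:

```python
from pytorch_lightning import Trainer

# `LitModel` stands in for your own LightningModule subclass,
# `ckpt` for your checkpoint path.
model = LitModel.load_from_checkpoint(ckpt)

# .test() then runs on the explicitly restored weights, without
# relying on restore_weights() being called during evaluation.
trainer = Trainer()
trainer.test(model)
```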
* ref and fix call for on_pretrained_routine
* avoid failing tests
* unnecessary_call
* unnecessary call in accelerators
* tmpdir
* rm test_mode
* pep
* updates
* more ref
* Revert "more ref" (this reverts commit 5d9e95f)
* more refac

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
What does this PR do?
Clean up and split `setup_training` into `setup_trainer` & `setup_training` to avoid calling `on_pretrain_routine_*` while testing. Removed the redundant `test_mode`; use `trainer.testing` instead. Also a few minor updates to make sure tests pass locally. A minimal sketch of the split follows the checklist heading below.
Before submitting
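For illustration only, the split roughly separates shared wiring from fit-only setup. The sketch below is a simplified assumption of the shape described in this PR, not the actual Lightning source; `checkpoint_connector` and the `on_pretrain_routine_*` hooks are the pieces named in the discussion above:

```python
class Trainer:
    def setup_trainer(self, model):
        # Shared wiring used by both .fit() and .test():
        # attach the model and prepare trainer state (simplified).
        self.model = model

    def setup_training(self, model):
        # Fit-only setup: weight restoration and the pretrain routine
        # hooks stay here, so .test() no longer triggers them.
        self.checkpoint_connector.restore_weights()
        model.on_pretrain_routine_start()
        model.on_pretrain_routine_end()
```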
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:
Did you have fun?
Make sure you had fun coding 🙃