
Refactor callbacks #776

Merged · 13 commits · Feb 16, 2020

Conversation

kuynzereb (Contributor)

This PR does some refactoring of callbacks:

  1. Added on_validation_begin() and on_validation_end() to callbacks. The point is that ModelCheckpoint implements on_epoch_end() but is actually called at validation end; this is now fixed.
  2. All callback calls are unified and take no additional arguments. That is, from now on it is on_epoch_end() instead of on_epoch_end(epoch, logs). Instead of these extra arguments, each callback now holds a reference to the trainer, so it has access to current_epoch, global_step, callback_metrics, and so on. We just need to call self.callback.set_trainer(self) while initializing callbacks in the trainer.

With these modifications it becomes easy to implement things like an additional checkpointing callback that uses on_epoch_end() instead of on_validation_end() and can therefore checkpoint training runs that have no validation loop (#596, #652). It also becomes easy to start using global_step in checkpoint names. And in general, all callbacks become more uniform.
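A minimal sketch of the resulting interface (the hook names are the ones this PR adds; the trainer attributes are the ones listed above):

```python
class Callback:
    """Abstract base class used to build new callbacks."""

    def __init__(self):
        self._trainer = None

    def set_trainer(self, trainer):
        # Called by the Trainer while initializing callbacks; every hook can
        # then read trainer state instead of receiving it as arguments.
        self._trainer = trainer

    def on_epoch_end(self):
        # No more (epoch, logs): read self._trainer.current_epoch,
        # self._trainer.callback_metrics, etc. instead.
        pass

    def on_validation_begin(self):
        pass

    def on_validation_end(self):
        # ModelCheckpoint now runs here instead of in on_epoch_end().
        pass
```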

Borda added the labels feature (Is an improvement or enhancement) and help wanted (Open to be worked on) on Jan 31, 2020
Borda added this to the 0.6.1 milestone on Jan 31, 2020
williamFalcon (Contributor)

@Borda @neggert good to go?

Borda (Member) commented Feb 1, 2020

It is quite an extensive change; I'll have a look tomorrow...

Seven review threads on pytorch_lightning/callbacks/pt_callbacks.py (five marked outdated, all resolved).
hadim mentioned this pull request on Feb 10, 2020
hadim (Contributor) commented Feb 11, 2020

I am planning to move the progress bar to a callback and I'll need this PR. Here are a few suggestions:

  • Could we have a test_start and test_end?
  • Could we move each callback into a separate file so maintenance and history are easier?
  • Could we unify how callbacks are passed to the trainer? It would be useful to generalize and pass a list of callbacks instead of individual callbacks. During each event, the trainer should call all the callbacks; each callback is then free to add the logic to act or not (see the sketch after this comment).

Let me know what you think.
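An illustrative sketch of that third suggestion (hypothetical names, not code from this PR): the trainer keeps a single list of callbacks and fires every event on all of them.

```python
class Trainer:
    def __init__(self, callbacks=None):
        # one generic list instead of individual callback arguments
        self.callbacks = callbacks or []
        for callback in self.callbacks:
            callback.set_trainer(self)

    def _fire_event(self, hook_name):
        # every callback receives every event; each one decides
        # internally whether to act or do nothing
        for callback in self.callbacks:
            getattr(callback, hook_name)()

# e.g. Trainer(callbacks=[ModelCheckpoint(...), ProgressBar()]) and then
# trainer._fire_event('on_epoch_end') at the right point in the loop
```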

Borda (Member) commented Feb 11, 2020

I am planning to move the progress bar to a callback and I'll need this PR. Here are a few suggestions.

  • Could we have a test_start and test_end?

Yes

  • Could we move each callback into a separate file so maintenance and history are easier?

Good one :]

  • Could we unify how callbacks are passed to the trainer? It would be useful to generalize and pass a list of callbacks instead of individual callbacks. During each event, the trainer should call all the callbacks; each callback is then free to add the logic to act or not.

That would be nice...

Let me know what you think.

@williamFalcon ^^

kuynzereb (Contributor, Author)

I can implement the first two suggestions. I fully agree with the third one and have thought about it myself, but I am afraid it is not very straightforward with the current implementation of callbacks, and right now I don't have time to delve into it. So I would like to implement only the first two suggestions in this PR.

hadim (Contributor) commented Feb 11, 2020

That would be great. I make no promises, but I can try to tackle the third one.

Borda (Member) commented Feb 11, 2020

@kuynzereb @hadim I think it would be much easier to get these in if each suggestion were its own PR... large/complex PRs are not so nice for reviewers (they take much longer to check) nor for the author (debugging may become quite complex) :]

kuynzereb (Contributor, Author)

@Borda that also sounds very reasonable :)

hadim (Contributor) commented Feb 11, 2020

No problem making separate PRs, but I think something like test_start and test_end could easily be added to this one without too much burden.

Borda (Member) left a review comment:

Really valuable contribution, just a few (rather formatting) comments


```diff
 class Callback(object):
-    r"""Abstract base class used to build new callbacks.
-    """
+    """Abstract base class used to build new callbacks."""
```
Borda (Member):

As abstract, inherit from ABC... class Callback(ABC)? But if it is ABC, then you have to implement all the methods all the time... :/

kuynzereb (Contributor, Author):

I am not familiar with all this ABC stuff, so I don't really know :) But anyway, I think it would be better to do that in another PR (if it really should be done).
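For context, a minimal illustration of the trade-off being discussed (not code from this PR): with ABC, only methods marked @abstractmethod are mandatory to implement, so the practical choice is between forcing overrides and shipping no-op defaults.

```python
from abc import ABC, abstractmethod

class AbstractCallback(ABC):
    @abstractmethod
    def on_epoch_end(self):
        """Every subclass is forced to implement this hook."""

class PlainCallback:
    def on_epoch_end(self):
        pass  # no-op default: subclasses override only the hooks they need
```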


```python
    pass


_no_trainer_error_msg = ".set_trainer() should be called after the callback initialization"
```
Borda (Member) suggested:

```python
_NO_TRAINER_ERROR_MSG = "Missing trainer instance. The `.set_trainer(...)` should be called after the callback initialization."
```

kuynzereb (Contributor, Author):

Done

```diff
         # Allow instances to be re-used
         self.wait = 0
         self.stopped_epoch = 0
         self.best = np.Inf if self.monitor_op == np.less else -np.Inf

-    def on_epoch_end(self, epoch, logs=None):
+    def on_epoch_end(self):
+        assert self._trainer is not None, _no_trainer_error_msg
```
Borda (Member):

I would move this assert to a parent class and just call the parent method at the beginning of each hook?

kuynzereb (Contributor, Author):

I am not sure about this. Theoretically there may be some calls, or even entire callbacks, that don't use the trainer at all, so there would be no need for this assert.

Borda (Member):

True... let's keep it for now.
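For reference, a sketch of the set-aside alternative (hypothetical code): the assert would live once in the base class, and each hook that needs the trainer would call the parent method first.

```python
_no_trainer_error_msg = ".set_trainer() should be called after the callback initialization"

class Callback:
    def __init__(self):
        self._trainer = None

    def on_epoch_end(self):
        # base implementation only validates shared state
        assert self._trainer is not None, _no_trainer_error_msg

class ModelCheckpoint(Callback):
    def on_epoch_end(self):
        super().on_epoch_end()  # run the shared assert first
        # ... checkpointing logic using self._trainer ...
```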

```diff
@@ -231,8 +231,12 @@ def mock_save_function(filepath):
     # CASE K=-1 (all)
     w = ModelCheckpoint(save_dir, save_top_k=-1, verbose=1)
     w.save_function = mock_save_function
+    trainer = Trainer()
+    w.set_trainer(trainer)
```
Borda (Member):

Use a longer variable name than one letter.

kuynzereb (Contributor, Author):

Done

```diff
     for i, loss in enumerate(losses):
-        w.on_epoch_end(i, logs={'val_loss': loss})
+        w._trainer.current_epoch = i
+        w._trainer.callback_metrics = {'val_loss': loss}
```
Borda (Member):

Add a comment that this is a kind of hack to simulate training...

kuynzereb (Contributor, Author):

Done
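With the comment added, the loop reads roughly as follows (a sketch: the no-argument w.on_epoch_end() call is implied by the new interface but not visible in the excerpt above):

```python
# hack: simulate training by writing the state the callback reads
# directly onto the trainer attached to the checkpoint callback
for i, loss in enumerate(losses):
    w._trainer.current_epoch = i
    w._trainer.callback_metrics = {'val_loss': loss}
    w.on_epoch_end()  # new signature: no (epoch, logs) arguments
```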

Borda (Member) commented Feb 13, 2020

@kuynzereb could you push a commit so it triggers the new GitHub CI?

Borda added the ready (PRs ready to be merged) label on Feb 14, 2020
Borda (Member) commented Feb 14, 2020

Great work, thx

Borda (Member) commented Feb 14, 2020

@williamFalcon it was nice to see that GH Actions were about twice as fast as Travis lol

williamFalcon (Contributor)

@Borda can you close the appropriate tickets related to this PR?

Borda (Member) commented Feb 16, 2020

I don't see any particular issue for this, but I will check the backlog later... @kuynzereb was this change requested in an issue?

kuynzereb (Contributor, Author)

  @kuynzereb was this change requested in an issue?

Nope, it was not.

kuynzereb deleted the new_callback_entry_points branch on February 16, 2020 08:50
jeremyjordan (Contributor)

@kuynzereb you mentioned:

  All callback calls are unified and take no additional arguments. That is, from now on it is on_epoch_end() instead of on_epoch_end(epoch, logs). Instead of these extra arguments, each callback now holds a reference to the trainer, so it has access to current_epoch, global_step, callback_metrics, and so on.

I'm wondering, was this on_batch_start intended to still have a batch argument?

https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/training_loop.py#L462

kuynzereb (Contributor, Author)

@jeremyjordan, well, that is not a callback but a model hook, and I actually didn't think about those while doing this PR. But yes, the concept of model hooks is very similar to the concept of callbacks, so maybe they should be unified as well.
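To make the distinction concrete (an illustrative sketch with assumed class names, not the library's exact code): model hooks live on the LightningModule and still receive call-site arguments, while callbacks after this PR take no arguments and read state through their trainer reference.

```python
from pytorch_lightning import LightningModule

class MyModel(LightningModule):
    # model hook: defined on the module and handed the batch directly
    def on_batch_start(self, batch):
        ...

class MyCallback(Callback):  # the Callback base class from this PR
    # callback hook: no extra arguments; read state from the trainer
    def on_epoch_end(self):
        metrics = self._trainer.callback_metrics
```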

jeremyjordan (Contributor)

Ah yes, that's right. I was going through training_loop.py quickly, searching for on_* methods to fix a merge conflict, and didn't stop to think about the difference between the two.

Labels: feature (Is an improvement or enhancement) · help wanted (Open to be worked on) · ready (PRs ready to be merged)

5 participants