Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 966 #1008

Closed
wants to merge 3 commits into from
Closed

Issue 966 #1008

wants to merge 3 commits into from

Conversation

vfdev-5
Copy link
Collaborator

@vfdev-5 vfdev-5 commented May 4, 2020

Addresses #966 but not completely closes the FR

Description:

  • Added internal state for Checkpoint
  • Saving logic is changed

-- previously we

  1. save a checkpoint file
  2. updated internal _saved
  3. removed other files if more than n_saved

-- now we

  1. updated internal _saved
  2. removed other files if more than n_saved
  3. save a checkpoint file

This is due to the fact that we need to insert Checkpoint itself into user's provided to_save and save it in the training checkpoint.
When we setup the checkpoint, Checkpoint's state dict should be up to date as we add new item and remove optionally other items.

This does not work automatically for saving best models.

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@vfdev-5 vfdev-5 closed this May 15, 2020
@vfdev-5 vfdev-5 deleted the issue-966 branch May 15, 2020 21:15
@amatsukawa
Copy link
Contributor

Hi @vfdev-5. I see that you've taken a crack at this but decided not to merge this in.

Related to this topic, can you confirm what guarantees, if any, ignite provides about the order in which handlers run? I believe right now, handlers for an event run in the order in which they were registered, because _fire_events just uses a for-loop.

This is relevant because eg. the EarlyStopping handler needs to have its state checkpointed to resume correctly. That means it needs to run before the Checkpoint handler.

@vfdev-5
Copy link
Collaborator Author

vfdev-5 commented Jun 21, 2020

Hi @amatsukawa , yes, I tried to implement it and stopped as automatic injection of Checkpoint's state can work only with Checkpoints added to a trainer etc. Some parts of this PR were required by other FR. So, maybe we can just add state_dict/load_state_dict methods and no automatic stuff for instance to avoid complex behaviour...

Related to this topic, can you confirm what guarantees, if any, ignite provides about the order in which handlers run? I believe right now, handlers for an event run in the order in which they were registered, because _fire_events just uses a for-loop.

Yes, that's correct. Handlers are triggered in the order they were registered. There is an issue about some internal priority reordering (#706), but it is on stand by...

This is relevant because eg. the EarlyStopping handler needs to have its state checkpointed to resume correctly. That means it needs to run before the Checkpoint handler.

EarlyStopping and Checkpoint for best models are attached to evaluators, so, probably, their states can not be automatically appended to training checkpoint instance. Yes, maybe, there should be a correct order to follow to achieve correct behaviour...

The reason why this PR was closed is related to the fact that auto stuff will not work everywhere and probably can be only done inside higher-api Trainers (which for instance are absent). On the other hand, explaning to user how to manually use this correctly to setup checkpointing of model, optimizer, trainer, checkpointer itself, checkpointer best model, early stopping and various metrics with state... it can become a nightmare :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants