Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of KeyboardInterrupt #9286

Closed
mibaumgartner opened this issue Sep 2, 2021 · 15 comments
Closed

Handling of KeyboardInterrupt #9286

mibaumgartner opened this issue Sep 2, 2021 · 15 comments
Labels
bug Something isn't working help wanted Open to be worked on

Comments

@mibaumgartner
Copy link
Contributor

🐛 Bug

Same behavior as in #6807

May be caused due to the wrong indentation of the raise statement?
https://github.com/PyTorchLightning/pytorch-lightning/blob/75350938ca646efc0b4bac432ba2d5d4676662bb/pytorch_lightning/trainer/trainer.py#L524

To Reproduce

Run training and raise a Keyboard Interrupt.

Expected behavior

Training should end and KeyboardInterrupt should stop the whole program.

Environment

  • PyTorch Lightning Version (e.g., 1.3.0):
  • PyTorch Version (e.g., 1.8)
  • Python version:
  • OS (e.g., Linux):
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • How you installed PyTorch (conda, pip, source):
  • If compiling from source, the output of torch.__config__.show():
  • Any other relevant information:

Additional context

@mibaumgartner mibaumgartner added bug Something isn't working help wanted Open to be worked on labels Sep 2, 2021
@tchaton
Copy link
Contributor

tchaton commented Sep 2, 2021

@daniellepintz

@daniellepintz
Copy link
Contributor

Thanks for reporting @mibaumgartner I am looking into it!

@daniellepintz
Copy link
Contributor

daniellepintz commented Sep 2, 2021

May be caused due to the wrong indentation of the raise statement?

What do you mean exactly? I don't see any wrong indentation of the raise statement

@mibaumgartner
Copy link
Contributor Author

Hi @daniellepintz, thanks for looking into this.
I thought about moving the raise statement one indentation to the left but it might be better (more explicit) to add a second raise statement in the keyboard interrupt block.

@daniellepintz
Copy link
Contributor

@mibaumgartner Done in #9260

@ananthsub
Copy link
Contributor

@tchaton @daniellepintz @awaelchli was it intentional that keyboard interrupt didn't re-raise the exception after doing the graceful teardown from #856 ? is that desirable?

@awaelchli
Copy link
Contributor

yes certainly, ctrl+c -> graceful shutdown -> exit training without leaving behind a stack trace. that's simply for UX

@ananthsub
Copy link
Contributor

yes certainly, ctrl+c -> graceful shutdown -> exit training without leaving behind a stack trace. that's simply for UX

Was there concern about ghost background processes in this case? Should this apply to all training types, or only for single device ones?

@awaelchli
Copy link
Contributor

awaelchli commented Sep 3, 2021

@daniellepintz interrupt / exception handling changes are not included in any release are they?
@mibaumgartner we are kindly requesting at least the PL version you were running this with please. KeyboardInterrupts work fine on master branch given my quick testing. more details would be appreciated to address your specific issue.

@daniellepintz
Copy link
Contributor

yes that is correct, all of my recent changes are still unreleased

@mibaumgartner
Copy link
Contributor Author

mibaumgartner commented Sep 3, 2021

Hi @awaelchli,

It occurred with several (recent) versions of lightning, including the latest 1.4.5 release.

The Issue can be reproduced by raising a Keyboard Interrupt in the Lightning Module (could be a good test :) ):
https://colab.research.google.com/drive/1aSoLeHHJCyKCfxpm-YsdJ3Gq6YLNR2Ci?usp=sharing

Changes in notebook:

    def training_step(self, batch, batch_idx):
        output = self.layer(batch)
        loss = self.loss(batch, output)

        if batch_idx > 200: # see training starting, not actually needed
          raise KeyboardInterrupt

        return {"loss": loss}
    # Train the model ⚡
    trainer.fit(model, train, val)

    print("don't execute this since Keyboard Interrupt was raised in training")

    trainer.test(test_dataloaders=test)

(In the example notebook trainer.test also results in an additional error since the checkpoint from the training is missing)

@daniellepintz
Copy link
Contributor

@Borda sorry for the confusion but I don't think #9260 quite resolves this (my bad for saying it did in the PR description)

even though we just deprecated on_keyboard_interrupt what @mibaumgartner is referring to will still remain.

current behavior is when there is a KeyboardInterrupt it gracefully exits the current trainer function , not the whole program. but when there is an exception, it raises the exception which exits the whole program. @awaelchli is this intended behavior?

@awaelchli
Copy link
Contributor

awaelchli commented Sep 6, 2021

Yes it will terminate only the current fit function. This graceful shutdown mechanism was introduced before Lightning v1.0, so it is standard behavior for a long time now.

To stop program execution, we can raise a SystemExit

@daniellepintz
Copy link
Contributor

sounds good thanks @awaelchli . @mibaumgartner does this behavior work for you?

@mibaumgartner
Copy link
Contributor Author

Yes, sounds good 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Open to be worked on
Projects
None yet
Development

No branches or pull requests

5 participants