Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

queue: Ability to send different signals to running tasks #8624

Closed
aschuh-hf opened this issue Nov 24, 2022 · 7 comments · Fixed by #8657
Closed

queue: Ability to send different signals to running tasks #8624

aschuh-hf opened this issue Nov 24, 2022 · 7 comments · Fixed by #8657
Labels
A: task-queue Related to task queue. feature request Requesting a new feature p2-medium Medium priority, should be done, but less important

Comments

@aschuh-hf
Copy link

aschuh-hf commented Nov 24, 2022

dvc queue kill can be used to terminate running tasks. One can also use dvc queue logs --follow to attach to the output of a running task, but CTRL+C (SIGINT) will only detach but not forward the interrupt signal to the running task (which is good). There is no option at the moment to gracefully interrupt a running task, e.g., a training stage using DVC checkpoints as one can and is being used in the user guide for DVC Experiments with checkpoints. Options for dvc queue kill which enable the user to send different signals (SIGTERM, SIGINT) to the running task similar to the Unix kill command would be useful.

Related, sending SIGINT via CTRL+C would be possible if there was a command to attach to a running task such that the signal is indeed forwarded to the task rather than handled by, e.g., dvc queue logs --follow. Possibly similar to docker attach, where one can declare --detach-keys to detach without interrupting the task. A key combination different from CTRL+C could also have been used for dvc queue logs --follow to unfollow / detach again without interruption (see also tmux) but CTRL+C would have been forwarded to the task still. This different behavior could be covered by a new dvc queue attach command (which could make dvc queue logs --follow obsolete actually).

@efiop
Copy link
Contributor

efiop commented Nov 24, 2022

@karajan1001 @pmrowla Would something like dvc queue kill --signal SIGINT make sense? WDYT?

@efiop efiop added A: experiments Related to dvc exp A: task-queue Related to task queue. feature request Requesting a new feature and removed A: experiments Related to dvc exp labels Nov 24, 2022
@pmrowla
Copy link
Contributor

pmrowla commented Nov 25, 2022

Yes, sending sigint would work

@karajan1001 karajan1001 self-assigned this Nov 25, 2022
@karajan1001 karajan1001 added this to DVC Nov 25, 2022
@karajan1001 karajan1001 moved this to Backlog in DVC Nov 25, 2022
@karajan1001 karajan1001 moved this from Backlog to Todo in DVC Nov 25, 2022
@karajan1001 karajan1001 removed their assignment Nov 25, 2022
@karajan1001 karajan1001 moved this from Todo to Backlog in DVC Nov 25, 2022
@dberenbaum
Copy link
Collaborator

Why is sigkill the default? Should we default to sigint?

@karajan1001
Copy link
Contributor

karajan1001 commented Nov 30, 2022

Why is sigkill the default? Should we default to sigint?

Sometimes, Ctrl-C does not stop the process while kill -9 always does it. Maybe we should add a --force flag to queue kill and make the default queue kill sending SIGINT and queue kill --force to send SIGKILL

@karajan1001
Copy link
Contributor

@pmrowla , @dberenbaum
A problem is that the training process is running in a sub-process of the DVC command. Sending SIGINT to the DVC command couldn't stop the sub-process of the training (But SIGKILL can). This ticket is more complex than I used to believe. If we want to implement this we need to first record the sub-process id in some way and send the signal to that process.

@pmrowla
Copy link
Contributor

pmrowla commented Dec 2, 2022

Discussed this with @karajan1001, looks like there may be an issue with our SIGINT handling in dvc exp exec-run (which is used internally to run the exps from celery), so that's where we'll need to start for this issue. It should not be necessary for us to keep track of each child pipeline stage process ID.

@dberenbaum
Copy link
Collaborator

How important is this? It seems like a new feature request that may not be urgent.

Issues like #8612 should work even with SIGKILL. Can we hold off on this until we address the reported regressions and bugs?

karajan1001 added a commit to karajan1001/dvc that referenced this issue Dec 5, 2022
fix: iterative#8624
1. Add a new flag `--force` for `queue kill`
2. Make `SIGINT` to be the default option and `SIGKILL` to be with
   `--force`
3. Remove `SIGINT` blocking.
4. Add tests for `queue kill`
karajan1001 added a commit to karajan1001/dvc that referenced this issue Dec 8, 2022
fix: iterative#8624
1. Add a new flag `--force` for `queue kill`
2. Make `SIGINT` to be the default option and `SIGKILL` to be with
   `--force`
3. Remove `SIGINT` blocking.
4. Add tests for `queue kill`
5. bump into dvc-task 0.1.7
karajan1001 added a commit to karajan1001/dvc that referenced this issue Dec 8, 2022
fix: iterative#8624
1. Add a new flag `--force` for `queue kill`
2. Make `SIGINT` to be the default option and `SIGKILL` to be with
   `--force`
3. Add tests for `queue kill`
4. bump into dvc-task 0.1.7
karajan1001 added a commit to karajan1001/dvc that referenced this issue Dec 10, 2022
fix: iterative#8624
1. Add a new flag `--force` for `queue kill`
2. Make `SIGINT` to be the default option and `SIGKILL` to be with
   `--force`
3. Add tests for `queue kill`
4. bump into dvc-task 0.1.8
@dberenbaum dberenbaum moved this from Backlog to Review In Progress in DVC Dec 13, 2022
karajan1001 added a commit to karajan1001/dvc that referenced this issue Dec 15, 2022
fix: iterative#8624
1. Add a new flag `--force` for `queue kill`
2. Make `SIGINT` to be the default option and `SIGKILL` to be with
   `--force`
3. Add tests for `queue kill`
@dberenbaum dberenbaum added the p2-medium Medium priority, should be done, but less important label Dec 20, 2022
karajan1001 added a commit to karajan1001/dvc that referenced this issue Dec 30, 2022
fix: iterative#8624
1. Add a new flag `--force` for `queue kill`
2. Make `SIGINT` to be the default option and `SIGKILL` to be with
   `--force`
3. Add tests for `queue kill`
4. Bump dvc-task into 0.1.9
karajan1001 added a commit that referenced this issue Dec 30, 2022
fix: #8624
1. Add a new flag `--force` for `queue kill`
2. Make `SIGINT` to be the default option and `SIGKILL` to be with
   `--force`
3. Add tests for `queue kill`
4. Bump dvc-task into 0.1.9
@github-project-automation github-project-automation bot moved this from Review In Progress to Done in DVC Dec 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: task-queue Related to task queue. feature request Requesting a new feature p2-medium Medium priority, should be done, but less important
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants