exp: implement internal queue.kill support #7587

Closed
Tracked by #7592
pmrowla opened this issue Apr 19, 2022 · 2 comments
Labels
A: experiments Related to dvc exp

Comments

pmrowla commented Apr 19, 2022

The dvc-task-based exp queue currently supports `experiments.queue.remove` for removing inactive experiments. An `experiments.queue.kill` API call for cancelling/terminating individual active experiments is still needed.

Prerequisite for #7591.

Job termination requires:

  • Terminate exp run subprocess via dvc-task proc.kill
  • Celery tasks related to the running experiment should not be revoked; they should run to completion
    • Once the exp run child process is terminated, it should be treated as any other failed run (and the relevant cleanup/partial collection celery tasks should still be scheduled and run)
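The termination semantics above can be illustrated with a minimal Python sketch. It assumes a plain `subprocess.Popen` child stands in for the `exp run` process and does not use dvc-task or Celery at all; `run_experiment`, `kill_experiment`, and `collect_result` are hypothetical helpers, not DVC APIs. The point is only the ordering: kill the child process, but let the collection step still run and report the killed run as failed.

```python
import subprocess
import sys


def run_experiment(cmd):
    """Hypothetical: start a child process standing in for `exp run`."""
    return subprocess.Popen(cmd)


def kill_experiment(proc):
    """Hypothetical: terminate only the child process.

    Nothing here cancels the follow-up collection step; it is still
    expected to run once the child exits.
    """
    if proc.poll() is None:  # child is still running
        proc.kill()


def collect_result(proc):
    """Hypothetical cleanup/collection step that runs after every child exit.

    A killed child exits with a non-zero return code (negative on POSIX),
    so it is reported like any other failed run.
    """
    returncode = proc.wait()
    status = "success" if returncode == 0 else "failed"
    return {"returncode": returncode, "status": status}


# Kill a long-running "experiment", then check that the next one still succeeds.
long_run = run_experiment([sys.executable, "-c", "import time; time.sleep(60)"])
kill_experiment(long_run)
killed = collect_result(long_run)

next_run = run_experiment([sys.executable, "-c", "pass"])
finished = collect_result(next_run)

print(killed["status"], finished["status"])  # failed success
```

The second run mirrors the test described in the commits below: a job that runs after a killed job should still succeed.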
pmrowla added the A: experiments label Apr 19, 2022
pmrowla mentioned this issue Apr 19, 2022
karajan1001 added a commit to karajan1001/dvc that referenced this issue May 9, 2022
fix: iterative#7587
1. Implement the kill method for the local queue class
2. Add a test to make sure the next job succeeds after the
   original job is killed.
karajan1001 added this to DVC May 10, 2022
karajan1001 moved this to Backlog in DVC May 10, 2022
karajan1001 moved this from Backlog to Review In Progress in DVC May 10, 2022
karajan1001 added a commit to karajan1001/dvc that referenced this issue May 13, 2022
fix: iterative#7587
1. Implement the kill method for the local queue class
2. Add a test to make sure the next job succeeds after the
   original job is killed.
3. Some refactoring of `exp remove`
pmrowla pushed a commit that referenced this issue May 14, 2022
karajan1001 commented

There is still some related work remaining:

  1. Unit test coverage for the kill method.
  2. Implement the kill flag in shutdown.

pmrowla pushed a commit that referenced this issue Jun 2, 2022
karajan1001 commented

Closing due to #7714 and #7799.

Repository owner moved this from Review In Progress to Done in DVC Jun 7, 2022
dberenbaum pushed a commit that referenced this issue Jun 13, 2022
pmrowla pushed a commit that referenced this issue Jun 14, 2022
pmrowla pushed a commit that referenced this issue Jul 5, 2022
pmrowla pushed a commit that referenced this issue Jul 6, 2022
pmrowla pushed a commit that referenced this issue Jul 11, 2022
pmrowla pushed a commit that referenced this issue Jul 12, 2022
Projects
No open projects
Archived in project
Development

No branches or pull requests

2 participants