-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move some lifecycle management from doTask -> shutdown for the mm-less task runner #14895
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Left a minor comment.
@georgew5656 , what would be the side effect of this if the TaskQueue
is hypothetically slow in cleaning up finished tasks?
@@ -188,7 +186,7 @@ protected synchronized TaskStatus join(long timeout) throws IllegalStateExceptio | |||
*/ | |||
protected void shutdown() | |||
{ | |||
if (State.PENDING.equals(state.get()) || State.RUNNING.equals(state.get())) { | |||
if (State.PENDING.equals(state.get()) || State.RUNNING.equals(state.get()) || State.STOPPED.equals(state.get())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to handle STOPPED state here? Wouldn't the job have already finished?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is saying thats its okay to shutdown if its stopped
The two things that will hang around for a while |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice cleanup, left one minor comment
@@ -271,6 +262,10 @@ public void shutdown(String taskid, String reason) | |||
return; | |||
} | |||
|
|||
synchronized (tasks) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the code can be simplified:
KubernetesWorkItem workItem;
synchronized (tasks) {
workItem = tasks.remove(taskid);
}
if (workItem == null) {
log.info("Ignoring request to cancel unknown task [%s]", taskid);
return;
}
Description
The mm-less task runner differs in behavior from the other runners that run on the overlord because it tries to handle the cleanup lifecycle on its own by immediately deleting k8s jobs and clearing it's tasks map as soon as it has finished running a task.
The HttpRemoteTaskRunner and RemoteTaskRunner don't do this. Instead, they rely on the TaskQueue's handlers for the futures that all the task runners return to call shutdown on the task once it has completed.
Updating the mm-less task runner to use logic more similar to the other task runners has a couple benefits.
Changes
Release note
Update mm-less task runner lifecycle logic to better match the logic in the HTTP and Zookeeper worker task runners.
Key changed/added classes in this PR
KubernetesPeonLifecycle
KubernetesTaskRunner
KubernetesWorkItem
This PR has: