Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pod_creation_wait_interal #2813

Merged
merged 3 commits into from
Nov 20, 2019
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 13 additions & 1 deletion luigi/contrib/kubernetes.py
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,7 @@ class kubernetes(luigi.Config):

class KubernetesJobTask(luigi.Task):
__DEFAULT_POLL_INTERVAL = 5 # see __track_job
__DEFAULT_POD_CREATION_INTERVAL = 5
_kubernetes_config = None # Needs to be loaded at runtime

def _init_kubernetes(self):
Expand Down Expand Up @@ -213,10 +214,15 @@ def poll_interval(self):
"""How often to poll Kubernetes for job status, in seconds."""
return self.__DEFAULT_POLL_INTERVAL

@property
def pod_creation_wait_interal(self):
"""Delay for initial pod creation for just submitted job in seconds"""
return self.__DEFAULT_POD_CREATION_INTERVAL

def __track_job(self):
"""Poll job status while active"""
while not self.__verify_job_has_started():
time.sleep(self.__DEFAULT_POLL_INTERVAL)
time.sleep(self.poll_interval)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link

@pisymbol pisymbol Nov 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you have a formatting bug on L287 of this patch. You want to use '%s' instead of '{}' and '%' instead of comma to be stylistically consistent with the rest of the file.

I too am running into this one while running one of our more established pipelines btw.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, i fixed it

Copy link

@pisymbol pisymbol Nov 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind. Timestamp issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might have steered you wrong... Or maybe i misunderstood the code the first time.

I'm looking at this more closely and have noticed that now each iteration of this while not self.__verify_job_has_started() we'll wait both pod_creation_wait_interval seconds AND poll_interval seconds.

This doesn't seem desirable. It also isn't clear from the user perspective - for 2 intervals to both impact the same wait time.

Am i missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlstadther
Here's how I see this:
time.sleep on this line handling the case when we applied job to kubernetes API, but cluster state has not yet changed to reflect those changes and so no pods yet exist. This call will return empty list and the whole luigi Task will fail whereas after couple of seconds cluster state will catch up with our changes and actual KubernetesJob will run unsupervised by luigi.
And so time.sleep on this line is for case when pod is already created but containers within it are not ready. So in theory we won't wait on both intervals, when pods are ready code under this if clause won't execute.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand the explanation. Most likely this new added sleep would only happen once. Either afterwards we see pods and will always see while waiting for container running state, or we won't and assertion will be made.

self.__logger.debug("Waiting for Kubernetes job " + self.uu_name + " to start")
self.__print_kubectl_hints()

Expand Down Expand Up @@ -276,6 +282,12 @@ def __verify_job_has_started(self):

# Verify that the pod started
pods = self.__get_pods()
if not pods:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow the logic here. How would the sleep impact the assertion afterwards?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with @honnix . Seems like you'd want to re-eval pods=self.__get_pods() before asserting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks

self.__logger.debug(
'No pods found for %s, waiting for cluster state to match the job definition' % self.uu_name
)
time.sleep(self.pod_creation_wait_interal)
pods = self.__get_pods()

assert len(pods) > 0, "No pod scheduled by " + self.uu_name
for pod in pods:
Expand Down