don't consider task with unknown state dependencies as pending #13
Conversation
setup.py (Outdated)

```diff
@@ -54,7 +54,7 @@ def get_static_files(path):
 setup(
     name='luigi',
-    version='2.7.5.affirm.1.2.0',
+    version='2.7.5.affirm.alextestscheduler',
```
is this the version you want?
I approved but left a comment that I think is worth looking at!
```python
has_failed_dependency = False
for dep in task.deps:
    dep_task = self._state.get_task(dep, default=None)
    if dep_task.status == UNKNOWN:
```
What's an example where this happens?
How do you plan to recover from this state?
@srikiraju there is no recovery from this. This change causes the worker to see that there are no pending tasks left and actually exit; otherwise it keeps waiting forever for a task that is never retried, because only tasks in the FAILED state are retried :-/
```python
        has_failed_dependency = True
        break

if not has_failed_dependency:
```
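For context, the check above can be sketched end to end. This is a minimal standalone illustration of the idea in the diff, not luigi's actual classes: the `Task`/`State` stubs, status constants, and `is_pending` helper are all invented stand-ins.

```python
# Sketch: a task whose dependency is in UNKNOWN state is excluded from the
# pending set, so the worker can exit instead of waiting forever
# (UNKNOWN tasks are never retried; only FAILED tasks are).
UNKNOWN = 'UNKNOWN'
PENDING = 'PENDING'

class Task:
    def __init__(self, task_id, status, deps=()):
        self.id = task_id
        self.status = status
        self.deps = list(deps)

class State:
    def __init__(self, tasks):
        self._tasks = {t.id: t for t in tasks}

    def get_task(self, task_id, default=None):
        return self._tasks.get(task_id, default)

def is_pending(task, state):
    """A task counts as pending only if none of its deps are UNKNOWN."""
    if task.status != PENDING:
        return False
    for dep in task.deps:
        dep_task = state.get_task(dep, default=None)
        if dep_task is not None and dep_task.status == UNKNOWN:
            return False  # has_failed_dependency
    return True

state = State([
    Task('a', UNKNOWN),
    Task('b', PENDING, deps=['a']),  # blocked: dep 'a' is UNKNOWN
    Task('c', PENDING),              # genuinely pending
])
pending = [t.id for t in (state.get_task('b'), state.get_task('c'))
           if is_pending(t, state)]
print(pending)  # ['c']
```

With the filter in place, task `b` drops out of the pending set even though its own status is PENDING, which is what lets the worker's "any pending tasks left?" check return false.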
Instead of not updating any of the tasks, perhaps we could take all these values and do something about the failed dependency?
What do you have in mind when you say "do something"? As in report it, or retry it? If you mean retry: by the time it gets to this point, the dependency failed in the `requires` section, which afaik is only run during `add`, so it can't be retried.
On the worker side, does the exit still generate a proper exit code (error out) with this change?
Would be good to get some unit testing around this, considering how fragile luigi code can be.
```python
try:
    from ConfigParser import ConfigParser, NoOptionError, NoSectionError
    Interpolation = object
```
what is this for??
Are these necessary changes? They seem unrelated and unused.
This fixed an error I was getting when I tried to test it on an ODIN. I don't know how this wasn't a problem in production, since `configparser` is only available in Python 3. Since `Interpolation` isn't in `ConfigParser`, it gets set to `object`. I pulled this code snippet from the upstream Luigi repo.
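The snippet being discussed is the usual Python 2/3 import-fallback pattern. A sketch of the full shape it likely takes (the `except` branch is an assumption about what follows the `try` shown in the diff, based on the standard pattern rather than the actual file):

```python
try:
    # Python 2: the module is named ConfigParser and has no Interpolation
    # class, so bind the name to a harmless placeholder.
    from ConfigParser import ConfigParser, NoOptionError, NoSectionError
    Interpolation = object
except ImportError:
    # Python 3: the module was renamed to configparser and does provide
    # Interpolation, so import it directly.
    from configparser import (ConfigParser, NoOptionError, NoSectionError,
                              Interpolation)

# Quick check that the imported parser works (read_string is Python 3 API).
parser = ConfigParser()
parser.read_string(u'[core]\nworkers = 4\n')
print(parser.get('core', 'workers'))  # 4
```

Under Python 3 the first import raises `ImportError` and the `except` branch runs, which is why the `Interpolation = object` line is dead code there but harmless.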
> @srikiraju there is no recovery from this. This causes the worker to see that there are no pending tasks left and actually exit, otherwise it keeps waiting forever for a task that is NOT retried because only tasks in FAILED state are retried :-/

Yes it still does, and the error code makes sense given the context. I reproduced it on an ODIN to test this, but I can add some unit tests for this too.
test/unknown_state_handling_test.py (Outdated)
```python
def requires(self):
    print('failing')
    raise Exception
    return [DummyRequires()]
```
This return never gets hit; let's get rid of it.
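A minimal version of the suggested fix, with the unreachable return dropped (the class name here is illustrative, not the actual test code):

```python
class FailingRequires(object):
    def requires(self):
        print('failing')
        # requires() always raises before returning, so any trailing
        # `return [DummyRequires()]` is dead code and can be deleted.
        raise Exception('failure in requires')
```

Since the exception fires unconditionally, the test still exercises the "dependency failed during `requires`" path it was written for.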
test/unknown_state_handling_test.py (Outdated)
```python
def setUp(self):
    super(UnknownStateTest, self).setUp()
    self.scheduler = luigi.scheduler.Scheduler(prune_on_get_work=False)
    self.worker = luigi.worker.Worker(scheduler=self.scheduler)
```
The worker needs to be run with `keep_alive=True` for this test to actually exercise that the worker exits.
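To see why `keep_alive` matters here: a keep-alive worker keeps polling the scheduler for work and only shuts down once the scheduler reports no pending tasks remain. A toy simulation of that loop (not luigi's API; every name here is invented for illustration):

```python
def run_keep_alive_worker(pending_counts):
    """Simulate a keep-alive worker polling the scheduler.

    pending_counts is the sequence of n_pending_tasks values the scheduler
    would report on successive polls. The worker exits only when that count
    reaches zero -- which is exactly what the UNKNOWN-dependency fix
    enables, since tasks blocked on an UNKNOWN dep no longer count as
    pending. Returns the number of polls before exit, or None if the
    worker would still be waiting.
    """
    polls = 0
    for n_pending in pending_counts:
        polls += 1
        if n_pending == 0:
            return polls  # worker exits cleanly
    return None  # worker would keep waiting (hung)

# After the fix: the blocked task drops out of the pending set, the
# count reaches zero, and the worker exits.
print(run_keep_alive_worker([2, 1, 0]))  # 3
# Before the fix: the UNKNOWN-blocked task pins the count at 1 forever.
print(run_keep_alive_worker([1, 1, 1]))  # None
```

Without `keep_alive`, the worker exits as soon as its own run list is empty, so the test would pass trivially and never hit the "waits forever" behavior the fix targets.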
Please test on stage that we can run a task that fails in its requirements, then rerun it with the failure fixed, and confirm the task reruns OK.
For rollout: we should do it before UTC0, and check with Anumat that the current furnishing run is done before doing this.
This fixes the unknown-state bug and pulls in anything that the master 2.7.6 version has that our branch does not.