Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't consider task with unknown state dependencies as pending #13

Merged
merged 7 commits into from
Apr 17, 2020

Conversation

alexyu0
Copy link

@alexyu0 alexyu0 commented Apr 11, 2020

this fixes the unknown state bug and pulls in anything that the master 2.7.6. version has that our branch does not

@alexyu0 alexyu0 changed the title Alexyu failed deps not pending don't consider task with unknown state dependencies as pending Apr 11, 2020
setup.py Outdated
@@ -54,7 +54,7 @@ def get_static_files(path):

setup(
name='luigi',
version='2.7.5.affirm.1.2.0',
version='2.7.5.affirm.alextestscheduler',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the version you want?

Copy link

@ejsundstr ejsundstr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approved but left a comment that I think is worth looking at!

has_failed_dependency = False
for dep in task.deps:
dep_task = self._state.get_task(dep, default=None)
if dep_task.status == UNKNOWN:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats an example where this happens?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you plan to recover from this state?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srikiraju there is no recovery from this. This causes the worker to see that there are no pending tasks left and actually exit, otherwise it keeps waiting forever for a task that is NOT retried because only tasks in FAILED state are retried :-/

has_failed_dependency = True
break

if not has_failed_dependency:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of not updating any of the tasks, perhaps we take all these values and do something about the 'failed dependency'?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you have in mind when you say "do something"? as in report it or retry it? if you mean retry, by the time it gets to this point, the dependency failed in the requires section, which afaik is only run during add so it can't be retried.

Copy link

@gregsterin gregsterin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the worker side, does the exit still generate a proper exit code (error out) with this change?

Would be good to get some unit testing around this, considering how fragile luigi code can be.

try:
from ConfigParser import ConfigParser, NoOptionError, NoSectionError
Interpolation = object

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this for??

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these necessary changes?? they seem unrelated and unused.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fixed an error i was getting when i tried to test it on an ODIN. i don't know how this wasn't a problem in production since configparser is only available in python3. since Interpolation isn't in ConfigParser, it gets set to object. I pulled this code snippet from the upstream Luigi repo

has_failed_dependency = False
for dep in task.deps:
dep_task = self._state.get_task(dep, default=None)
if dep_task.status == UNKNOWN:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srikiraju there is no recovery from this. This causes the worker to see that there are no pending tasks left and actually exit, otherwise it keeps waiting forever for a task that is NOT retried because only tasks in FAILED state are retried :-/

@alexyu0
Copy link
Author

alexyu0 commented Apr 15, 2020

On the worker side, does the exit still generate a proper exit code (error out) with this change?

Would be good to get some unit testing around this, considering how fragile luigi code can be.

Yes it still does, and the error code makes sense given the context. I reproduced it on an ODIN to test this but I can add some unit tests too for this.

def requires(self):
print('failing')
raise Exception
return [DummyRequires()]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this return never gets hit, lets get rid of it.

def setUp(self):
super(UnknownStateTest, self).setUp()
self.scheduler = luigi.scheduler.Scheduler(prune_on_get_work=False)
self.worker = luigi.worker.Worker(scheduler=self.scheduler)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worker needs to be run with keep_alive=True for this test to actually exercise that the worker exits.

Copy link

@gregsterin gregsterin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test on stage that we can run a task that fails in requirements, and then rerun it with the failure fixed, and the task can be rerun OK.

For rollout - we should do it before UTC0, and check with Anumat that the current furnishing run is done before doing this.

@alexyu0 alexyu0 merged commit fe45dc0 into affirm-master Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants