don't consider task with unknown state dependencies as pending #13
Conversation
setup.py (Outdated)

```diff
@@ -54,7 +54,7 @@ def get_static_files(path):
 setup(
     name='luigi',
-    version='2.7.5.affirm.1.2.0',
+    version='2.7.5.affirm.alextestscheduler',
```
is this the version you want?
I approved but left a comment that I think is worth looking at!
```python
has_failed_dependency = False
for dep in task.deps:
    dep_task = self._state.get_task(dep, default=None)
    if dep_task.status == UNKNOWN:
```
What's an example where this happens?
How do you plan to recover from this state?
@srikiraju there is no recovery from this. This change causes the worker to see that there are no pending tasks left and actually exit; otherwise it keeps waiting forever for a task that is never retried, because only tasks in the FAILED state are retried :-/
```python
        has_failed_dependency = True
        break

if not has_failed_dependency:
```
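For context, the check above can be sketched end to end. This is a minimal standalone illustration of the idea in the diff, not luigi's actual classes: the `Task`/`State` stubs, status constants, and `is_pending` helper are all invented stand-ins.

```python
# Sketch: a task whose dependency is in UNKNOWN state is excluded from the
# pending set, so the worker can exit instead of waiting forever
# (UNKNOWN tasks are never retried; only FAILED tasks are).
UNKNOWN = 'UNKNOWN'
PENDING = 'PENDING'

class Task:
    def __init__(self, task_id, status, deps=()):
        self.id = task_id
        self.status = status
        self.deps = list(deps)

class State:
    def __init__(self, tasks):
        self._tasks = {t.id: t for t in tasks}

    def get_task(self, task_id, default=None):
        return self._tasks.get(task_id, default)

def is_pending(task, state):
    """A task counts as pending only if none of its deps are UNKNOWN."""
    if task.status != PENDING:
        return False
    for dep in task.deps:
        dep_task = state.get_task(dep, default=None)
        if dep_task is not None and dep_task.status == UNKNOWN:
            return False  # has_failed_dependency
    return True

state = State([
    Task('a', UNKNOWN),
    Task('b', PENDING, deps=['a']),  # blocked: dep 'a' is UNKNOWN
    Task('c', PENDING),              # genuinely pending
])
pending = [t.id for t in (state.get_task('b'), state.get_task('c'))
           if is_pending(t, state)]
print(pending)  # ['c']
```

With the filter in place, task `b` drops out of the pending set even though its own status is PENDING, which is what lets the worker's "any pending tasks left?" check return false.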
Instead of not updating any of the tasks, perhaps we could take all these values and do something about the failed dependency?
What do you have in mind when you say "do something"? As in report it, or retry it? If you mean retry: by the time it gets to this point, the dependency failed in the `requires` section, which afaik is only run during `add`, so it can't be retried.
On the worker side, does the exit still generate a proper exit code (error out) with this change?
Would be good to get some unit testing around this, considering how fragile luigi code can be.
```python
try:
    from ConfigParser import ConfigParser, NoOptionError, NoSectionError
    Interpolation = object
```
what is this for??
Are these necessary changes? They seem unrelated and unused.
This fixed an error I was getting when I tried to test it on an ODIN. I don't know how this wasn't a problem in production, since `configparser` is only available in Python 3. Since `Interpolation` isn't in `ConfigParser`, it gets set to `object`. I pulled this code snippet from the upstream Luigi repo.
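The snippet being discussed is the usual Python 2/3 import-fallback pattern. A sketch of the full shape it likely takes (the `except` branch is an assumption about what follows the `try` shown in the diff, based on the standard pattern rather than the actual file):

```python
try:
    # Python 2: the module is named ConfigParser and has no Interpolation
    # class, so bind the name to a harmless placeholder.
    from ConfigParser import ConfigParser, NoOptionError, NoSectionError
    Interpolation = object
except ImportError:
    # Python 3: the module was renamed to configparser and does provide
    # Interpolation, so import it directly.
    from configparser import (ConfigParser, NoOptionError, NoSectionError,
                              Interpolation)

# Quick check that the imported parser works (read_string is Python 3 API).
parser = ConfigParser()
parser.read_string(u'[core]\nworkers = 4\n')
print(parser.get('core', 'workers'))  # 4
```

Under Python 3 the first import raises `ImportError` and the `except` branch runs, which is why the `Interpolation = object` line is dead code there but harmless.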
> @srikiraju there is no recovery from this. This causes the worker to see that there are no pending tasks left and actually exit, otherwise it keeps waiting forever for a task that is NOT retried because only tasks in FAILED state are retried :-/

Yes it still does, and the error code makes sense given the context. I reproduced it on an ODIN to test this, but I can add some unit tests for this too.
test/unknown_state_handling_test.py (Outdated)
```python
def requires(self):
    print('failing')
    raise Exception
    return [DummyRequires()]
```
This return never gets hit; let's get rid of it.
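A minimal version of the suggested fix, with the unreachable return dropped (the class name here is illustrative, not the actual test code):

```python
class FailingRequires(object):
    def requires(self):
        print('failing')
        # requires() always raises before returning, so any trailing
        # `return [DummyRequires()]` is dead code and can be deleted.
        raise Exception('failure in requires')
```

Since the exception fires unconditionally, the test still exercises the "dependency failed during `requires`" path it was written for.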
test/unknown_state_handling_test.py (Outdated)
```python
def setUp(self):
    super(UnknownStateTest, self).setUp()
    self.scheduler = luigi.scheduler.Scheduler(prune_on_get_work=False)
    self.worker = luigi.worker.Worker(scheduler=self.scheduler)
```
The worker needs to be run with `keep_alive=True` for this test to actually exercise that the worker exits.
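To see why `keep_alive` matters here: a keep-alive worker keeps polling the scheduler for work and only shuts down once the scheduler reports no pending tasks remain. A toy simulation of that loop (not luigi's API; every name here is invented for illustration):

```python
def run_keep_alive_worker(pending_counts):
    """Simulate a keep-alive worker polling the scheduler.

    pending_counts is the sequence of n_pending_tasks values the scheduler
    would report on successive polls. The worker exits only when that count
    reaches zero -- which is exactly what the UNKNOWN-dependency fix
    enables, since tasks blocked on an UNKNOWN dep no longer count as
    pending. Returns the number of polls before exit, or None if the
    worker would still be waiting.
    """
    polls = 0
    for n_pending in pending_counts:
        polls += 1
        if n_pending == 0:
            return polls  # worker exits cleanly
    return None  # worker would keep waiting (hung)

# After the fix: the blocked task drops out of the pending set, the
# count reaches zero, and the worker exits.
print(run_keep_alive_worker([2, 1, 0]))  # 3
# Before the fix: the UNKNOWN-blocked task pins the count at 1 forever.
print(run_keep_alive_worker([1, 1, 1]))  # None
```

Without `keep_alive`, the worker exits as soon as its own run list is empty, so the test would pass trivially and never hit the "waits forever" behavior the fix targets.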
Please test on stage that we can run a task that fails in its requirements, then rerun it with the failure fixed, and confirm the task reruns OK.
For rollout: we should do it before UTC0, and check with Anumat that the current furnishing run is done before doing this.
This fixes the unknown-state bug and pulls in anything that the master 2.7.6 version has that our branch does not.