Task Level Configuration and `upstream-status-when-all` scheduler config #1782

javrasya · 2016-07-22T20:48:39Z

Description

Allows some scheduling configuration to be defined at task level, as requested in #1073. Also adds a new configuration option named upstream-status-when-all in the [scheduler] section.

Motivation and Context

Task Level Configuration

Different tasks may need different scheduling configuration, such as retry-count (disable-num-failures). Suppose one task retrieves data from Hive, which responds slowly, and another retrieves data from an RDBMS, which responds quickly. I may want the Hive task not to retry at all, or to have a small retry-count, so that it does not block my whole system again and again. Retrying the RDBMS task many times, on the other hand, may not be a problem for me: if I have a network issue for a while, I should be able to give it a higher retry-count than the Hive task. So I need to define this configuration at task level.

This fixes the issue defined in #1073

`upstream-status-when-all` config

Normally the luigi scheduler sets a wrapper task's status according to the severity statuses of its upstream tasks. For example, if any of its upstream tasks is DISABLED or FAILED, the scheduler sets the task status to FAILED. If this configuration is set to True, the scheduler won't mark the task as UPSTREAM_FAILED or UPSTREAM_DISABLED unless all of its upstream tasks, excluding the SUCCESS ones, are FAILED or DISABLED.

This was not a problem before the task-level configuration feature existed. With this feature, however, some upstream tasks may still be retrying even though another upstream task is already DISABLED or FAILED.
Please check the example added in this PR.

Have you tested this? If so, how?

There are unit tests for these features. I also tested by debugging with PyCharm, by running the available tests with tox (some of them cannot run on my local machine; I don't know why, maybe those tests involve HDFS or other external systems), and finally with Travis (under my account) on my forked repo. I ran into a performance problem: adding tasks through the scheduler was very slow, and I only noticed it because of a test, thanks to test_get_work_speed in central_planner_test.py.

What is added extra?

Documentation
Unit-test
An example

…supports defining some configuration at task level. The options that can be specified at task level are `disable-num-failures`, `disable-hard-timeout`, `disable-window-seconds` and `upstream-status-when-all`. A scheduler config named `upstream-status-when-all` is also added, which defines how a task's status is derived from its upstream task statuses. Tests and documentation are also added.

… of adding task is so slow.

Tarrasch · 2016-07-25T03:02:11Z

examples/task_spesific_configuration.py

+# -*- coding: utf-8 -*-
+
+#
+# To make this run, you probably want to edit /etc/luigi/client.cfg and add something like:


Something like what? :)

Tarrasch · 2016-07-25T03:29:49Z

Summary of my comments:

Excellent documentation
Although it's a pretty short code change, I think it could be even shorter and easier to read, maintain, etc.
I think there's one bug related to the run()-time dependencies feature (new_deps_config)

Looks really promising! :)

Tarrasch · 2016-07-25T03:31:13Z

test/scheduler_test.py

@@ -112,5 +112,129 @@ def __init__(self):

        worker.prune(TmpCfg())

+    @with_config({'scheduler': {'disable-num-failures': '44'}})
+    def test_scheduler_with_task_level_config(self):
+        cps = luigi.scheduler.CentralPlannerScheduler()


Sorry, as I merged #1781 you have to update this line :)

javrasya · 2016-07-25T11:49:42Z

@Tarrasch I don't want to bother you by pushing more commits to this PR without being sure. Can you review the changes I made in response to your comments? I made some fairly major changes.

I switched from using a dict to @property.
Named tuples are used to make it more readable and maintainable.
config -> retry-policy as the term (task_configs -> retry_policy, deps_task_configs -> deps_retry_policies).
new_deps_configs is removed.
Instructions on how to run the added example from the command line are added.

Please keep commenting there; I will correct things and merge them into my master branch to update the PR afterwards.
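
As a rough illustration of the named-tuple approach described in this comment, here is a minimal sketch of how a per-task retry policy could fall back to scheduler-wide defaults. It is not the code from this PR: `resolve_retry_policy`, `SCHEDULER_DEFAULTS` and the default values are hypothetical, and only the field names follow the options discussed in this thread.

```python
from collections import namedtuple

# Hypothetical stand-in for the RetryPolicy named tuple mentioned in this thread.
RetryPolicy = namedtuple(
    "RetryPolicy",
    ["retry_count", "disable_hard_timeout", "disable_window_seconds"],
)

# Assumed scheduler-wide defaults; the values are illustrative only.
SCHEDULER_DEFAULTS = RetryPolicy(
    retry_count=5,
    disable_hard_timeout=3600,
    disable_window_seconds=600,
)


def resolve_retry_policy(task_overrides, defaults=SCHEDULER_DEFAULTS):
    """Return a policy where each field a task left unset uses the default."""
    # Treat None as "not set at task level".
    overrides = {k: v for k, v in task_overrides.items() if v is not None}
    unknown = set(overrides) - set(RetryPolicy._fields)
    if unknown:
        raise ValueError("unknown retry settings: %s" % sorted(unknown))
    return defaults._replace(**overrides)


# A slow Hive task retries once; a fast RDBMS task may retry many times.
hive_policy = resolve_retry_policy({"retry_count": 1})
rdbms_policy = resolve_retry_policy({"retry_count": 10, "disable_hard_timeout": None})
```

A named tuple keeps the policy immutable and lets the scheduler compare or log policies as plain values, which is one reason it can read better than a loose dict.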

…e task command line runing tip is updated and more clear and readable now and other small changes

Tarrasch · 2016-07-26T10:45:02Z

Ok. I think we aren't on the same page when it comes to upstream-status-when-all. Maybe I've misunderstood parts of the PR. Also comment javrasya@44415be#commitcomment-18391330 comes to mind.

Let's render an example.

Task X depends on Y2 and Y5. Y2 has retry-count 2 and Y5 has retry-count 5. Let's say that in our example X will never run, because Y2 and Y5 will never both get DONE.

Now let's say Y5 has failed and is now marked as red, while Y2 has not run yet and is pending.

upstream-status-when-all: If this config is set to false (default) for task X, then X will be in the set of UPSTREAM_FAILED because Y5 is. But if this variable is set to true, it will not be in UPSTREAM_FAILED, because Y2 is still pending.

We note that UPSTREAM_FAILED is not something a task is ever marked as; it's just a view that only exists in the luigi scheduler. It's possible for a task to be UPSTREAM_FAILED, UPSTREAM_DISABLED, both, or neither.

Are we on the same page so far?
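
The X / Y2 / Y5 scenario can be written down as a small, self-contained sketch. This is not luigi code: `is_upstream_blocked` and the plain string statuses are hypothetical, and the `when_all` flag only mimics the proposed `upstream-status-when-all` option as it is described in this thread.

```python
FAILED, DISABLED, PENDING, DONE = "FAILED", "DISABLED", "PENDING", "DONE"


def is_upstream_blocked(dep_statuses, when_all):
    """Decide whether a parent task counts as upstream-failed/disabled.

    Finished (DONE) dependencies are ignored, as in the PR description.
    """
    unfinished = [s for s in dep_statuses if s != DONE]
    if not unfinished:
        return False
    bad = [s in (FAILED, DISABLED) for s in unfinished]
    # when_all=False: one bad dependency is enough (behaviour before this PR).
    # when_all=True: every unfinished dependency must be bad.
    return all(bad) if when_all else any(bad)


# Y5 has already failed, Y2 has not run yet.
deps_of_x = {"Y2": PENDING, "Y5": FAILED}
old_view = is_upstream_blocked(deps_of_x.values(), when_all=False)
new_view = is_upstream_blocked(deps_of_x.values(), when_all=True)
```

With these inputs `old_view` is `True` and `new_view` is `False`: under the "all" rule X is not reported as upstream-failed while Y2 can still run.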

Actually, I downloaded your branch and tried running your example, and now I do see what you mean. It only works when that variable is true. What about this proposal: because I really don't want to introduce a configuration option that takes a luigi maintainer hours to understand, can you try changing the default behaviour to act as if your new variable were true?

Yes, it's a bit provocative to change this more than 3-year-old concept in luigi. But I don't really see it being used anywhere except for visualization. There's only one place (as of #871), and in that case I believe the author mistakenly assumed the behaviour works the way I propose we change it to.

I checked that no scheduler-api tests fail after the change, and I can help you by providing a test case that will fail until we change the behaviour. How does that sound? Less code for both of us, and I think we fix a design flaw in luigi. And yes, you can run into this bug in production as of today. It's just that in your case both child tasks crash after 0 seconds; if one task has a delay, you will get the same problem even if both have the same retry-count.

Tarrasch · 2016-07-26T10:56:22Z

Also linking in #686. Even there the author said "if all dependencies", so it seems that upstream-status-when-all should always be True, and it's False because of some ancient mistake years ago.

javrasya · 2016-07-26T11:33:05Z

If we have the chance, I would go for not including this configuration and instead changing the default behavior from 'any' to 'all'. If you say so, I will change the behavior and remove the added configuration.

Tarrasch · 2016-07-27T02:24:42Z

Yes. I think the default behaviour should change. Today I'll work on creating a test case which only passes with the new behaviour.

… on setting parent task severity as `UPSTREAM_DISABLED` or `UPSTREAM_FAILED` when any of its upstream task is `DISABLED` or `FAILED` is changed to when all of its upstream task is `DISABLED` or `FAILED`. `disable_failures`->`retry-count` in scheduler class.

…-failures -> retry-count in config file and its documentation is updated.

javrasya · 2016-07-27T09:42:35Z

A few tests fail after I changed luigi's current behavior on that. I will fix them and update the PR soon.

…rent task severity as its upstream tasks statuses, is changed

Conflicts: doc/configuration.rst luigi/scheduler.py luigi/worker.py test/scheduler_test.py

Tarrasch · 2016-07-27T10:05:16Z

@javrasya, it would be ideal if that could be made into a separate PR that we merge first. Could you arrange that? Then we have one diff view where it's clear what that change affects. I remember the scheduler_api tests did not start failing, but maybe I ran them incorrectly.

javrasya · 2016-07-27T10:08:20Z

The one that fails is test_task_list_upstream_status. I fixed it and pushed the fix, but I can roll it back if you want.

Tarrasch · 2016-07-27T10:14:10Z

doc/configuration.rst

+------------------------+-----------+
+| Config                 | Section   |
+========================+===========+
+| retry-count            | scheduler |


retry_count. Use underscores.

True, it is left over from the first version of this feature.

Tarrasch · 2016-07-27T10:18:53Z

luigi/scheduler.py

-    disable_failures = parameter.IntParameter(default=999999999,
-                                              config_path=dict(section='scheduler', name='disable-num-failures'))
+    retry_count = parameter.IntParameter(default=999999999,
+                                         config_path=dict(section='scheduler', name='retry-count'))


noooo... s/retry-count/disable_failures this please.

Sorry for the confusion. But config_path should really have been called deprecated_config_path from day one. The confusion is due to our bad naming. Remember, the non-deprecated config path is automatic.

OK, I probably got it wrong. At first I didn't change it, but after one of your comments I did. I can roll it back.

OK, I am a bit confused by your comment: "Oh, even more important. Change this line to say name='disable_failures'! Also update the docs to tell people to use retry_count." Should I roll it back?

Code should be name='disable_failures' so we keep backward compatibility. Docs should encourage usage of the newest name, retry_count, so people don't get deprecation warnings.

Just add commits on top. There's no point in "rolling back". In the end I'll squash everything anyway, as the history is a bit messy already. :)

Oh, I guess I get it now. You mean that if I write retry_count in the .cfg file, even though the name is disable-num-failures, both can be used? So all I need to commit is changing that part to name='disable-num-failures', and the doc can stay as retry_count, right?
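
The "new name first, old name still accepted" idea discussed in this exchange can be sketched with the standard library alone. This is not luigi's parameter lookup: `read_retry_count` is a hypothetical helper, and the assumption that the old option lives under `[scheduler]` as `disable-num-failures` is taken from the diff in this thread.

```python
import configparser
import warnings


def read_retry_count(cfg_text, default=999999999):
    """Read retry_count, falling back to the deprecated option name."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # Preferred, documented name.
    if parser.has_option("scheduler", "retry_count"):
        return parser.getint("scheduler", "retry_count")
    # Deprecated name, kept only so that existing config files keep working.
    if parser.has_option("scheduler", "disable-num-failures"):
        warnings.warn(
            "disable-num-failures is deprecated, use retry_count",
            DeprecationWarning,
        )
        return parser.getint("scheduler", "disable-num-failures")
    return default


new_style = read_retry_count("[scheduler]\nretry_count = 3\n")
old_style = read_retry_count("[scheduler]\ndisable-num-failures = 44\n")
unset = read_retry_count("[scheduler]\n")
```

Both spellings yield a value, but only the old one triggers a deprecation warning, which is why the docs should point people at the new name.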

…m-failures` back again for backward compatibility

javrasya · 2016-07-27T10:58:14Z

luigi/scheduler.py

-                                      for a_task_id in dep.deps),
-                                     key=UPSTREAM_SEVERITY_KEY)
+                        upstream_severities = list(upstream_status_table.get(a_task_id) for a_task_id in dep.deps if a_task_id in upstream_status_table) or ['']
+                        status = min(upstream_severities, key=UPSTREAM_SEVERITY_KEY)


Here is the change to luigi's default behavior for detecting upstream severity. With how luigi currently works, a parent task is marked as UPSTREAM_DISABLED when any of its upstream tasks is DISABLED, and the worker shuts down because of it. This behavior is changed to wait until all upstream tasks are DISABLED or FAILED before marking the parent task as UPSTREAM_DISABLED or UPSTREAM_FAILED.
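
The effect of picking the minimum instead of the maximum severity can be seen in a small standalone sketch. The ordering tuple below is an assumption modelled on the `UPSTREAM_SEVERITY_KEY` used in the diff, and `parent_severity` is a hypothetical helper, not the scheduler's actual function.

```python
# Assumed ordering from least to most severe; '' means "no upstream problem".
SEVERITY_ORDER = (
    "",
    "UPSTREAM_RUNNING",
    "UPSTREAM_MISSING_INPUT",
    "UPSTREAM_FAILED",
    "UPSTREAM_DISABLED",
)
SEVERITY_KEY = SEVERITY_ORDER.index


def parent_severity(dep_severities, pick=min):
    """Combine the severities of a task's dependencies into one value."""
    # Fall back to [''] so that min()/max() never sees an empty sequence.
    severities = list(dep_severities) or [""]
    return pick(severities, key=SEVERITY_KEY)


# One dependency is disabled, the other one has no upstream problem.
deps = ["UPSTREAM_DISABLED", ""]
old_status = parent_severity(deps, pick=max)  # most severe dependency wins
new_status = parent_severity(deps, pick=min)  # least severe dependency wins
```

With `max`, a single disabled dependency is enough to flag the parent; with `min`, the parent is flagged only when every dependency carries some severity, which matches the "all" semantics agreed on in this thread.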

Tarrasch · 2016-07-28T02:35:38Z

Ok, so your next step here is to create a new PR with only the "upstream-status-when-all" change. I'll try to quickly merge it, and then you can create a follow-up PR with the rest of the changes.

This PR is really messy now. It's better to have many small PRs that get merged so we have the feeling that we're progressing. :)

javrasya added 8 commits July 22, 2016 20:10

Optimization is added. test_get_work_speed test was failing because…

b8631e3

… of adding task is so slow.

An example is added for task level configuration

308c072

Renamed MyTask->MyTaskHasConfig because of TaskClassAmbigiousException

fa0a774

Renamed Unambigious task names

7659ba9

Unnecessary code formatting is rollback

d5bfdf3

Python3 compatiable prints

0235b2a

Tarrasch reviewed Jul 25, 2016
javrasya added 8 commits July 25, 2016 12:53

Optimization is added. test_get_work_speed test was failing because…

5886373

… of adding task is so slow.

An example is added for task level configuration

817256a

Unnecessary code formatting is rollbacked

d1c04b6

Renamed MyTask->MyTaskHasConfig because of TaskClassAmbigiousException

c673d4e

Renamed Unambigious task names

a2f2694

Unnecessary code formatting is rollback

b07b402

Python3 compatiable prints

996c629

disable_num_failures -> retry_count in RetryPolicy namedtuple, exampl…

48996b5

…e task command line runing tip is updated and more clear and readable now and other small changes

javrasya added 2 commits July 27, 2016 09:55

_get_retry_policy -> _generate_retry_policy in Scheduler. disable-num…

10fc315

…-failures -> retry-count in config file and its documentation is updated.

javrasya added 2 commits July 27, 2016 12:48

tests are fixed after luigi default behavior which is about seting pa…

d2f7506

…rent task severity as its upstream tasks statuses, is changed

Merge remote-tracking branch 'origin/comment-update'

951abce

Conflicts: doc/configuration.rst luigi/scheduler.py luigi/worker.py test/scheduler_test.py

javrasya added 3 commits July 27, 2016 13:10

old example is removed

ddf788c

merge fix

63eaf7f

test_renamegs_dont_move_on_fs -> test_rename_dont_move_on_fs

3a262f6

Tarrasch reviewed Jul 27, 2016

configuration about per task retry policy doc is fixed.

2112c28

Tarrasch reviewed Jul 27, 2016

retry-count -> retry_count in doc. name in config_path is `disable-nu…

c678689

…m-failures` back again for backward compatibility

javrasya reviewed Jul 27, 2016

javrasya added 3 commits July 27, 2016 14:27

Merge remote-tracking branch 'upstream/master'

1d19519

doc fix in luigi.Task.retry_count

f22ab11

python3x iteritems fix

ad5060a

Merge remote-tracking branch 'upstream/master'

c669e92

javrasya mentioned this pull request Jul 28, 2016

Marking as minimum upstream severity instead of max #1789

Merged

javrasya closed this Jul 28, 2016
