All tasks without dependencies are root-ish #7221

Open · gjoseph92 wants to merge 25 commits into base: main

Conversation

@gjoseph92 (Collaborator) commented Oct 28, 2022

If a task doesn't have dependencies, then it's obviously a root task. However, the current logic only considers it one if its TaskGroup is also 2x larger than the cluster (more tasks in the group than twice the total thread count).

That allowed for the awkward case where small groups of root tasks would go down a different code path. It's not clear how much this code path was even used or needed, xref #6974.

See some additional discussion in #7204 (comment).

Closes #7274.
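
For reference, a rough sketch of the shape of the is_rootish heuristic being changed (simplified and approximate, not the exact scheduler code):

def is_rootish(self, ts) -> bool:
    # Tasks with restrictions never count as root-ish (unchanged).
    if ts.resource_restrictions or ts.worker_restrictions or ts.host_restrictions:
        return False
    # New in this PR: a task with no dependencies is trivially a root task,
    # regardless of the size of its TaskGroup.
    if not ts.dependencies:
        return True
    # Previous behavior: only large, mostly-independent TaskGroups qualified.
    tg = ts.group
    return (
        len(tg) > self.total_nthreads * 2   # group is 2x larger than the cluster
        and len(tg.dependencies) < 5
        and sum(map(len, tg.dependencies)) < 5
    )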

  • Tests added / passed
  • Passes pre-commit run --all-files

This isn't a very good test, since it's weirdly stateful and makes assumptions about which worker is selected in an empty cluster.
@github-actions bot (Contributor) commented Oct 28, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

15 files ±0   15 suites ±0   8h 6m 39s ⏱️ +1h 37m 50s
3 170 tests +2:    3 035 passed ✔️ −48    84 skipped 💤 +1    51 failed +49
23 456 runs +16:   22 220 passed ✔️ −316   904 skipped 💤 +4   332 failed +328

For more details on these failures, see this check.

Results for commit 17f1ba4. ± Comparison against base commit 02b9430.

♻️ This comment has been updated with latest results.

@mrocklin (Member) commented:

> That allowed for the awkward case where small groups of root tasks would go down a different code path. It's not clear how much this code path was even used or needed, xref #6974.

I don't have a ton to add to this conversation, but some examples of important cases that look like this include:

  1. da.from_zarr where there is a single zarr array dataset in the graph
  2. client.submit (possibly in a for loop)

Would it make sense to avoid this logic in these cases?

@gjoseph92 (Collaborator, Author) commented:

First, let's clarify that this is_rootish check was originally written in #4967 just to decide which tasks should get co-assigned, versus which should follow "normal" scheduling logic. (Then, within the "normal" scheduling logic, there was another branch for the no-deps case we're talking about removing in #6974.)

We've now co-opted this "should this task be co-assigned" logic and are also using it to decide "should this task be queued?" We do this because "is this a root task" happens to be a good answer to both of those questions. But when thinking about possible impacts, it could be helpful to ask "what's the effect of co-assigning these tasks" separately from "what's the effect of queuing these tasks".

  1. from_zarr
    • co-assigning: doesn't matter. There's only one task anyway.
    • queuing: makes behavior more consistent. If there are other tasks already on the cluster, it makes more sense that this should wait in the queue until space opens up (though it has no effect on actual performance, since opening a zarr dataset doesn't take much memory).
  2. single client.submit
    • co-assigning: doesn't matter. There's only one task. However, you lose round-robin. (Though test_quiet_cluster_round_robin still passes on this branch even though we're bypassing the round-robin code path, which I can't explain).
    • queuing: makes behavior more consistent. These tasks now have to wait in the queue like everything else instead of skipping in line.
  3. client.submit in a for loop
    • co-assign: well, co-assignment has always been strange for this case. Before, we'd submit the first nthreads*2 tasks randomly/round-robin, then flip to co-assignment mode and submit them in batches to a worker. Now, we'll always submit in batches, but initially the batch size will be 1 until the TaskGroup grows large enough. So worker selection/load distribution will be very slightly different, but I think not in a way that matters or is noticeable.
    • queuing: makes behavior more consistent. All tasks are queued instead of the first nthreads*2 tasks getting a fast-pass.
  4. client.submit in a for loop with follow-up tasks
    • co-assign: same as above, no meaningful effect
    • queuing: could reduce overproduction (assuming tasks are slower than task submission). Right now you'll get root task overproduction with submit in a for loop, because the first nthreads*2 tasks are sent to workers immediately, before queuing flips on. If you client.submit all your root tasks, then client.submit downstream tasks before any root tasks have finished, overproduction is prevented with this change (the pattern is sketched just after this list).
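
For concreteness, the submission pattern in cases 3 and 4 looks roughly like this (a sketch only; load and process are placeholder functions, and the local Client is purely for illustration):

from distributed import Client

def load(i):            # placeholder root task: no dependencies
    return i

def process(x):         # placeholder follow-up task
    return x + 1

if __name__ == "__main__":
    client = Client()   # hypothetical local cluster

    roots = []
    for i in range(100):                     # case 3: client.submit in a for loop
        roots.append(client.submit(load, i))

    # case 4: follow-up tasks submitted before any root task has finished
    downstream = [client.submit(process, r) for r in roots]
    print(client.gather(downstream)[:5])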

tl;dr: I can't think of a meaningful way these cases would get worse, or a reason they should avoid the queuing/co-assignment logic.

Overall, I don't think this change should affect actual use much. Nearly all of my motivation for doing it is to reduce the number of branches to worry about, make things more consistent, simplify testing, and increase confidence that our tests are actually testing the queuing code path. For instance, because most tests just client.submit a couple tasks, a ton of tests were actually not using the queuing code path even with worker-saturation: 1.0. (This change is how I found #7223).

@mrocklin (Member) commented Oct 31, 2022 via email

@gjoseph92 (Collaborator, Author) commented:

The distributed/tests/test_scheduler.py::test_balance_many_workers failure (assert {0, 1, 2} == {0, 1}) could be real; looking into it: https://github.com/dask/distributed/actions/runs/3364213017/jobs/5578308124#step:18:1315

This makes me wonder if the result is very meaningful though:

2022-10-31 20:08:16,899 - distributed.utils_perf - WARNING - full garbage collections took 24% CPU time recently (threshold: 10%)
2022-10-31 20:08:17,803 - distributed.utils_perf - WARNING - full garbage collections took 24% CPU time recently (threshold: 10%)
2022-10-31 20:08:40,283 - distributed.core - INFO - Event loop was unresponsive in Scheduler for 24.96s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.

We were ending up with `last_worker_tasks_left` initially as 0, which immediately decremented to -1. Then it was a truthy value, so we reused the worker for the entire task group!
seemed to be relying on round-robin, since it only ever looked at one worker
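
A minimal sketch of the counter bug described above (only last_worker_tasks_left comes from the comment; the print calls just stand in for the two branches):

last_worker_tasks_left = 0       # initial value on an empty cluster
last_worker_tasks_left -= 1      # immediately decremented to -1

if last_worker_tasks_left:       # -1 is truthy in Python...
    print("reuse the previous worker")   # ...so the same worker was reused for the whole task group
else:
    print("pick a new worker")
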
@gjoseph92 (Collaborator, Author) commented:

All green besides:

@gjoseph92 gjoseph92 changed the title [DNM] All tasks without dependencies are root-ish All tasks without dependencies are root-ish Nov 1, 2022
@crusaderky crusaderky self-requested a review November 2, 2022 15:34
distributed/scheduler.py (review comment thread, outdated, resolved)
@crusaderky (Collaborator) left a comment:

I think there's a test for an impossible condition that should be replaced with an assertion (read above). Everything else looks good.

@gjoseph92 (Collaborator, Author) commented:

> there's a test for an impossible condition

@crusaderky see #7221 (comment)

@gjoseph92 (Collaborator, Author) commented:

Again only failure is

@gjoseph92 gjoseph92 mentioned this pull request Nov 3, 2022
2 tasks
@gjoseph92 (Collaborator, Author) commented:

Another reason to merge this:

The current code path (when the taskgroup is smaller than the cluster) can have bad behavior for worker-saturation > 1.0. Because of the round-up, when using a value like 1.1, workers will be in the idle set even when all their threads are in use. In a non-homogeneous cluster, this can lead to picking a worker that's completely full even when there are workers with open threads. That's the original thing that happened in #7197.
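
A rough illustration of the round-up problem, assuming the per-worker task-slot limit is computed as ceil(nthreads * worker-saturation) (simplified; not the literal scheduler code):

import math

worker_saturation = 1.1
nthreads = 2
processing = 2                                    # every thread is already busy

limit = math.ceil(nthreads * worker_saturation)   # ceil(2.2) == 3
still_idle = processing < limit                   # True: the worker stays in the idle set
print(limit, still_idle)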

@gjoseph92 gjoseph92 mentioned this pull request Nov 3, 2022
2 tasks
@fjetter (Member) left a comment:

There is a lot of test refactoring going on. I would appreciate these changes being in a separate PR, to distinguish functional changes from refactoring much more clearly. This is a non-trivial change in behavior, and mixing it up with generic code refactoring makes it difficult to see what's really going on.


From what I can see you are introducing one new test and are skipping one.

You are already discussing four different edge cases (and I'm sure there are more). Can you please write them as tests? This way we know for a fact that the code behaves this way, instead of relying on a lengthy argument.

Comment on lines -216 to -227
-    with client.get_executor(retries=5, pure=False) as e:
+    with client.get_executor(retries=6, pure=False) as e:
         future = e.submit(varying(args))
         assert future.result() == 42

-    with client.get_executor(retries=4) as e:
+    with client.get_executor(retries=1) as e:
         future = e.submit(varying(args))
         result = future.result()
         assert result == 42

     with client.get_executor(retries=2) as e:
         future = e.submit(varying(args))
         with pytest.raises(ZeroDivisionError, match="two"):
Member:

Any reason to change these retries?

@gjoseph92 (Collaborator, Author):

This test is extremely sensitive to how workers are selected on an idle cluster. It uses this stateful varying utility, which changes its behavior depending on how many times it's been called on that worker.

It had to be changed when the idle-round-robin behavior was added: https://github.com/dask/distributed/pull/4638/files#diff-59af67191283f0c64a3be8ce1f344f49b9d025f8264b77fba5c8250865bde433

So I've had to change it again since it's being removed.
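
Roughly, varying behaves like the sketch below: a per-key call counter decides which item to return or raise next, so the outcome depends on how many times the callable has already run in that worker process (a simplification, not the actual distributed.utils_test implementation):

from collections import defaultdict
from itertools import count

_call_counts = defaultdict(int)   # lives in whichever process runs the callable, i.e. the worker
_keys = count()

def varying(items):
    key = next(_keys)

    def func():
        i = _call_counts[key]
        _call_counts[key] = i + 1
        x = items[i]
        if isinstance(x, Exception):
            raise x               # earlier calls fail...
        return x                  # ...later calls succeed

    return func

f = varying([ZeroDivisionError("one"), ZeroDivisionError("two"), 42])
# The first two calls raise, the third returns 42 -- so which worker runs each attempt,
# and how many attempts that worker has already seen, changes the result.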

Comment on lines -92 to -94
-        pytest.param(False, id="queue on worker"),
-        pytest.param(True, id="queue on scheduler"),
-    ],
Member:

Why remove the ids?

@gjoseph92 (Collaborator, Author):

They weren't accurate anymore. I could call them pause and clog or something.

@@ -2849,6 +2860,10 @@ async def test_get_worker_monitor_info(s, a, b):
     assert res[w.address]["last_time"] is not None


+@pytest.mark.skipif(
Member:

I think we need to preserve this kind of behavior. I don't care if it's actually round robin or not but if I submit three tasks and there are two workers, both workers should have work.

What's the plan to fix this?

@gjoseph92 (Collaborator, Author):

> if I submit three tasks and there are two workers, both workers should have work

That's not what this is testing. What you're describing above already will happen. This PR in fact improves that case. See #7197: if you set worker-saturation to 1.1 right now, test_wait_first_completed will fail because 2 tasks get assigned to the 1-threaded worker, and one task is assigned to the 2-threaded worker, because they're using the old code path.

This test is testing that if you submit a task to a completely empty cluster, wait for it to complete, release it, then submit another task, that you'll get different workers each time. That's a different and much more niche case.
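
The niche behavior in question looks roughly like this (a hypothetical sketch, not the actual test in the repo; gen_cluster and inc come from distributed.utils_test):

import asyncio
from distributed.utils_test import gen_cluster, inc

@gen_cluster(client=True, nthreads=[("127.0.0.1", 1)] * 2)
async def test_alternates_workers_when_idle(c, s, a, b):
    workers_used = set()
    for i in range(4):
        fut = c.submit(inc, i)
        await fut
        workers_used.update(s.tasks[fut.key].who_has)   # worker(s) that ran it
        del fut                        # release it before submitting the next task
        while s.tasks:                 # wait until the scheduler forgets the key
            await asyncio.sleep(0.01)
    # Old idle round-robin: consecutive single tasks land on different workers.
    # This PR removes that code path, so this is the assertion under debate.
    assert len(workers_used) > 1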

Approaches for this:

Member:

Please delete the test

@gjoseph92 (Collaborator, Author) commented:

> You are already discussing four different edge cases (and I'm sure there are more). Can you please write them as tests? This way we know for a fact that the code behaves this way, instead of relying on a lengthy argument.

I'm not following what you'd want to assert about the cases mentioned here in #7221 (comment) (I'm assuming those are the four cases you're talking about).

There are already a number of tests that cover aspects of this behavior, for instance:

@gen_cluster(client=True, nthreads=[("127.0.0.1", 1)] * 4)
async def test_balanced_with_submit(c, s, *workers):
    L = [c.submit(slowinc, i) for i in range(4)]
    await wait(L)
    for w in workers:
        assert len(w.data) == 1


@gen_cluster(client=True, nthreads=[("127.0.0.1", 20)] * 2)
async def test_scheduler_saturates_cores(c, s, a, b):
    for delay in [0, 0.01, 0.1]:
        futures = c.map(slowinc, range(100), delay=delay)
        futures = c.map(slowinc, futures, delay=delay / 10)
        while not s.tasks:
            if s.tasks:
                assert all(
                    len(p) >= 20
                    for w in s.workers.values()
                    for p in w.processing.values()
                )
            await asyncio.sleep(0.01)


@gen_cluster(client=True, nthreads=[("127.0.0.1", 1)] * 30)
async def test_balance_many_workers(c, s, *workers):
    futures = c.map(slowinc, range(20), delay=0.2)
    await wait(futures)
    assert {len(w.has_what) for w in s.workers.values()} == {0, 1}


# FIXME test is very timing-based; if some threads are consistently slower than others,
# they'll receive fewer tasks from the queue (a good thing).
@pytest.mark.skipif(
    MACOS and math.isfinite(dask.config.get("distributed.scheduler.worker-saturation")),
    reason="flaky on macOS with queuing active",
)
@nodebug
@gen_cluster(
    client=True,
    nthreads=[("127.0.0.1", 1)] * 30,
    config={"distributed.scheduler.work-stealing": False},
)
async def test_balance_many_workers_2(c, s, *workers):
    futures = c.map(slowinc, range(90), delay=0.2)
    await wait(futures)
    assert {len(w.has_what) for w in s.workers.values()} == {3}

I could see adding tests for:

  1. client.submit after the cluster is already saturated doesn't get to cut in line before tasks that were submitted earlier (would fail today)
  2. client.submit in a for loop with follow-up tasks doesn't have overproduction with queuing on. This would just be test_graph_execution_width using futures instead of delayed.

@fjetter (Member) commented Nov 3, 2022

I'm cool with merging iff:

  1. Find out where we hit the "dead code". If it's actually dead code, we need to remove it; if it is not, we should have a test that shows the difference in behavior.
  2. Have a guesstimate on how bad performance is.
  3. CI is green-ish.

@gjoseph92 (Collaborator, Author) commented Nov 4, 2022

I removed the dead code in c901823 and added an assertion. Turns out there was a place it was hit. I'm addressing that in #7259, which will need to be merged first.

CI will be highly red until that's merged into here, so I won't be doing further work here until then.

@gjoseph92 (Collaborator, Author) commented:

> Have a guesstimate on how bad performance is

Ran a benchmark on a larger cluster (not with this PR, but still relevant): #7246 (comment)

tl;dr performance impact is hardly measurable.

Alternative to dask#7259. I'm quite torn about which is cleaner. I'm leaning towards this because I think it's even weirder to call `decide_worker_rootish_queuing_disabled` on a root-ish task when queuing is enabled than to call `decide_worker_non_rootish` on a root-ish task.

This also feels more consistent with the philosophy of "stick with the original decision". And if root-ish were a static property, this is what would happen.
This reverts commit e9be596.
@gjoseph92 (Collaborator, Author) commented:

I'm wondering if a better way to solve the consistency issue is to just cache root-ish-ness so we don't have to worry about it changing: #7262

@@ -2014,6 +2014,8 @@ def transition_no_worker_processing(self, key, stimulus_id):
             assert ts in self.unrunnable

         if ws := self.decide_worker_non_rootish(ts):
+            # ^ NOTE: `ts` may actually be root-ish now, but it wasn't when it went
+            # into `no-worker`. `TaskGroup` or cluster size could have changed.
Collaborator:

As mentioned in https://github.com/dask/distributed/pull/7259/files#r1015290381, this should be impossible.


Successfully merging this pull request may close these issues: Tasks which are obviously root tasks not considered rootish
4 participants