Issues with tasks completing on workers after being released and re-submitted #7356
I'm pretty sure I added more than one test that covers this case. That's the entire reason why we have cancelled/resumed states.
If it is transitioned to `no-worker`, then by definition the only workers executing the task are dead. Their batched comm is dead and there is no way to come back from this since we removed worker reconnect. If we received such a message, the worker restrictions were changed mid-flight. I'm perfectly fine with something going wrong if somebody messes with internal state (I know

This is even specifically parametrized in the test I added in #7348.

IIUC, the worst thing that happens here is that we'd recompute the key, because `waiting->memory` will just keep the task in `waiting` and we'd get to it eventually. Not nice, but also not a problem for a rare edge case.

No, it wouldn't. This is explicitly handled in the stimulus handler for erred tasks (`distributed/scheduler.py`, lines 4675-4676 at 53284cd).

A cleaner implementation would be something like #7372, but this should be good enough. I just realized that #7372 is similar to your suggestion 2), but there are a few caveats on the worker side to figure out first before we can give these kinds of guarantees. TL;DR: from what I understand, all your theoretical problem cases are handled and won't result in a problem unless somebody changes restrictions mid-flight. Otherwise, I'm inclined to close this ticket unless there is an actual reproducer.
This fixes `test_deadlock_resubmit_queued_tasks_fast`. The problem:

- An "old" TaskState `f-0` is forgotten and removed from the `queued` HeapSet. Internally, this removes it from a set, but a weakref is still on the heap.
- A new TaskState `f-0` is added to the HeapSet. It's pushed onto the heap, but since its priority is higher (newer generation), it comes after the old `f-0` object that's still on the front of the heap.
- `pop` pops old `f-0` off the heap. The weakref is still alive (for whatever reason). `value in self._data` (the set) is True, because the _hash_ of old `f-0` and new `f-0` are the same (same key). So we return the stale, old TaskState object.

Much like `WorkerState`s, there should be exactly 1 `TaskState` instance per task. If there are multiple instances with the same key, they are different tasks, and have different state.

xref dask#7372, dask#7356.
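To make the failure mode concrete, here is a toy sketch of a heap-backed set with lazy deletion. It is a deliberately simplified stand-in (the real `HeapSet` keeps weakrefs on the heap, which this skips), and the `Task`/`LazyHeapSet` names are made up for illustration. Because the stand-in task objects hash and compare by key alone, the membership check that is supposed to skip stale heap entries also accepts the old object once a new task with the same key has been added:

```python
import heapq


class Task:
    """Toy stand-in for a TaskState: hashes and compares by key only."""

    def __init__(self, key, priority):
        self.key = key
        self.priority = priority

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        return isinstance(other, Task) and other.key == self.key


class LazyHeapSet:
    """Toy heap-backed set with lazy deletion, illustrating the bug above."""

    def __init__(self):
        self._data = set()  # authoritative membership
        self._heap = []     # (priority, tiebreak, item); discarded items linger here

    def add(self, item):
        self._data.add(item)
        heapq.heappush(self._heap, (item.priority, id(item), item))

    def discard(self, item):
        # Lazy deletion: the heap entry is *not* removed.
        self._data.discard(item)

    def pop(self):
        while self._heap:
            _, _, item = heapq.heappop(self._heap)
            # Meant to skip stale heap entries, but it is also True for a
            # *different* object with the same key, so the stale one leaks out.
            if item in self._data:
                self._data.discard(item)
                return item
        raise KeyError("pop from an empty LazyHeapSet")


old = Task("f-0", priority=0)
new = Task("f-0", priority=1)  # re-submitted task: same key, later generation

hs = LazyHeapSet()
hs.add(old)
hs.discard(old)  # old task is forgotten, but its heap entry remains
hs.add(new)

print(hs.pop() is old)  # True: the stale object is returned instead of `new`
```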
#7348 fixed a deadlock where a task gets released by the client, then re-submitted by a client, but before the worker hears about the task being cancelled, it completes the task and tells the scheduler.
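For illustration, the release-and-resubmit pattern looks roughly like this at the client level. This is a hypothetical sketch, not a confirmed reproducer: actually hitting the race depends on the worker finishing the task before it processes the release.

```python
from distributed import Client

client = Client()

fut = client.submit(sum, [1, 2, 3], key="f-0")
del fut                                          # client releases the key
fut = client.submit(sum, [1, 2, 3], key="f-0")   # re-submit under the same key

# If the worker finishes the first run of "f-0" before it processes the
# release, the scheduler receives a "task finished" message for a key it has
# already forgotten and re-created: the situation discussed below.
print(fut.result())
```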
This fix was specific to the `queued` state, but I'd think that the problem (and possible deadlocks) are broader than that. There's more fundamentally a consistency issue whenever a task is released and re-submitted before the workers involved have acted on the release.

Some theoretical possibilities (I haven't come up with tests to create them yet, but I imagine they're slight variations of the test added in #7348):
- The re-submitted task goes to `no-worker` instead of `queued`. Maybe this causes a deadlock?
- The re-submitted task goes to `processing` instead of `queued`. When re-submitted, the scheduler picks a different worker for it. Then, the message from the original worker that the task is finished arrives. This would trigger a `RuntimeError` on the scheduler from the task completing on an unexpected worker.
- The re-submitted task goes to `waiting`. Then the task-completed message arrives, triggering the `waiting->memory` transition @crusaderky just added in #7205 (Edge and impossible transitions to memory). The docstring of `transition_waiting_memory` does not mention this case FWIW, so I don't think this is intended, though the behavior would be okay.
- Similarly, the task-completed message could arrive for a task in `erred`, which would trigger the `erred->memory` transition.

A couple ways to address this:
1. `client_releases_keys` would send a `free-keys` message to workers, and only forget the task once the workers confirm they've forgotten it. (We can transition the task to `released` immediately, just not `forgotten`.) This ensures the scheduler maintains a consistent view of the workers' state. (A rough sketch of this idea is below.)
2. Give each task the value of the scheduler's `transition_counter` when it was created, to use for disambiguating re-submitted tasks. Send this to workers as part of the task spec. Whenever workers send messages regarding a key, they also include this `transition_counter` value. If the value doesn't match what the scheduler has on record for that task (i.e. it's been forgotten and re-submitted in the interim), we know the message is stale, and we can ignore it and just tell the worker to release that key. (We'd have to be careful about consistency issues, though: we don't want to accidentally tell the worker to release a newer version of the key, if it has that too.) (A rough sketch of this idea is also below.)

cc @fjetter @crusaderky @hendrikmakait
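A minimal sketch of the first suggestion, assuming made-up `MiniScheduler`/`MiniWorker` classes and a made-up `free_keys` coroutine (this is not distributed's actual scheduler or worker API): the task moves to `released` right away, but is only forgotten after every worker has acknowledged the release.

```python
import asyncio


class MiniWorker:
    """Hypothetical worker: holds data and drops keys when asked."""

    def __init__(self, name):
        self.name = name
        self.data = {"f-0": 42}

    async def free_keys(self, keys):
        for key in keys:
            self.data.pop(key, None)
        # Returning from this coroutine acts as the confirmation.


class MiniScheduler:
    """Hypothetical scheduler that only forgets a task after all workers ack."""

    def __init__(self, workers):
        self.workers = workers
        self.tasks = {"f-0": "processing"}

    async def client_releases_keys(self, key):
        # Transition to released immediately, but do not forget yet.
        self.tasks[key] = "released"
        # Tell every worker to drop the key and wait for confirmations.
        await asyncio.gather(*(w.free_keys([key]) for w in self.workers))
        # Only now is it safe to forget: no worker can still report this key.
        del self.tasks[key]


async def main():
    scheduler = MiniScheduler([MiniWorker("a"), MiniWorker("b")])
    await scheduler.client_releases_keys("f-0")
    print(scheduler.tasks)  # {} -- forgotten only after both workers confirmed


asyncio.run(main())
```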
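And a minimal sketch of the second suggestion, again with hypothetical names (`TaskRecord`, `MiniScheduler`, `handle_task_finished` are made up for illustration): each task is stamped with the scheduler's `transition_counter` at creation, workers would echo that stamp back with every message, and a mismatch marks the message as stale. The tricky part the issue notes, not accidentally telling the worker to release a newer incarnation of the key, would live in how the `free-keys` response is scoped, which this sketch glosses over.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """Hypothetical per-task record holding the generation stamp."""
    key: str
    generation: int  # scheduler's transition_counter at task creation
    state: str = "processing"


class MiniScheduler:
    """Hypothetical scheduler that drops messages from stale task generations."""

    def __init__(self):
        self.transition_counter = 0
        self.tasks: dict[str, TaskRecord] = {}

    def submit(self, key):
        self.transition_counter += 1
        ts = TaskRecord(key, generation=self.transition_counter)
        self.tasks[key] = ts
        # The task spec sent to the worker would carry ts.generation,
        # and the worker would echo it back with every message for this key.
        return ts

    def release(self, key):
        # The problematic case: forget immediately, workers not yet informed.
        self.tasks.pop(key, None)

    def handle_task_finished(self, key, generation):
        ts = self.tasks.get(key)
        if ts is None or ts.generation != generation:
            # Stale message from a previous incarnation of the key:
            # ignore it and ask only that worker to release its copy.
            return "free-keys"
        ts.state = "memory"
        return "ok"


scheduler = MiniScheduler()
old = scheduler.submit("f-0")
scheduler.release("f-0")
new = scheduler.submit("f-0")  # re-submitted under the same key

print(scheduler.handle_task_finished("f-0", old.generation))  # "free-keys"
print(scheduler.handle_task_finished("f-0", new.generation))  # "ok"
```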