When worker A calls `gather_dep` on an Actor task, it gets sent an Actor handle by worker B, where the Actor is running. When that handle is deserialized on worker A, it gets a Client and creates a Future reference holding onto that Actor's key. The scheduler now notes that worker A's Client desires that key.
When the actual user's Client tries to release the Actor, the scheduler notes that worker A's Client still holds a reference to it, so it is not released.
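The bookkeeping described above can be sketched with a toy model. This is not the scheduler's actual code: `ToyScheduler`, its methods, and the client and key names are all hypothetical, and only illustrate how a second client's reference keeps a key from being released.

```python
from collections import defaultdict


class ToyScheduler:
    """Hypothetical stand-in for the scheduler's per-key client bookkeeping."""

    def __init__(self):
        self.who_wants = defaultdict(set)  # key -> ids of clients that desire it
        self.released = set()              # keys the scheduler has let go of

    def client_desires_key(self, client, key):
        self.who_wants[key].add(client)

    def client_releases_key(self, client, key):
        self.who_wants[key].discard(client)
        # The key is only released once no client desires it any more.
        if not self.who_wants[key]:
            self.released.add(key)


sched = ToyScheduler()
key = "Counter-abc"  # made-up Actor key

sched.client_desires_key("user-client", key)      # the user creates the Actor
sched.client_desires_key("worker-A-client", key)  # handle deserialized on worker A
sched.client_releases_key("user-client", key)     # the user drops the Actor

leaked = key not in sched.released
print(leaked, sched.who_wants[key])
```

In this model the Actor's key stays alive until worker A's Client also releases it, which never happens if nothing on worker A drops the handle.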
More complex case:
A user submits a task where one of the dependencies is marked as an Actor, like:
In this case, the user doesn't even hold a reference to the Actor. But when the final task completes and the scheduler runs `_propagate_forgotten` to release its dependencies (including `Counter`), it sees that some Client holds a reference to the `Counter`, so it doesn't release it, when in fact the client holding the reference is `workers[1]`'s Actor handle.
This is what's causing test failures in #4925, now that we're more likely to schedule tasks on workers that don't hold any dependencies.
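A rough sketch of the release logic in the more complex case, under the assumption that a dependency is forgotten only when it has no remaining dependents and no client wants it. `ToyTask` and `propagate_forgotten` are hypothetical simplifications, not the scheduler's real `_propagate_forgotten`.

```python
class ToyTask:
    """Hypothetical, heavily simplified task state."""

    def __init__(self, key, dependencies=()):
        self.key = key
        self.dependencies = set(dependencies)
        self.dependents = set()
        self.who_wants = set()  # ids of clients holding a reference
        self.forgotten = False
        for dep in self.dependencies:
            dep.dependents.add(self)


def propagate_forgotten(task):
    """Forget `task`, then any dependency that nothing else needs."""
    task.forgotten = True
    for dep in task.dependencies:
        dep.dependents.discard(task)
        # Assumed rule: a dependency survives if a task or a client needs it.
        if not dep.dependents and not dep.who_wants:
            propagate_forgotten(dep)


counter = ToyTask("Counter")
final = ToyTask("final", dependencies=[counter])

# The Actor handle deserialized on workers[1] registers that worker's Client.
counter.who_wants.add("client-of-workers[1]")

propagate_forgotten(final)
print(final.forgotten, counter.forgotten)
```

Without the `who_wants` entry added by the worker's handle, the same call would forget `counter` as well.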
gjoseph92 added a commit to gjoseph92/distributed that referenced this issue on Jun 19, 2021:
Fixes dask#4936
I don't think this is quite the right implementation.
1) Why does the `worker=` kwarg exist? It doesn't seem to be used. But it should be. Taking the `if worker` codepath would bypass this whole issue.
2) What if a user is using an Actor within a task? In that case, `get_worker` would return a Worker, but we _would_ want to hold a reference to the Actor key (as long as that task was running).
I think a better implementation might be to include in `__reduce__` whether the Actor handle should be a weakref. Then `Worker.get_data` could construct the handle so that it is a weakref.
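A minimal sketch of the `__reduce__` idea, assuming a made-up `ToyActorHandle` class and a `weak` flag; neither is the real `Actor` API. `HELD_KEYS` stands in for the keys a worker's Client holds Futures for, and `rebuild` mimics what unpickling does with the result of `__reduce__`.

```python
HELD_KEYS = set()  # hypothetical: keys the local Client holds a Future for


class ToyActorHandle:
    """Hypothetical handle whose serialized form records whether it is weak."""

    def __init__(self, key, weak=False):
        self.key = key
        self.weak = weak
        if not weak:
            # A strong handle registers a reference to the Actor's key.
            HELD_KEYS.add(key)

    def __reduce__(self):
        # The weak flag travels with the handle, so the receiving side
        # rebuilds it in the same mode.
        return (ToyActorHandle, (self.key, self.weak))


def rebuild(handle):
    """Mimic a pickle round trip: call the constructor named by __reduce__."""
    func, args = handle.__reduce__()
    return func(*args)


# Strong handle (e.g. one used inside a running task): the key is held.
strong = rebuild(ToyActorHandle("Counter-abc"))
strong_holds = "Counter-abc" in HELD_KEYS

HELD_KEYS.clear()  # fresh state, as on the receiving worker

# Weak handle (e.g. one built for a worker-to-worker transfer): no reference.
weak = rebuild(ToyActorHandle("Counter-abc", weak=True))
weak_holds = "Counter-abc" in HELD_KEYS

print(strong_holds, weak_holds)
```

Carrying the flag in the serialized form would let the sender decide the mode, which also covers the second point above: a handle used inside a task could stay strong for as long as that task runs.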