Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transferring Actor handle between workers makes Actor unreleasable #4936

Closed
gjoseph92 opened this issue Jun 19, 2021 · 0 comments · Fixed by #4937
Closed

Transferring Actor handle between workers makes Actor unreleasable #4936

gjoseph92 opened this issue Jun 19, 2021 · 0 comments · Fixed by #4937

Comments

@gjoseph92
Copy link
Collaborator

When worker A calls gather_dep on an Actor task, it gets sent an Actor handle by worker B where the Actor is running. When that handle is deserialized on worker A, it gets a Client and creates a Future reference holding onto that Actor's key. The scheduler now notes that worker A's Client desires that key.

When the actual user's Client tries to release the Actor, the scheduler notes that worker A's Client still holds a reference to it, so it is not released.

More complex case:

A user submits a task where one of the dependencies is marked as an Actor, like:

with dask.annotate(workers=workers[0]):
    counter = dask.delayed(Counter)()
with dask.annotate(workers=workers[1]):
    intermediate = dask.delayed(lambda c: None)(counter)
with dask.annotate(workers=workers[0]):
    final = dask.delayed(lambda x, c: x)(intermediate, counter)

final.compute(actors=counter, optimize_graph=False)

In this case, the user doesn't even hold a reference to the Actor. But when the final task completes and the scheduler runs _propagate_forgotten to release its dependencies (including Counter), it sees that some Client holds a reference to the Counter, so it doesn't release it—when in fact the client holding the reference is workers[1]'s Actor handle.

This is what's causing test failures in #4925, now that we're more likely to schedule tasks on workers that don't hold any dependencies.

gjoseph92 added a commit to gjoseph92/distributed that referenced this issue Jun 19, 2021
Fixes dask#4936

I don't think this is quite the right implementation.
1) Why does the `worker=` kwarg exist? It doesn't seem to be used. But it should be. Taking the `if worker` codepath would bypass this whole issue.
2) What if a user is using an Actor within a task? In that case, `get_worker` would return a Worker, but we _would_ want to hold a reference to the Actor key (as long as that task was running).

I think a better implementation might be to include in `__reduce__` whether or not the Actor handle should be a weakref or not, basically. And in `Worker.get_data`, construct it such that it is a weakref.
mrocklin pushed a commit that referenced this issue Jul 20, 2021
Fixes #4936

When constructing an Actor handle, if there is a current worker, make our Future a weakref.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant