Transferring Actor handle between workers makes Actor unreleasable #4936

gjoseph92 · 2021-06-19T02:55:58Z

When worker A calls `gather_dep` on an Actor task, worker B (where the Actor is running) sends it an Actor handle. When that handle is deserialized on worker A, it gets a Client and creates a Future that holds a reference to the Actor's key. The scheduler then records that worker A's Client desires that key.

When the actual user's Client tries to release the Actor, the scheduler sees that worker A's Client still holds a reference to the key, so the Actor is not released.
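
To make the bookkeeping concrete, here is a minimal sketch of the situation using a toy model. `ToyScheduler`, the client ids, and the key name are all hypothetical; the sketch only mimics the idea of a per-key set of interested clients and is not the actual scheduler code.

```python
from collections import defaultdict


class ToyScheduler:
    """Hypothetical stand-in for the scheduler's per-key client bookkeeping."""

    def __init__(self):
        # key -> set of client ids that currently want the key kept alive
        self.who_wants = defaultdict(set)

    def client_desires_keys(self, client, keys):
        for key in keys:
            self.who_wants[key].add(client)

    def client_releases_keys(self, client, keys):
        """Drop the client's interest; return the keys that became releasable."""
        released = []
        for key in keys:
            self.who_wants[key].discard(client)
            if not self.who_wants[key]:
                del self.who_wants[key]
                released.append(key)
        return released


scheduler = ToyScheduler()
actor_key = "Counter-abc123"  # made-up key name

# The user's Client submits the Actor and holds a Future for it.
scheduler.client_desires_keys("user-client", [actor_key])

# Worker A deserializes the handle; its Client registers interest as well.
scheduler.client_desires_keys("worker-A-client", [actor_key])

# The user releases the Actor, but worker A's reference keeps the key alive.
released = scheduler.client_releases_keys("user-client", [actor_key])
print(released)  # []
```

Nothing in this model ever makes `worker-A-client` drop its interest, which is the sense in which the Actor becomes unreleasable.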

More complex case:

A user submits a task where one of the dependencies is marked as an Actor, like:

with dask.annotate(workers=workers[0]):
    counter = dask.delayed(Counter)()
with dask.annotate(workers=workers[1]):
    intermediate = dask.delayed(lambda c: None)(counter)
with dask.annotate(workers=workers[0]):
    final = dask.delayed(lambda x, c: x)(intermediate, counter)

final.compute(actors=counter, optimize_graph=False)

In this case, the user doesn't even hold a reference to the Actor. But when the final task completes and the scheduler runs `_propagate_forgotten` to release its dependencies (including Counter), it sees that some Client holds a reference to the Counter, so it doesn't release it. In fact, the client holding that reference belongs to the Actor handle on workers[1].
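
The forgetting step for this graph can be sketched with a toy model. `ToyTask` and `propagate_forgotten` below are hypothetical simplifications, assuming only that a dependency is forgotten once it has no remaining dependents and no client wants it; the scheduler's real `_propagate_forgotten` does more than this.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTask:
    key: str
    dependencies: list = field(default_factory=list)
    dependents: set = field(default_factory=set)
    who_wants: set = field(default_factory=set)  # ids of clients holding the key


def propagate_forgotten(task, forgotten):
    """Forget `task`, then any dependency left with no dependents and no client."""
    forgotten.append(task.key)
    for dep in task.dependencies:
        dep.dependents.discard(task.key)
        if not dep.dependents and not dep.who_wants:
            propagate_forgotten(dep, forgotten)


counter = ToyTask("counter")
intermediate = ToyTask("intermediate", [counter])
final = ToyTask("final", [intermediate, counter])
counter.dependents = {"intermediate", "final"}
intermediate.dependents = {"final"}

# workers[1] deserialized the handle, so its Client is recorded as wanting the key.
counter.who_wants.add("workers[1]-client")

forgotten = []
propagate_forgotten(final, forgotten)
print(forgotten)  # ['final', 'intermediate']
```

The `counter` task is left behind even though no task depends on it anymore and the user never held a reference to it.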

This is what's causing test failures in #4925, now that we're more likely to schedule tasks on workers that don't hold any dependencies.


Fixes dask#4936

I don't think this is quite the right implementation.

1) Why does the `worker=` kwarg exist? It doesn't seem to be used, but it should be. Taking the `if worker` codepath would bypass this whole issue.
2) What if a user is using an Actor within a task? In that case, `get_worker` would return a Worker, but we _would_ want to hold a reference to the Actor key (as long as that task was running).

I think a better implementation might be to include in `__reduce__` whether the Actor handle should be a weakref, and to construct it as a weakref in `Worker.get_data`.

Fixes #4936 When constructing an Actor handle, if there is a current worker, make our Future a weakref.
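
The idea behind that fix can be sketched as follows. This is a toy illustration, not the code from the PR: `ToyFuture`, `make_actor_future`, the `register` flag, and the `current_worker` argument are all hypothetical names, and the shared `who_wants` dict stands in for the scheduler's record of which clients desire a key.

```python
class ToyFuture:
    """Hypothetical Future that optionally registers interest in its key."""

    def __init__(self, key, who_wants, client_id, register=True):
        self.key = key
        if register:
            # A strong reference: this client now keeps the key alive.
            who_wants.setdefault(key, set()).add(client_id)
        # With register=False the Future acts like a weak reference:
        # it knows the key but does not keep it alive.


def make_actor_future(key, who_wants, client_id, current_worker=None):
    # Only a handle constructed outside of any worker pins the key.
    return ToyFuture(key, who_wants, client_id, register=current_worker is None)


who_wants = {}
key = "Counter-abc123"  # made-up key name

# Handle created by the user's Client: holds the key.
user_future = make_actor_future(key, who_wants, "user-client")

# Handle deserialized on worker A: does not hold the key.
worker_future = make_actor_future(
    key, who_wants, "worker-A-client", current_worker="worker-A"
)

print(who_wants)  # {'Counter-abc123': {'user-client'}}
```

Note that this sketch shares the limitation raised in the earlier commit message: a handle created inside a running task would also see a current worker, so it would not hold the key either, even though a strong reference might be wanted there.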

gjoseph92 mentioned this issue Jun 19, 2021

Actor: don't hold key references on workers #4937

Merged

3 tasks

mrocklin closed this as completed in #4937 Jul 20, 2021

mrocklin pushed a commit that referenced this issue Jul 20, 2021

Actor: don't hold key references on workers (#4937)

f28c719

Fixes #4936 When constructing an Actor handle, if there is a current worker, make our Future a weakref.
