Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-108951: add TaskGroup.cancel() #127214

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

Conversation

belm0
Copy link
Contributor

@belm0 belm0 commented Nov 24, 2024

Short-circuiting of task groups is a very common, useful, and normal, so make it a first-class operation. The recommended approach to date-- creating a task just to raise an exception, and then catch and suppress the exception-- is inefficient, prone to races, and requires a lot of boilerplate.


📚 Documentation preview 📚: https://cpython-previews--127214.org.readthedocs.build/

Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This is not a full review, just a couple of questions.

Doc/library/asyncio-task.rst Outdated Show resolved Hide resolved
@@ -414,53 +433,6 @@ reported by :meth:`asyncio.Task.cancelling`.
Improved handling of simultaneous internal and external cancellations
and correct preservation of cancellation counts.

Terminating a Task Group
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These docs make sense for older versions.

Copy link
Contributor

@graingert graingert Nov 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably recommending a backport module on PyPI would be better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These docs were just added in September, and backported to 3.13 and 3.12.

It's my understanding that the deletion here wouldn't affect the docs of previous versions.

As for this PR, I'd expected it to be backported as far back as is allowed by policy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@belm0 are you interested in applying this change and any previous changes to my taskgroup backport?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new API, so we won't backport it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about backporting to pypi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sure. PyPI is off limits :)

Lib/test/test_asyncio/test_taskgroups.py Outdated Show resolved Hide resolved
Lib/test/test_asyncio/test_taskgroups.py Outdated Show resolved Hide resolved
@@ -997,6 +999,69 @@ class MyKeyboardInterrupt(KeyboardInterrupt):
self.assertIsNotNone(exc)
self.assertListEqual(gc.get_referrers(exc), no_other_refs())

async def test_taskgroup_stop_children(self):
async with asyncio.TaskGroup() as tg:
tg.create_task(asyncio.sleep(math.inf))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe these tasks should look like this?

async def task(results, num):
    results.append(num)
    await asyncio.sleep(math.inf)
    results.append(-num)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can assert what was in results

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this particular test, I chose a different test approach, which is to wrap in asyncio.timeout().

For the other tests using count, I'm not sure it's much value, since the test code is only a few lines and there is only one possible path through it. So count result of 0, 1, or 2 each have deterministic meaning that's obvious by looking at the code.


with self.assertRaises(ExceptionGroup):
async with asyncio.TaskGroup() as tg:
tg.create_task(raise_exc(tg))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if some tasks cancels itself? How would this interact with .stop()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the case where a child task calls stop() on its parent TaskGroup, or something else?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cancellations (and thus taskgroup stops) happen when the next await … actually yields to the asyncio loop. Who the caller of the cancel or stop operation is doesn't matter.

Copy link
Member

@1st1 1st1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why call it TaskGroup.stop() and not TaskGroup.cancel()? I'd be more in favor of the latter name.

Short-circuiting of task groups is a very common, useful, and normal, so make it a first-class operation.

Any evidence of this statement? I'd like you to write up technical motivation + examples. That will be useful for the docs.

And speaking of the documentation, you should also show some recipies of how this would be used. Like are you supposed to use this API from within the task group async with clause? Or can you pass the task group to some remote task?

I haven't reviewed the actual implementation in detail yet.

@bedevere-app
Copy link

bedevere-app bot commented Nov 25, 2024

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@arthur-tacca
Copy link

This doesn't work in the case that the body of the task group throws an exception, as in this code:

    async def test_taskgroup_throw_inside(self):

        class MyError(RuntimeError):
            pass

        should_get_here = False
        try:
            async with asyncio.TaskGroup() as tg:
                tg.create_task(asyncio.sleep(0.1))
                tg.stop()
                self.assertEqual(asyncio.current_task().cancelling(), 1)
                raise MyError
            self.fail()  # <-- reaches here instead of raising ExceptionGroup([MyError()])
        except* MyError:
            self.assertEqual(asyncio.current_task().cancelling(), 0)
            should_get_here = True
        self.assertTrue(should_get_here)

The problem is that the new code in the _aexit() method, if not self._errors: return True, is essentially duplicating the if self._errors test later in the function, but in between self._errors is changed by these two lines:

        if et is not None and not issubclass(et, exceptions.CancelledError):
            self._errors.append(exc)

One option is move these lines earlier, before the if self._parent_cancel_requested statement. Then both tests are checking the same thing. This seems to work.

I'd still suggest my original proposal (see the issue) where you just add a single line return True to the very end of _exit() instead of these changes. This avoids duplicating the test in the first place and avoids changing the control flow and, personally, I find it easier to follow.

As a separate point, I'd suggest that the tests could do with a few more checks that asyncio.current_task().cancelling() is correct, like the ones in the test above in this comment.

@belm0
Copy link
Contributor Author

belm0 commented Nov 26, 2024

@1st1

Why call it TaskGroup.stop() and not TaskGroup.cancel()? I'd be more in favor of the latter name.

I'd also prefer cancel(), but per Guido it would be confusing since such a method would be expected to raise CancelledError, and he suggested stop().

Short-circuiting of task groups is a very common, useful, and normal, so make it a first-class operation.

Any evidence of this statement? I'd like you to write up technical motivation + examples. That will be useful for the docs.

In trio the equivalent is nursery.cancel_scope.cancel(), which has > 1,000 hits on github, despite the unpopularity of trio.

I have years experience developing a non-trivial, production async app, which I've presented at PyCon JP. Anecdotally, I can't imagine how painful and unproductive it would be to not have short circuiting of task groups.

And speaking of the documentation, you should also show some recipies of how this would be used. Like are you supposed to use this API from within the task group async with clause? Or can you pass the task group to some remote task?

All is on the table: stop from within the TaskGroup body, from a child, from some other entity you've passed the bound stop() method to.

@smurfix
Copy link

smurfix commented Nov 26, 2024

I'd also prefer cancel(), but per Guido it would be confusing since such a method would be expected to raise CancelledError,

Well, that's exactly what it does, isn't it? The fact that the cancelled taskgroup catches the CancelledErrors raised by itself doesn't change that. You don't get to wait on taskgroups the way you wait on tasks, thus the exception isn't visible like when you await on a cancelled task, but that's a minor detail IMHO.

Also, trio and anyio already call this operation cancel.

@@ -359,6 +359,14 @@ and reliable way to wait for all tasks in the group to finish.
:meth:`cancel` is idempotent and may be called after the task group has
already exited.

Ways to use :meth:`cancel`:

* call it from the task group body based on some condition or event
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably you want code examples for all of these?

@belm0 belm0 changed the title gh-108951: add TaskGroup.stop() gh-108951: add TaskGroup.cancel() Nov 30, 2024
@@ -273,3 +280,30 @@ def _on_task_done(self, task):
self._abort()
self._parent_cancel_requested = True
self._parent_task.cancel()

def cancel(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about supporting cancel messages here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked on Gitter, but I'm still unclear about how such a message would be accessed and surfaced.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be logged by the task that gets cancelled, or useful in debugging

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep it as-is and maybe add a message in the follow-up PR; this PR is big enough for the review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked on Gitter, but I'm still unclear about how such a message would be accessed and surfaced.

My $0.02:

  1. Assuming that message gets passed into each task, indeed, those tasks can do something with it (like identifying who cancelled it -- this is a private protocol within an app or library).
  2. If we end up raising CancelledError out of the async with block, the same is true for whoever catches that CancelledError.

@@ -997,6 +997,94 @@ class MyKeyboardInterrupt(KeyboardInterrupt):
self.assertIsNotNone(exc)
self.assertListEqual(gc.get_referrers(exc), no_other_refs())

async def test_taskgroup_cancel_children(self):
Copy link
Contributor

@graingert graingert Dec 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test that a race function works, eg there's only one winner

async def test_race_one_winner():

    async def race(*fns):
        outcome = None

        async def run(fn):
            nonlocal outcome
            outcome = await fn()
            tg.stop()

        async with asyncio.TaskGroup() as tg:
            for fn in fns:
                tg.create_task(run(fn))

    event = asyncio.Event()
    record = []
    async def fn_1():
        record.append("1 started")
        await event.wait()
        record.append("1 finished")
        return 1

    async def fn_2():
        record.append("2 started")
        await event.wait()
        record.append("2 finished")
        return 2

    async def trigger_event():
        record.append("3 started")
        event.set()
        await asyncio.sleep(10)
        record.append("3 finished")

    outcome = await race(fn_1, fn_2, trigger_event)
    self.assertEquals(outcome, 1)
    self.assertListEquals(record, ["1 started", "2 started", "3 started", "1 finished"])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should expect only one winner, and vaguely recall Trio guidance against such expectations. I'm not sure such a guarantee is useful in practice, because a task wouldn't cancel a task group until its real work was completed, and there is no way to prevent multiple tasks finishing their work on the same scheduler pass (short of employing locks within the tasks).

Would you clarify your expectation? For example, "for any tasks transitively under a TaskGroup that may call tg.cancel(), only one such task is able to do so".

Copy link
Contributor

@graingert graingert Dec 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they do finish on the same scheduler count only one will resume, so it can call .cancel() on sibling tasks to prevent them finishing

This behaviour is already required by staggered_race, and we want to be able to use a TaskGroup in staggered_race

@graingert
Copy link
Contributor

graingert commented Dec 15, 2024

can you test with eager tasks as well as regular tasks?

I think something like this:

class TestTaskGroupLazy(IsolatedAsyncioTestCase):
    loop_factory = asyncio.EventLoop


class TestTaskGroupEager(TestTaskGroupLazy):
    @staticmethod
    def loop_factory():
        loop = asyncio.EventLoop()
        loop.set_task_factory(asyncio.eager_task_factory)
        return loop

if you find the existing tests fail in eager tasks then probably just add the eager tests for your newly added tests.

# If we wanted to raise an error, it would have been done explicitly
# above. Otherwise, either there is no error or we want to suppress
# the original error.
return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this bugfix need backporting to 3.12?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you think of a case where this bug is visible to the user? If it's visible, yes I'd make a separate PR with corresponding test that can be backported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use-case for this code?
If the bug is present -- let's create a separate issue and make a fix with the backport.
Anyway, I don't see how is this change related to adding .cancel() method.

Copy link
Contributor Author

@belm0 belm0 Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I don't see how is this change related to adding .cancel() method.

Suppressing exceptions out of the context manager is certainly needed to implement TaskGroup.cancel(). Without it, the following basic test will fail:

async def test_taskgroup_cancel_body(self):
    count = 0
    async with asyncio.TaskGroup() as tg:
        tg.cancel()
        count += 1
        await asyncio.sleep(0)  # <-- CancelledError will leak out of context manager
        count += 1
    self.assertEqual(count, 1)

Note that the change isn't a blanket suppression. Code prior to this return True explicitly raises any exception it wants propagated out of the context manager.

Copy link
Contributor

@graingert graingert Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it's visible in the Traceback returned for external or "native" cancellation. Ie a cancellation that propagates out of asyncio.run, because someone is using 3.10 semantics or waited on a executor future on a pool that was shutdown

# If we wanted to raise an error, it would have been done explicitly
# above. Otherwise, either there is no error or we want to suppress
# the original error.
return True
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you think of a case where this bug is visible to the user? If it's visible, yes I'd make a separate PR with corresponding test that can be backported.

@@ -273,3 +280,30 @@ def _on_task_done(self, task):
self._abort()
self._parent_cancel_requested = True
self._parent_task.cancel()

def cancel(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked on Gitter, but I'm still unclear about how such a message would be accessed and surfaced.

@@ -997,6 +997,94 @@ class MyKeyboardInterrupt(KeyboardInterrupt):
self.assertIsNotNone(exc)
self.assertListEqual(gc.get_referrers(exc), no_other_refs())

async def test_taskgroup_cancel_children(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should expect only one winner, and vaguely recall Trio guidance against such expectations. I'm not sure such a guarantee is useful in practice, because a task wouldn't cancel a task group until its real work was completed, and there is no way to prevent multiple tasks finishing their work on the same scheduler pass (short of employing locks within the tasks).

Would you clarify your expectation? For example, "for any tasks transitively under a TaskGroup that may call tg.cancel(), only one such task is able to do so".

Copy link
Contributor Author

@belm0 belm0 Dec 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copying comment from @graingert

(please make all comments on the code so that there can be a thread and Resolve button)

can you test with eager tasks as well as regular tasks?

I think something like this:

class TestTaskGroupLazy(IsolatedAsyncioTestCase):
    loop_factory = asyncio.EventLoop


class TestTaskGroupEager(TestTaskGroupLazy):
    @staticmethod
    def loop_factory():
        loop = asyncio.EventLoop()
        loop.set_task_factory(asyncio.eager_task_factory)
        return loop

if you find the existing tests fail in eager tasks then probably just add the eager tests for your newly added tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants