-
-
Notifications
You must be signed in to change notification settings - Fork 348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential API changes to help users who don't realize parenting is a full-time job #136
Comments
The challenge here is that we do want to allow for things like custom supervisors, where code inside the nursery takes over for But, it just occurred to me that because implementing a custom supervisor is a pretty unusual thing to be doing, we might be able to define some heuristic that catches most accidental misuses, and then if you intentionally want to block inside the nursery then there's a special flag you can enable. Like maybe it would make sense to have I guess the question then is whether "contains a checkpoint" is a good heuristic for "something is wrong"... thinking out loud here, the cases we want to catch are when someone starts doing arbitrary amounts of work directly inside the nursery. That's almost never the right thing to do because "parenting is a full time job". In trio, if you're doing arbitrary amounts of work then you pretty much have to execute some checkpoints in the process. It's possible to avoid this (e.g. Are there any good reasons to block inside a nursery scope (i.e., does this heuristic have false positives)? There are several tests in trio's test suite that do it, but they're a bit sloppy and mostly just do things like (Note: there's another way of implementing the above idea that's equivalent except for cosmetics: make My feeling for right now is that maybe it's best to wait a bit before making any decisions, since we're still learning about how we want to use |
Note: it occurs to me that if we decide we want these semantics, they'd be pretty simple to implement: it basically just means running the nursery body inside a pre-cancelled cancel scope. (I guess that would mean most nurseries had 3 associated cancel scopes, but oh well, it doesn't harm anything and 2 would be invisible to the user.) Minor complications:
|
A few thoughts:
|
Yeah, the idea would be that writing a custom supervisor is a pretty rare and special thing to be doing, so it's OK if you need to go read about some API you never heard of before and that is trickier to use than the usual one. It could even live in
Right, what I mean is: any given nursery call is either going to look like
I experimented with this originally, but it turns out some extremely common cases require passing around the nursery object! And I'd rather not have two ways to do it if in practice everyone has to learn both. All that said... I also realized what might well be a fatal flaw in this idea :-(. I haven't quite reached the point of implementing high-level helpers for servers (#73), but I think the idiom will be something like async with trio.open_nursery() as nursery:
info, task = await start_tcp_server(nursery, host, port, handler)
print("Listening on {host}:{port}".format(**info)) The main idea here is that For The alternative way to write this to avoid doing async def bootstrap_server(nursery):
info, task = await start_tcp_server(nursery, host, port, handler)
print("Listening on {host}:{port}".format(**info)) # or whatever you want to do after the server is up
async with trio.open_nursery() as nursery:
nursery.spawn(bootstrap_server, nursery) But this seems... annoyingly convoluted? And I don't think it's too risky to call something like |
A few additional thoughts:
|
I had an idea. Simplified nurseriesWe could make a version of nurseries where the code inside the This could even make some simple kinds of supervision patterns easier. For example, the root task ( async def init_task():
async with trio.open_simplified_nursery() as nursery:
try:
return await main_task_fn()
finally:
nursery.cancel_scope.cancel() This could be implemented on top of the current system by having a "simplified" nursery that immediately spawns a background task to watch for crashed children, which runs until the parent task hits Alternatively, if we baked this into the core, it could be even simpler: we could hard-code in the task exit path that when a task exits with an error, we immediately cancel the parent nursery, and that would avoid any need for background tasks etc. It could also do some auto-reaping, which basically would mean, we know that the only thing nurseries actually need to keep around is a list of raised exceptions, so we can throw away the rest of each child task object as it exits. This probably isn't a huge win in practice though because if a task raised an exception then the exception keeps its stack frames in memory, and the task object itself is not much beyond that. ...or, no, wait, it is important, because we have long-running nurseries that have an unbounded number of successful tasks cycle through them, so you can't buffer all those in memory forever. But that would all be doable; probably simpler implementation-wise than what we have now. But. The main problem with this design is that it really is simplified, in the sense that it's strictly less capable than what we have now: you can implement the simplified nursery on top of the original-flavor nursery, but not vice-versa. In particular, original-flavor nurseries allow you to implement arbitrary custom supervision policies, like "restart any tasks that exit", but simplified nurseries cannot; they just have the one hard-coded policy. We need a way to implement custom supervisors. One tempting approach would be to have the "simplified" nursery be the default, and then if you want to implement a custom supervisor, you tell trio "hey I got this" and it disables the auto-reaping. Something like: async with trio.open_nursery() as nursery:
with nursery.custom_supervision:
... An elegant further tweak would be to have the One thing I don't much like about this is that in practice, I think you'd always want to put the The API also makes it look like you could put the So this all argues that we should merge the Internally hopefully these could share code, though? We might also want to change the supervised nursery semantics a bit, like maybe you want the rule that if it hits But if you want to handle supervisor crashes, and you want the put the supervisor in a child task... you have a real chicken-and-egg problem. I don't think there's any way to do it except by specifically nominating the supervisor task when setting up the nursery: async with trio.open_nursery_with_background_supervision(supervisor_afn) as nursery:
... Question: does putting supervisor code in a child task violate the no-secret-background-tasks design principle? It's not as bad as a real implicit background task because there is a clear point where it's spawned, when you're creating the supervised nursery. But concretely I'm thinking of code like this, where from the caller's perspective, you just open the special nursery, and this secretly spawns a task: @asynccontextmanager
async def open_my_special_nursery():
async with trio.open_supervised_nursery() as (nursery, supervisor):
nursery.spawn(supervision_loop, supervisor)
try:
yield nursery
finally:
... do... something... to push any main task exception to supervisor code and otherwise let it know that exiting is ok ...this is also rather tricky, isn't it, with the exception handling and everything. Maybe it really is better to just mandate that custom supervisors claim the job that starts them, so you can't willy-nilly spawn things from the body, and instead write something like # On cancellation, raises Cancelled out of the nursery body, which triggers fallback cleanup
async def run_with_restarts_supervised(*thunks):
restart_table = {}
async with trio.open_supervised_nursery() as (nursery, supervisor):
for thunk in thunks:
task = nursery.spawn(thunk)
restart_table[task] = thunk
async for task_batch in supervisor.monitor:
for task in task_batch:
# obviously in a real version of this we'd want rate limiting etc., this is just a toy
# plus some kind of logging of why the task exited
thunk = restart_table.pop(task)
replacement_task = nursery.spawn(thunk)
restart_table[replacement_task] = thunk
# Usage
await run_with_restarts(thunk1, thunk2) It would be kinda cool though to be able to do async with open_restart_nursery(**config) as nursery:
nursery.spawn(some_fn, *args, restart_limit=10) Oh, and of course, you need the nursery object to be able to do things like I guess the designated task version would look like async def supervisor_afn(nursery, supervisor_controller):
...
async with trio.open_supervised_nursery(supervisor_afn) as nursery:
... ...and maybe there'd be some special way we report issues in the main task to the supervisor function? Not that this makes any sense for a nursery that restarts child tasks, since you can't restart the main task. Are there actually any other use cases for custom supervision logic? Ignore and log errors? Can be done with a simplified nursery + a custom Googling for erlang custom supervisors:
Meh, lots of questions here. I'm going to sleep on it. |
A few more thoughts: The mess in #214, and the way I resolved it by making a bunch of the tests in Here's another idea for API support for custom supervisors: add a There are two issues that I still see, both revolving around the asymmetry between the parent task and the child tasks. First, if we have a supervisor child, then how do we signal to them that the parent task is still inside the nursery block, and how do we signal when it leaves? That block is this weird thing, like a child task but not quite the same. And second, what if someone does want to put their supervision logic inside the parent task's nursery block? The whole point of this thread is that it's convenient to be able to put logic there; the asymmetry is part of the appeal. And supervisor logic is inherently a special "child", so maybe it's natural to want these two kinds of asymmetry to line up? Like I think the example supervisor in the docs is a "race" function that spawns several children and then whichever one finishes first gets returned. This would look like: # supervisor in parent task
async def race(*afns):
async with trio.open_nursery() as nursery:
for afn in afns:
nursery.spawn(afns)
with nursery.supervisor_powers() as s:
async for batch in s.monitor:
for task in batch:
nursery.cancel_scope.cancel()
return s.reap_and_unwrap(task)
# supervisor in child
async def race(*afns):
async def supervise(s):
async for batch in s.monitor:
for task in batch:
s.nursery.cancel_scope.cancel()
return s.reap_and_unwrap(task)
async with trio.open_nursery() as nursery:
supervisor_task = nursery.spawn_supervisor(supervise)
for afn in afns:
nursery.spawn(afn)
supervisor_task.wait()
return supervisor_task.result() Hmm, not sure how representative these are. Both have some trickery, but the first one does seem somewhat nicer. |
There is a lot to unpack here. It seems like a lot of complication comes from moving supervision code into children. Is that a common use case? I haven't done much with custom supervisors but they always imagined them being implemented by parents. I like the idea of simple and let-me-supervise nurseries (or all simple nurseries but one can grab Having slept on this does it still seem like something worth pursuing? |
@merrellb: In current trio, yeah, supervision is always implemented by parents – that's true for custom supervisors (b/c there are race conditions otherwise), and it's true for the default supervisor (the one that runs inside the nursery's A handy rule of thumb for designing low-level system primitives like this is that the user should be able to replicate the default behavior "by hand" (i.e., the system isn't cheating and doing things that you couldn't do if you wanted to). And if having the parent task available to run regular code is so important that we want to rewrite the API to make it possible, then maybe it's important for custom supervisors too! I don't really know what the landscape of custom supervisors is going to look like, so I'm hesitant to cut off options. And I would guess that something like an erlang-style restart-on-failure supervisor API might look like: from trio_supervisor_library import open_restart_nursery
async with open_restart_nursery() as nursery:
await nursery.start_restarted_job(start_http_listener, nursery, "localhost", 80)
await nursery.start_restarted_job(start_https_listener, nursery, "localhost", 443) So notice that here the actual supervision code is hidden inside the special "RestartNursery" object's ...On the other hand, for a restart nursery in particular it makes no sense to put any logic in the parent task, because if it fails you can't exactly restart it. And there certainly are a number of things that get simpler if only the parent task can do the supervision – no need to signal parent task death to the supervisor, no need to keep track of two types of children (supervisor child and regular children), etc. I'm still thinking about it :-). I think the benefits in the non-custom-supervisor case are fairly compelling, but it's tricky to figure out how everything should fit together. |
Another radical idea: dropping supervision/monitoring as a builtin conceptI'm not sure if this is a good idea, but it's certainly a big enough conceptual shift that it deserves headline font :-) What I've started wondering is if we would be better off just dropping the specialized supervision APIs, and the
The idea would be that if you want a notification when a task exits, then call MotivationsFirst, this would certainly avoid all those gnarly API design problems I'm grappling with in the comments above. Second, well. I certainly don't have that much experience writing programs on top of trio, but ... so far I don't think I've encountered a single sitaution where I wanted to use any of the "traditional" task management APIs. Happy eyeballs is a good example -- it requires spawning a collection of tasks that interact in some complicated way, and it turned out to be way simpler to just ignore all that stuff and put my own coordination logic in the task. Or, for the toy async def race(*afns):
q = trio.Queue(1)
async def jockey(afn):
await q.put(await afn())
async with trio.open_nursery() as nursery:
await for afn in afns:
nursery.spawn(jockey, afn)
winner = await q.get()
nursery.cancel_scope.cancel()
return winner ...which is maybe a little bit more complicated than the original version? But it's very straightforward; it only uses obvious things we learn about in the tutorial, instead of bringing in And if one of the goals is "one obvious way to do it"... well, the above code works right now (modulo parenting being a full-time job), so we clearly have two ways to do it. And the dedicated supervision APIs are in some sense the "obvious" one (they're documented as being the way you supervise things!), but they're often worse than doing things "by hand" (the happy eyeballs case is a really compelling example of this), and I don't know any cases where they're much better. It is a little unfortunate that you have to make a little closure and call it; that's getting back towards the kind of callback twistyness that trio is trying to avoid. But eh, that's task spawning, and defining a custom supervisor is going to be somewhat tricky one way or another. (Also, huh, if we do this I think we can move |
I must say you are doing a poor job of convincing me of the potential downsides of this approach :-) I really like your proposal as it seems much more explicit and clean. Although I haven't interacted much with the supervisor in my code, knowing it was back there managing things made trio feel a bit heavier and opaque. The fact that it caused such reasoning headaches with simplified parenting and was the only thing keeping UnboundedQueues out of hazmat seems like gravy. I haven't tried writing it but I imagine supervision could be layered on top of this pretty easily ( |
Heck, I guess we could even move |
start_soon is just a new name for spawn, except it doesn't return the new task (in preparation for python-triogh-136, where we're going to stop emphasizing task objects in the main api) start is a major new feature: it provides a very simple way to start up a long running task, while blocking until it's finished whatever initialization it wants to do. At least... it's simple from the user's point of view. Internally it's quite tricky indeed. The whole _run.py file probably needs some refactoring and splitting up, but this is one of those cases where I think it's best to first get the new functionality working and nailed down, and then we can see what shape the new abstractions should be. Fixes python-triogh-284.
start_soon is just a new name for spawn, except it doesn't return the new task (in preparation for python-triogh-136, where we're going to stop emphasizing task objects in the main api) start is a major new feature: it provides a very simple way to start up a long running task, while blocking until it's finished whatever initialization it wants to do. At least... it's simple from the user's point of view. Internally it's quite tricky indeed. The whole _run.py file probably needs some refactoring and splitting up, but this is one of those cases where I think it's best to first get the new functionality working and nailed down, and then we can see what shape the new abstractions should be. Fixes python-triogh-284.
For python-triogh-136 This commit contains: - the actual deprecations - changes to _run.py to prevent warnings - changes to test_run.py to prevent warnings Still need to clean up everything else.
Another small point: our task tree introspection APIs aren't fully fleshed out, but the old concept was to represent it as a tree of tasks, with nurseries an invisible details. And this was fine; it didn't really make sense for a task to have more than one nursery in it, because once it had one nursery it had to park in So tasks should have |
New API: - nursery.child_tasks - nursery.parent_task - task.child_nurseries - task.parent_nursery (may be None) Rationale: python-trio#136 (comment)
A lot to unpack here. Another data point: I really dislike the idea that a nursery's Given that, I'd rather have NB: instead of explicitly hooking up a supervisor task which replaces some default behavior, I'd rather subclass the nursery (cf. #340 – we don't need an explicit |
I agree that this is surprising the first time you see it. But it's absolutely essential to how Trio works: if the parent task can continue on past the Basically, nurseries are where Trio spends its weirdness budget: they have a weird name and they have weird semantics, but at least they're like... the number one most prominent feature that people have to learn to use Trio. So even if they're surprising the first time you see them, that presumably happened within about 5 minutes of starting to read the tutorial, and after that you can adjust your expectations. And in return we basically get to skip all the fractal weirdness that plagues async libraries, which I think is a fair trade. Obviously I'd rather not have any weirdness at all, but, this is a concurrency library – if the worst our critics can say is it's a bit confusing if you don't look at any docs ever, then I think we're doing pretty darn well :-).
It would be possible to alter So the choice is between having the Obviously, doing one thing is simpler than doing two things. More importantly, if the default is wait, and someone wants cancel+wait, that's trivial, just put in a call to async with open_nursery() as nursery:
...
await nursery.wait() because the code inside the Alternatively we could define Also, blocking waiting for calls to complete is actually very natural from a certain point of view. When I write a function call like And finally, my experience so far is that waiting for all the children is very often useful. I haven't found any cases yet where I'd want to cancel at the end of the Here's one: implementing a two-way bytestream proxy in Trio. I like this example a lot in general, because it's such an obvious use case, yet Trio makes it dramatically simpler than ... well, any other language or library that I know of, period. This proxies from a→b and b→a, returns after both incoming streams are closed, correctly supports cancellation, correctly propagates errors, and it's even made out of independently useful and testable pieces: async def one_way_proxy(source, sink):
while True:
chunk = await source.read_some(MAX_CHUNK_SIZE)
if not chunk:
await sink.send_eof()
return
await sink.send_all(chunk)
async def two_way_proxy(a, b):
async with a, b:
async with trio.open_nursery() as nursery:
nursery.start_soon(one_way_proxy, a, b)
nursery.start_soon(one_way_proxy, b, a) How would you write this?
You may find #147 of interest too. This is definitely an important use case, but note that you probably also want to immediately stop accepting new connections, and notify any connections to shut down soon -- e.g. HTTP keepalive should be disabled -- it's a bit tricky. In any case it seems obvious that this will need some sort of non-default handling no matter what the nursery's default behavior is, so I don't think it tells us much about what that default behavior should be?
The reason we have Also subclassing is, like... bad. |
- Parenting is no longer a full time job - This commit also removes a bunch of deprecated functions Fixes: python-triogh-136
- Parenting is no longer a full time job - This commit also removes a bunch of deprecated functions Fixes: python-triogh-136
Came across an article which is somewhat related to the discussion here: http://250bpm.com/blog:71 The first part defines "structured concurrency". Trio already strictly enforces this property (except the pass-nursery-to-child trick which is clearly documented to violate it -- the author of the article also describes a way to violate the property, by analogy to stack vs. heap) so this is already familiar. The second part discusses cancellation. The author chose to go with "exiting scope cancels all tasks with a graceful cancellation mechanism" rather than Trio's "exiting scope waits for all tasks" approach. Though this might just be due to the Author's starting point (golang-inspired) or their choice of programming language (C). |
@bluetech omg, what an amazing find. I've seen other posts on that blog, but missed that one. And it's eerie how much of what he's saying overlaps with concepts I came up with independently, a year later... guess I have some reading to do :-) Another interesting parallel I recently learned about is that the Rust library Rayon has a construct called |
Introduce `await nh2.mock.expect_connect(host, port)` which prepares nh2 for an upcoming `await nh2.connection.Connection(host, port)`. See #1. Also introduce a new nh2.anyio_util module, which creates async-friendly pipes and adds a `create_task_group` that essentially undoes python-trio/trio#136 (comment) 🙁. A followup will probably modify nh2.mock.MockServer.read to accept something like `lowlevel=False` or `format=…` to replace the YAML-ish raw h2 event format with something like just 'GET https://example.com/dummy'. It might also still make sense for a followup to intercept the call to self.c.receive_data in nh2.connection.Connection.read, just to make the raw h2 events it receives assertable too. (I almost, almost, almost renamed nh2.mock to nh2.testing, but I couldn't stomach how that interfered with Tab completion for test_* files. I also almost renamed nh2.mock to nh2._testing and nh2._pytest_plugin._connection_mock to nh2_mock and just had it return the nh2._testing module, so tests would use nh2_mock.expect_connect (etc.), but I think that would tie things to pytest a little too tightly.)
I've managed to block in a nursery scope several times (eg "while True:" loops) while learning trio, not realizing I had not hit
__aexit__
. For long-running processes, this will appear to succeed as we don't expect to exit until KeyboardInterrupt. However, this will have disabled all of the reaping machinery in__aexit__
and will be silently accumulating zombie tasks indefinitely.The documentation is fairly clear on this so I am hesitant to suggest the need to add idiot-proofing, but if there is an easy way to detect the nursery is running but not in
__aexit__
it might prevent some grief.The text was updated successfully, but these errors were encountered: