-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mp teardown hardening #208
Conversation
tractor/_spawn.py
Outdated
except trio.Cancelled: | ||
# if the reaping task was cancelled we may have hit | ||
# a race where the subproc disconnected before we | ||
# could send it a message to cancel (classic 2 generals) | ||
# in that case, wait shortly then kill the process. | ||
reaping_cancelled = True | ||
|
||
if proc.is_alive(): | ||
with trio.move_on_after(0.1) as cs: | ||
cs.shield = True | ||
await proc_waiter(proc) | ||
|
||
if cs.cancelled_caught: | ||
proc.terminate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is necessary to end this block with an unconditional raise
. Although most cases would work, I suppose some edge cases would break down the line. For example, consider if you needed to directly cancel the top-level nursery of this function, and then check it's cancelled_caught
property (similar to the move_on_after
in this block). This exception handler would discard the cancel and it would appear to never catch any cancels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for detailed explanation @richardsheridan, will definitely add!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The necessary bit is tagged on the last logic block.
tractor/_spawn.py
Outdated
proc.terminate() | ||
|
||
if not reaping_cancelled: | ||
if proc.is_alive(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe we can just rely on the proc being dead here?
I guess it would make more sense to poll and then check?
If that works we can probably drop the flag set and check.
It's clear now that special attention is needed to handle the case where a spawned `multiprocessing` proc is started but then the parent is cancelled before the child can connect back; in this case we need to be sure to kill the near-zombie child asap. This may end up being the solution to other resiliency issues seen around mp with nested process trees too. More testing is needed to be sure. Relates to #84 #89 #134 #146
3a64e64
to
87971de
Compare
@richardsheridan this is more acceptable yah? |
Might solve some of the complaints in any of:
#84 #89 #134 #146
Still want to test enabling the nested spawning tests again to see if we get any improvements.