-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ctx debugger #220
Ctx debugger #220
Conversation
Ok so same results as #217 so we're approaching some kind of sanity. This somehow means some patch from the bidir streaming, which is causing the windows run to hang, is somehow fixed by some debugger related patch that uses the new context/bidir streaming apis? This is extremely strange since the debugger nor the tests for it should not be getting triggered at all on windows 🤯 Gonna start spelunking in diffs again and likely put up a diff branch to basically |
Yeah I honestly feel like I'm losing my mind..
-> The CI without this patch set hangs indefinitely. This is like the problem of evil for testing.. |
fbcd253
to
929b6dc
Compare
a01d7ff
to
b24477f
Compare
a35b938
to
240f591
Compare
A context is the natural fit (vs. a receive stream) for locking the root proc's tty usage via it's `.started()` sync point. Simplify the `_breakpoin()` routine to be a simple async func instead of all this "returning a coroutine" stuff from before we decided that `tractor.breakpoint()` must be async. Use `runtime` level for locking logging making it easier to trace.
If the root calls `trio.Process.kill()` on immediate child proc teardown when the child is using pdb, we can get stdstreams clobbering that results in a pdb++ repl where the user can't see what's been typed. Not killing such children on cancellation / error seems to resolve this issue whilst still giving reliable termination. For now, code that special path until a time it becomes a problem for ensuring zombie reaps.
Finally this makes a cancelled root actor nursery not clobber child tasks which request and lock the root's tty for the debugger repl. Using an edge triggered event which is set after all fifo-lock-queued tasks are complete, we can be sure that no lingering child tasks are going to get interrupted during pdb use and tty lock acquisition. Further, even if new tasks do queue up to get the lock, the root will incrementally send cancel msgs to each sub-actor only once the tty is not locked by a (set of) child request task(s). Add shielding around all the critical sections where the child attempts to allocate the lock from the root such that it won't be disrupted from cancel messages from the root after the acquire lock transaction has started.
We may get multiple re-entries to debugger by `bp_forever` sub-actor now since the root will incrementally try to cancel it only when the tty lock is not held.
fast fail test with a context. | ||
ensure the partially initialized sub-actor process | ||
doesn't cause a hang on error/cancel of the parent | ||
nrusery. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo nursery.
tests/test_debugger.py
Outdated
# NOTE: previously since we did not have clobber prevention | ||
# in the root actor this final resume could result in the debugger | ||
# tearing down since both child actors would be cancelled and it was | ||
# unlikely that `bp_forever` would re-acquire the tty loack again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo loack
# spawn both actors | ||
portal = await n.run_in_actor(key_error) | ||
|
||
# XXX: originally a bug causes by this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typos causes
tests/test_debugger.py
Outdated
@@ -397,7 +423,7 @@ def test_multi_nested_subactors_error_through_nurseries(spawn): | |||
child = spawn('multi_nested_subactors_error_up_through_nurseries') | |||
|
|||
# startup time can be iffy | |||
time.sleep(1) | |||
# time.sleep(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop this?
tractor/_debug.py
Outdated
async def _acquire_debug_lock( | ||
uid: Tuple[str, str] | ||
) -> AsyncIterator[trio.StrictFIFOLock]: | ||
'''Acquire a actor local FIFO lock meant to mutex entry to a local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/to/for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use breakpoint instead of entry point
Clone of #217 but with commits factored to pair with #219.
Again trying to find the source commits that breaks windows CI...
In theory this should have the same CI results as #217 but who knows any more 😂