-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Context
oriented error relay and MsgStream
overruns
#261
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Obvs this needs some unit tests. |
00f97b1
to
3f26914
Compare
This was referenced Nov 28, 2021
Merged
a955b71
to
fb2b759
Compare
Added a test that we need a fix for re: #265. |
af8f54b
to
8a4671a
Compare
Previously we were ignoring a race where the callee an opened task context could enter `Context.open_stream()` before calling `.started(). Disallow this as well as calling `.started()` more then once.
This always triggered the mentioned race condition. We need to figure out the best approach to avoid this case.
8a4671a
to
3f6099f
Compare
Instead of tracking feeder mem chans per RPC dialog, store `Context` instances which (now) hold refs to the underlying RPC-task feeder chans and track them inside a `Actor._contexts` map. This begins a transition to making the "context" idea the primitive abstraction for representing messaging dialogs between tasks in different memory domains (i.e. usually separate processes). A slew of changes made this possible: - change `Actor.get_memchans()` -> `.get_context()`. - Add new `Context._send_chan` and `._recv_chan` vars. - implicitly create a new context on every `Actor.send_cmd()` call. - use the context created by `.send_cmd()` in `Portal.open_context()` instead of manually creating one. - call `Actor.get_context()` inside tasks run from `._invoke()` such that feeder chans are implicitly created for callee tasks thus fixing the bug #265. NB: We might change some of the internal semantics to do with *when* the feeder chans are actually created to denote whether or not a far end task is actually *read to receive* messages. For example, in the cases where it **never** will be ready to receive messages (one-way streaming, a context that never opens a stream, etc.) we will likely want some kind of error or at least warning to the caller that messages can't be sent (yet).
This more formally declares the runtime's remote task startingn API and uses it throughout all the dependent `Portal` API methods. Allows dropping `Portal._submit()` and simplifying `.run_in_actor()` style result waiting to be delegated to the context APIs at remote task `return` response time. We now also track the remote entrypoint "type` as `Context._remote_func_type`.
In preparation for supporting both backpressure detection (through an optional error) as well as control over the msg channel buffer size, add internal configuration flags for both to contexts. Also adjust `Context._err_on_from_remote_msg()` -> `._maybe..` such that it can be called and will only raise if a scope nursery has been set. Add a `Context._error` for stashing the remote task's error that may be delivered in an `'error'` message.
Half of portal API usage requires a 1 message response (`.run()`, `.run_in_actor()`) and the streaming APIs should probably be explicitly enabled for backpressure if desired by the user. This makes more sense in (psuedo) realtime systems where it's better to notify on a block then freeze without notice. Make this default behaviour with a new error to be raised: `tractor._exceptions.StreamOverrun` when a sender overruns a stream by the default size (2**6 for now). The old behavior can be enabled with `Context.open_stream(backpressure=True)` but now with warning log messages when there are overruns. Add task-linked-context error propagation using a "nursery raising" technique such that if either end of context linked pair of tasks errors, that error can be relayed to other side and raised as a form of interrupt at the receiving task's next `trio` checkpoint. This enables reliable error relay without expecting the (error) receiving task to call an API which would raise the remote exception (which it might never currently if using `tractor.MsgStream` APIs). Further internal implementation details: - define the default msg buffer size as `Actor.msg_buffer_size` - expose a `msg_buffer_size: int` kwarg from `Actor.get_context()` - maybe raise aforementioned context errors using `Context._maybe_error_from_remote_msg()` inside `Actor._push_result()` - support optional backpressure on a stream when pushing messages in `Actor._push_result()` - in `_invote()` handle multierrors raised from a `@tractor.context` entrypoint as being potentially caused by a relayed error from the remote caller task, if `Context._error` has been set then raise that error inside the `RemoteActorError` that will be relayed back to that caller more or less proxying through the source side error back to its origin.
Context.started()
Context
oriented error relay and MsgStream
overruns
Keeping it disabled on context open will help with detecting any stream connection which was never opened on one side of the task pair. In that case we can report that there was an overrun **and** a stream wasn't opened versus if the stream is explicitly configured not to use bp then we throw the standard overflow. Use `trio.Nursery._closed` to detect "closure" XD since it seems to be the most reliable way to determine if a spawn call will trigger a runtime error.
A context stream overrun should normally never take place since if a stream is opened (via ``Context.open_stream()``) backpressure is applied on the message buffer (unless explicitly disabled by the ``backpressure=False`` flag) such that an overrun on the receiving task should result in blocking the (remote) sender task (eventually depending on the underlying ``MsgStream`` transport). Here we add a special error message that reports if one side never opened a stream and let's the user know in the overrun error message that they may be trying to push messages to a task that isn't ready to receive them. Further fixes / details: - pop any `Context` at the end of any `_invoke()` task that creates one and registers with the runtime. - ignore but warn about messages received for a context that either no longer exists or is unknown (guarding against crashes by malicious packets in the latter case)
A context method handling all this logic makes the most sense since it contains all the state related to whether the error should be raised in a nursery scope or is expected to be raised by a consumer task which reads and processes the msg directly (via a `Portal` API call). This also makes it easy to always process remote errors even when there is no (stream) overrun condition.
@guilledk @overclockworked64 I think this one is ready to land. There might be one last little test I forgot - gotta check. Oh and we need nooz.. might be a long one 😂 |
guilledk
approved these changes
Dec 7, 2021
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
bug
Something isn't working
cancellation
SC teardown semantics and anti-zombie semantics
IPC and transport
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Stricter semantics around
tractor.Context.open_stream():
and friends:context could enter
Context.open_stream()
before calling.started()
..started()
more then once.StreamOverrun
error that is raised in 2 main cases:Context.open_stream()
is entered on only one side and that side sends more messages then theActor.msg_buffer_size
in which cases the error is send back to the sending side with a special warning if the receive side never entered its own.open_stream()
Context.open_stream(backpressure=False)
is explicitly set by the user in which case the error may show up at anytrio
checkpoint (via embedding in atractor.RemoteActorError
) on the send side if an overrun takes place. In this case user code expected to handle the error if they don't want a send side task crash - hence why this is not the default setting.Tests todo:
.open_stream()
entered before.started()
calledRaces that inspired this discovered as part of pikers/piker#241
Fixes #265