-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(swarm)!: Remove ConnectionHandler::Error
in favor of CloseReason
#3201
Conversation
Now that we are always boxing the error, it makes more sense to have handlers directly return the most specific error without additional wrapping.
Yes, I agree that this type parameter tends to complicate things, so this is a good change. There is one use-case for the pattern match: To continue supporting this will require some discipline by protocol authors once this PR is merged — we’d need a dedicated error type (like |
But you are just logging it right? That will still be supported and then |
No, “just logging it” wouldn’t need the pattern match — the crucial point is logging the origin of the error. |
Right. Backtraces are stable since |
If we go this path, it would probably be good to define a custom type that can be constructed from anything |
Coming back to the code I linked: having a backtrace wouldn’t solve the problem — log messages preferably are concise and fit on a single line, so displaying a backtrace is not desirable. Parsing a backtrace to match on stack frame names is a rather brittle way of figuring out the single word label to apply in the logging. So I maintain that creating a specific error type for each protocol is the best solution proposed so far. IOW, I’d like to avoid code having to play “who dunnit” when it comes to figuring out why a connection was closed. |
Yeah that is fair. I am just wondering, what the best API is to help users here. |
Barring innovation in the “Rust error handling space” done by libp2p (which I’m not sure would be a great idea) I see three possibilities:
The first is not extensible, the second offer second-class citizenship to externally defined protocol, the third is extensible with the same rules for every participant. It does look a bit ugly, though, and handling would benefit from a macro for writing down the downcasting chains (well, Java devs would immediately feel at home regardless). Still, I’d prefer number three. |
Having the error boxed seems reasonable to me. Just checked to be sure how we handle this, and we currently only even inspect errors returned as explicit handler events, not closing errors. At this point the error the PR removes is not even logged (we most likely should) but a boxed error will work for that matter anyway |
How about the following:
This would make it easy for users to do the right thing, even if they don't create their own type. I think chances are, even if users create their own type, they are not going to include the name of the component in the message. That will lose this detail again if users just log the error which is what I'd want them to do. They may also downcast but ideally not just for logging reasons. |
ConnectionHandler::Error
in favor of Box<dyn Error>
ConnectionHandler::Error
in favor of CloseReason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I like this! It makes it clear to protocol authors what is expected of them.
.push_back(ConnectionHandlerEvent::Close(CloseReason::new( | ||
"dcutr-relay", | ||
p, | ||
))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about trait CloseReasonExt
to make this ConnectionHandlerEvent::Close(p.closeReason("dcutr-relay"))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do:
p.into_close_reason_for("dcutr-relay")
perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not to start bikesheding but what about impl From<(&str, Box<dyn Error + Send + 'static>)> for CloseReason
and have ConnectionHandlerEvent::Close(("dcutr-relay", p).into())
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I am not a fan of From<(.., ..)>
implementations. I generally tend to avoid to implement From
unless I can provide a function somewhere that accepts impl Into
and thus allows us to avoid .into()
calls somewhere. To me, calling .into
yourself is a bit of a code smell.
|
||
impl error::Error for CloseReason { | ||
fn source(&self) -> Option<&(dyn error::Error + 'static)> { | ||
Some(self.source.as_ref()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps document that logging these errors should follow the .source()
chain, e.g. by using anyhow
and formatting with {:#}
. Or is there a generic helper for implementing alternate Error printing so we can offer it out of the box? (This is the weakest point in Rust for me, probably since I’m spoilt by living on the JVM for so long.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is unfortunately but I whole-heartedly agree that it is really missing from std.
We could offer it on this type-specifically although I think it would just be better to tell users to use anyhow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, let’s document it.
This pull request has merge conflicts. Could you please resolve them @thomaseizinger? 🙏 |
This pull request has merge conflicts. Could you please resolve them @thomaseizinger? 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for this simplification Thomas, and with CloseReason
we seem to address Roland's concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the benefit of reduced type complexity with this patch. That said, I don't think it is worth giving up strong typing on the error. I would consider providing a string to identify an error, even though the information that the string carries is already available, a hack.
/// ```rust | ||
/// # use libp2p_swarm::handler::CloseReason; | ||
/// | ||
/// # fn main() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the main
function needed? Aren't doc examples wrapped in a main
automatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't tried but can investigate.
@@ -58,7 +58,6 @@ pub struct ConnectionHandler; | |||
impl crate::handler::ConnectionHandler for ConnectionHandler { | |||
type InEvent = Void; | |||
type OutEvent = Void; | |||
type Error = Void; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this type information useful. I.e. I believe there is value in making it impossible at compile time for a protocol to close a connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you articulate and perhaps quantify the value you are seeing?
In my view, type systems are best leveraged on boundaries. Boundaries are where we likely split work between developers (users vs maintainers for example) so they have documentation value and the compiler can ensure invariants.
In this case, the Error
is part of a boundary but the consumer is within our responsibility (Swarm
). Because it is an associated type, we have to cater for it to be anything so we can't really take advantage of the type system here. On the contrary, it makes things more complicated because we have to carry around a type parameter which is a cost in terms of complexity.
In addition to Swarm
, this boundary also touches the implementer of a ConnectionHandler
and the user of Swarm
.
Let's start with the latter. Connections can be closed at any time for a variety of reasons. There is definitely value in getting notified about connections being closed. However, I'd argue that it is unlikely that one would take different codepaths depending on which part of my system actively closed a connection. A type-safe solution is useful if I want the compiler to ensure how and when certain code-paths are being taken. But, if we treat all events about closed connections the same, there is no added value in a type-system driven solution.
Last but not least, we have the implementer of ConnectionHandler
. One may argue that it is a useful safety check to first declare that a protocol may never close a connection to ensure this isn't accidentally introduced later (either by the same dev or as part of a change by someone else). This is a plus for a type-system driven solution although it is important to add that one can still audit fairly easily whether a particular protocol actively closes a connection or not.
An additional negative point is that more moving parts (i.e. number of associated types) makes it more difficult for newcomers to implement a ConnectionHandler
. By how much is difficult to measure but from my own experience, I was quite overwhelmed with this, esp. because the default value for when you generate this trait implementation (type Error = ()
) doesn't compile.
In my view, this boils down to weighing the ease of auditability of protocols for active connection closings vs the complexity this type parameter causes in our codebase.
To me, reducing the complexity in our codebase is a pretty clear winner here. It is difficult to express this in numbers but I'd assume that auditing a protocol for when it closes connections is something that doesn't happen very often. Compared to that, we have 4 maintainers and various active contributers here that routinely come in contact with the parts of this codebase involving this type parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I forgot to add: A pretty substantial cost is that this type appears within SwarmEvent
too which pretty much every user of libp2p
comes in contact with and is clunky to name due to the composition of the error as part of NetworkBehaviour
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the reasoning you lay out above. I do believe there is value in strong typing here, e.g. preventing at compile time that a ConnectionHandler
never closes a connection. That said, I am not sure it warrants the downsides you list above. Long story short, I don't have a strong opinion here. Let's go with your suggestion.
/// Encapsulates the reason for why a specific [`ConnectionHandler`] closed a connection. | ||
#[derive(Debug)] | ||
pub struct CloseReason { | ||
protocol: &'static str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me this is a hack. The compiler is already aware of the place where a CloseReason
is instantiated. Passing an additional string seems error prone to me. Is there no way we can leverage the type system to make passing a string unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this is a hack and I would like to get rid of it. If we agree above that we want something like this, I am happy to invest more time into this detail to make that better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One alternative as discussed above is to recommend users to implement their own error type if they care about identifying the origin of the connection closure. We can follow this as a best-practice in our codebase.
This is kind of equivalent to what we have now. A user could already specify a generic error like std::io::Error
here and thus essentially drop the information of who closed the connection. This assumes they don't want to rely on the order of how the error is composed as part of derive(NetworkBehaviour)
. I think this is a fair assumption to make given that one could change the order of fields in a NetworkBehaviour
without necessarily generating a compile error.
I tried to push this forward but the issue is actually bigger. Started a discussion here: #3307 (comment) |
My suggestion to move forward here is to:
|
Closing in favor of #3591. |
Description
Previously, a
ConnectionHandler
could define an error type for why it would actively close a connection. This type had to passed through a lot of layers until it finally ended up inSwarmEvent::ConnectionClosed
. Most notable, this type appeared everywhere whereSwarmEvent
is referenced. Due to the composition ofConnectionHandler
s, this type is clunky to name and changes every time you change the composition of your derivedNetworkBehaviour
.Very likely, this error is only logged and remains uninspected in the majority of cases. To simplify the public API, we box up the error. If users still want to inspect it, they can attempt downcasting the error a specific one.
Notes
Opening this as a draft to get some feedback.
Links to any relevant issues
Open Questions
Change checklist