-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update go-libp2p to 0.12.0 #8015
Update go-libp2p to 0.12.0 #8015
Conversation
9b29b8a
to
273754e
Compare
traceutil.AnnotateError(span, err) | ||
stream.Reset() | ||
return nil, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now close the stream for writing after "sending" a metadata request. I'm pretty sure the issue here was due to a bug that I fixed in go-libp2p 0.12.0. Please make sure to test this.
Background:
Protocol negotiation in libp2p is usually lazy. This lets us bundle the first write with the protocol headers. Unfortunately, there was a bug in go-libp2p where calling Close before reading or writing would close the stream for writing without flushing the protocol headers. This has since been fixed.
beacon-chain/p2p/sender.go
Outdated
if baseTopic != RPCMetaDataTopic { | ||
if _, err := s.Encoding().EncodeWithMaxLength(stream, message); err != nil { | ||
traceutil.AnnotateError(span, err) | ||
stream.Reset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we were potentially leaking streams here in the past (although I can't see how this would happen in practice unless we had a local bug)
beacon-chain/sync/error.go
Outdated
@@ -56,6 +55,9 @@ func writeErrorResponseToStream(responseCode byte, reason string, stream libp2pc | |||
log.WithError(err).Debug("Could not generate a response error") | |||
} else if _, err := stream.Write(resp); err != nil { | |||
log.WithError(err).Debugf("Could not write to stream") | |||
} else { | |||
// If sending the error message succeeded, close to send an EOF. | |||
stream.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove explicit calls to close like this if we avoid using reset (design question 1). I'm calling close here to make sure the error message is flushed and written before we abort.
ctx, span := trace.StartSpan(ctx, "sync.rpc") | ||
defer span.End() | ||
span.AddAttributes(trace.StringAttribute("topic", topic)) | ||
span.AddAttributes(trace.StringAttribute("peer", stream.Conn().RemotePeer().Pretty())) | ||
log := log.WithField("peer", stream.Conn().RemotePeer().Pretty()) | ||
// Check before hand that peer is valid. | ||
if s.p2p.Peers().IsBad(stream.Conn().RemotePeer()) { | ||
closeStream(stream, log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're walking away, we might as well just reset the stream.
@@ -21,11 +21,6 @@ import ( | |||
func (s *Service) beaconBlocksByRangeRPCHandler(ctx context.Context, msg interface{}, stream libp2pcore.Stream) error { | |||
ctx, span := trace.StartSpan(ctx, "sync.BeaconBlocksByRangeHandler") | |||
defer span.End() | |||
defer func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now explicitly closing on success. If we don't close the stream, the main RPC handler will reset it.
@@ -117,6 +112,7 @@ func (s *Service) beaconBlocksByRangeRPCHandler(ctx context.Context, msg interfa | |||
// wait for ticker before resuming streaming blocks to remote peer. | |||
<-ticker.C | |||
} | |||
closeStream(stream, log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have the RPC handler always close the stream instead of resetting, we can remove this. Up to you.
log.WithError(err).Debugf("Could not reset stream with protocol %s", stream.Protocol()) | ||
} | ||
}() | ||
defer closeStream(stream, log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: I never bother calling Reset after Send because Send closes for writing. There are no writes to abort at this point.
273754e
to
c62acad
Compare
In terms of tests/building, someone with a functional build environment will need to sync the deps with bazel. It's not working on my arch machine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for doing this and cleaning them up ! Had a few thoughts on the PR
Given that all requests/responses are length delimited, the use of Reset is optional. The upsides of using reset are:
It's often faster than calling close (and flushing).
It lets the other side know that you didn't finish sending your response.
The downside is that the choice between reset/close isn't always obvious and resetting a stream after writing will cause the write to be lost (by design).
If you'd rather just stick with close and avoid reset entirely, I'm happy to rework the patch to do that.
The main reason we were using closing of streams instead of resets in request or responses is because there would be situations where the remote peer wouldn't appropriately close the streams on their end. So the resetted stream would technically still be held in memory and not
cleaned up by the GC. I wouldn't be averse to having only stream resets for request/responses if we can guarantee that all those streams would eventually be properly cleaned up.
This patch makes stream closing async to more closely mimic a TCP connection. However, this can cause issues when a close is immediately followed by a disconnect. I'm currently working around this issue by mimicing the old FullClose (now closeStreamAndWait) where necessary.
If desired, I can go back to synchronously closing everywhere, at the cost of losing out on some potential (maybe) latency improvements.
I think this is fine, as long as we handle cases where we need to disconnect immediately and therefore follow a synchronous pattern, handling stream closing async is fine for our case.
That's strange... A reset stream should be cleaned up immediately. Are you sure this wasn't Close versus FullClose? Previously, if you called Close, the stream was only closed for writing and wouldn't be garbage collected until it an EOF was read. Now, both close and Reset garbage collect the stream immediately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this change! It certainly saves us a good bit of time updating libp2p.
This repo has strict rules on error handling. If these APIs return an error, it must be handled appropriately.
9a26d31
to
6aa01a1
Compare
go-libp2p 0.12.0 made some significant changes to the stream interfaces around stream closing: * Close now closes in both directions and frees the stream. However, unlike FullClose did, it doesn't _wait_ for the remote peer to respond with an EOF. * To close for writing, call CloseWrite (like one would on a TCP connection, etc.). This patch: * Replaces calls to FullClose with Close where appropriate. * Replaces calls to Close with CloseWrite where appropriate. * Removes redundant Close calls. * Calls Reset to where appropriate to indicate that the request/response was aborted. Unlike Close, this will not flush and will not cause the remote peer to read an EOF. Instead, the remote peer will read an ErrReset error. * Ensures we always either close or reset streams. Send wasn't closing the stream on some error paths. * Now that stream closing is async, we explicitly wait for a response when "hanging up" on a peer (so we don't hang up before they receive our response/goodbye message).
6aa01a1
to
c50700d
Compare
} | ||
|
||
// Close stream for writing. | ||
if err := stream.Close(); err != nil { | ||
if err := stream.CloseWrite(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build is complaining about this.
beacon-chain/p2p/sender.go:45:18: stream.CloseWrite undefined (type network.Stream has no field or method CloseWrite)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to assume this is some kind of bazel thing. CloseWrite is implemented on the stream interface.
beacon-chain/p2p/testing/p2p.go
Outdated
@@ -101,6 +101,7 @@ func (p *TestP2P) ReceiveRPC(topic string, msg proto.Message) { | |||
|
|||
n, err := p.Encoding().EncodeWithMaxLength(s, msg) | |||
if err != nil { | |||
_ = s.Reset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem appropriate. Why ignore stream.Reset errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- They're not useful. We only call Reset when we run into some other more relevant error. At that point, we don't really care about the stream (other than freeing resources).
- Reset is guaranteed to free resources, even if it returns an error.
- No stream multiplexer actually returns an error on reset. We return errors for completeness, but if
I were to redesign these interfaces from scratch, I probably wouldn't.
@prestonvanloon I rebased on develop and lost your commit. I believe I've revived the relevant parts, but please check. |
@Stebalien All good! I ran gazelle again and it only had 1 thing to update on deps.bzl. Build is still failing from unhandled errors.
|
On the latest version? I'm explicitly ignoring those. |
Ok, it looks like the linter doesn't understand (or intentionally ignores?) the syntax for explicitly dropping a return value. That's really annoying. |
This is by design. Ignoring an error isn't handling the error. You'll have to log it or return it or do something with it. |
_err := stream.Reset() | ||
_ = _err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't great. Why does stream.Reset() return an error if we aren't going to handle it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns an error just in case there might be something useful to return (and as I said, it probably should never have returned an error in the first place). We can just log it, but that won't actually do anything in practice.
Ah nope this was reset vs FullClose, after a period of time non garbage collected streams would be a non trivial part of our heap which is why we eventually went with a full close. I will test this PR out in runtime and get back to you on it, if there are any issues with it they should be obvious. We changed this a while back, so the particular issue I am referencing might already be patched in master. |
Ah, I think I know what the issue was. We had a bug in yamux where closing an "in flight" (not yet opened) stream could leak it. Reset would just close it immediately but FullClose would end up waiting for the stream to no longer be "in flight" before completely closing it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ! Thanks for doing this, its nice to be able to handle this with just stream resets, makes our code cleaner.
…feat/update-libp2p
2670100
b1f21ea
What type of PR is this?
What does this PR do? Why is it needed?
go-libp2p 0.12.0 made some significant changes to the stream interfaces around stream closing:
This patch:
Design Question 1
Given that all requests/responses are length delimited, the use of
Reset
is optional. The upsides of using reset are:The downside is that the choice between reset/close isn't always obvious and resetting a stream after writing will cause the write to be lost (by design).
If you'd rather just stick with close and avoid reset entirely, I'm happy to rework the patch to do that.
Design Question 2
This patch makes stream closing async to more closely mimic a TCP connection. However, this can cause issues when a close is immediately followed by a disconnect. I'm currently working around this issue by mimicing the old
FullClose
(nowcloseStreamAndWait
) where necessary.If desired, I can go back to synchronously closing everywhere, at the cost of losing out on some potential (maybe) latency improvements.
Other
I'm not sure if you want an issue for this, and/or what kind of issue you're looking for (given that this isn't either a feature or a bug fix). Please advise.