-
Notifications
You must be signed in to change notification settings - Fork 784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix unexpected Marking peer disconnected in DHT
#6140
Fix unexpected Marking peer disconnected in DHT
#6140
Conversation
6227d9f
to
ff1361f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM Akihito, thanks! Left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after yesterday's call where @jimmygchen and @AgeManning mentioned that this was occurring on a testnet with only Ligthouse clients I went to research this further.
It turns out that The connection limits NetworkBehaviour
currently forbids us to have more than 1 connection per peer.
The error returned from the Swarm
when a NetworkBehaviour
rejects an outbound connection is DialError::Denied
, so this PR addresses the issue it aims to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, in that is masks the error.
However the deeper problem is that we are trying to connect to a peer twice.
It seems to me the easiest solution is that mask this error and force sequential connections when dialing. So this PR and a future one together is the easiest solution I think
@ackintosh - I think you just need to merge lastest unstable to fix CI |
@Mergifyio queue |
✅ The pull request has been merged automaticallyThe pull request has been merged automatically at 62a39af |
Issue Addressed
We get the log below when LH fails to dial a peer:
I noticed that the disconnect happens unexpectedly in the following case:
Lighthouse dials to a peer twice using TCP and QUIC (if QUIC is not disabled). Usually, one establishes a connection, and the other fails because the peer allows only one connection per peer. I think we shouldn't
Marking peer disconnected in DHT
since a TCP (in this case) connection has been established between the nodes.Proposed Changes
I added a check to see if there’s an active connection before the disconnect.