
feat(ping): don't close connections upon failures #3947

Merged
merged 26 commits into from
May 24, 2023

Conversation

thomaseizinger
Contributor

@thomaseizinger thomaseizinger commented May 15, 2023

Description

Previously, the libp2p-ping module came with a policy to close a connection after X failed pings. This is only one of many possible policies for how users might want to do connection management.

We remove this policy without a replacement. If users wish to restore this functionality, they can easily implement such a policy themselves: the default value of max_failures was 1, so to restore the previous behaviour, users can simply close the connection upon the first received ping error.

In this same patch, we also simplify the API of ping::Event by removing the layer of ping::Success and instead reporting the RTT to the peer directly.

Related: #3591.

Notes & open questions

Patch-by-patch review is recommended.

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@thomaseizinger thomaseizinger requested a review from mxinden May 15, 2023 13:03
Review threads resolved on: misc/metrics/CHANGELOG.md, protocols/ping/CHANGELOG.md
@mxinden
Member

mxinden commented May 16, 2023

Do I understand correctly, that we expect all Transport implementations, e.g. TCP and QUIC, to close a malfunctioning connection?

If yes, fine with me to proceed with this pull request.

If not, I would expect all users to want some mechanism along the lines of what libp2p-ping provides today, namely to close malfunctioning connections, and thus I would advocate for keeping the mechanism in-place here.

@thomaseizinger
Contributor Author

> If not, I would expect all users to want some mechanism along the lines of what libp2p-ping provides today, namely to close malfunctioning connections, and thus I would advocate for keeping the mechanism in-place here.

Define malfunctioning?

I don't think libp2p-ping is a suitable way of identifying a malfunctioning connection:

  1. A remote peer is not guaranteed to support libp2p-ping, hence we cannot rely on it actually operating.
  2. A remote peer may deprioritize ping messages, i.e. not treat them with the highest priority and thus run into a timeout. That doesn't mean that the underlying connection is faulty.
  3. A remote peer may have a bug in the ping implementation but implement other protocols correctly.

I already tried to make the point that equating a working ping with a working connection is a policy, and I believe that users should be in charge of policy. Do you not agree with that?

For example, another policy could be to disconnect all peers with a latency higher than 500ms or all that don't have a latency in the 95th percentile of all currently active connections.

> Do I understand correctly, that we expect all Transport implementations, e.g. TCP and QUIC, to close a malfunctioning connection?

If they can reliably detect a malfunctioning connection, then yes, absolutely.

@thomaseizinger
Contributor Author

> For example, another policy could be to disconnect all peers with a latency higher than 500ms or all that don't have a latency in the 95th percentile of all currently active connections.

I am happy to add some more docs to libp2p-ping to explain that.

Member

@mxinden mxinden left a comment

Fine with proceeding here. Thanks for the details. Just one thing on how the user should close a specific connection.

Review thread resolved on: protocols/ping/CHANGELOG.md
@thomaseizinger thomaseizinger requested a review from mxinden May 23, 2023 20:09
Member

@mxinden mxinden left a comment

Good to merge from my end.

Comment on lines +278 to +279
// Note: For backward-compatibility the first failure is always "free"
// and silent. This allows peers who use a new substream
Member

Is this backwards compatibility still relevant? This implementation should be compatible with any recent other implementation adhering to the specification, right?

Contributor Author

We could reword it, but the functionality still needs to be there. js-libp2p, for example, uses a new stream per ping, and we should not report that as an error.

@mergify mergify bot merged commit 25bc30f into master May 24, 2023
@mergify mergify bot deleted the feat/no-close-connection-ping-failures branch May 24, 2023 12:33
@youngjoon-lee

youngjoon-lee commented Jan 6, 2025

@thomaseizinger Thanks for your detailed PR description. I have a short question regarding the best practice of closing connections, as a developer implementing my own NetworkBehaviour.
If I understand correctly, prior to this PR, using ping alongside gossipsub in the same Swarm could result in the connection being closed by ping if a peer failed to respond to a ping. This, in turn, would make the connection unavailable to gossipsub as well, even if the peer was working well for gossipsub. Is this one of the reasons why this PR was merged?
(UPDATE: I found #3591 after writing this comment. It seems that what I said is one of the reasons why this PR was needed.)

I'm asking because I'm implementing my own NetworkBehaviour, which requires some connection management policy. For example, if I detect that a connection has closed, I want to establish a connection with a different peer. Likewise, if a connection is slow (times out), I want to close it and connect to a different peer. At first I was thinking about implementing this policy inside my NetworkBehaviour, but I ended up finding this PR and became curious whether that is a good approach.
If I want my NetworkBehaviour to be usable alongside other NetworkBehaviours, it seems better to move the connection management logic to the application level that uses the Swarm. However, at the same time, I'm not sure it's good to have logic related to my NetworkBehaviour in two places (inside and outside the NetworkBehaviour).
I'd like to ask your opinion, though this is not the best place to ask.

@thomaseizinger
Contributor Author

I am no longer a maintainer here so take my opinion with a grain of salt: I think it is entirely okay to implement connection management as a network behaviour as long as that is the only thing that this behaviour does.

I like to think of network behaviours as plugins and plugins can implement all kinds of functionality.

The reason this PR was created is that the ping protocol itself (i.e. the spec) doesn't say anything about connection management; it should just measure latency.

An alternative approach would have been to add more config options to ping::Behaviour and thus decouple the policy but that would have just created a long-term maintenance burden.
