
feat(ping): don't close connections upon failures #3947

Merged
merged 26 commits into from
May 24, 2023

Conversation

thomaseizinger
Contributor

@thomaseizinger thomaseizinger commented May 15, 2023

Description

Previously, the libp2p-ping module came with a policy to close a connection after X failed pings. This is only one of many possible policies for how users might want to do connection management.

We remove this policy without a replacement. If users wish to restore this functionality, they can easily implement such a policy themselves: the default value of max_failures was 1, so to restore the previous behaviour, users can simply close the connection upon the first received ping error.

In this same patch, we also simplify the API of ping::Event by removing the layer of ping::Success and instead reporting the RTT to the peer directly.

Related: #3591.

Notes & open questions

Patch-by-patch review is recommended.

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@thomaseizinger thomaseizinger requested a review from mxinden May 15, 2023 13:03
Review threads resolved on: misc/metrics/CHANGELOG.md, protocols/ping/CHANGELOG.md
@mxinden
Member

mxinden commented May 16, 2023

Do I understand correctly, that we expect all Transport implementations, e.g. TCP and QUIC, to close a malfunctioning connection?

If yes, fine with me to proceed with this pull request.

If not, I would expect all users to want some mechanism along the lines of what libp2p-ping provides today, namely to close malfunctioning connections, and thus I would advocate for keeping the mechanism in-place here.

@thomaseizinger
Contributor Author

> If not, I would expect all users to want some mechanism along the lines of what libp2p-ping provides today, namely to close malfunctioning connections, and thus I would advocate for keeping the mechanism in-place here.

Define malfunctioning?

I don't think libp2p-ping is a suitable way of identifying a malfunctioning connection:

  1. A remote peer is not guaranteed to support libp2p-ping, hence we cannot rely on it actually operating.
  2. A remote peer may deprioritize ping messages, i.e. not treat them with the highest priority and thus run into a timeout. That doesn't mean that the underlying connection is faulty.
  3. A remote peer may have a bug in the ping implementation but implement other protocols correctly.

I already tried to make the point that equating a working ping with a working connection is a policy, and I believe that users should be in charge of policy. Do you not agree with that?

For example, another policy could be to disconnect all peers with a latency higher than 500ms or all that don't have a latency in the 95th percentile of all currently active connections.

> Do I understand correctly, that we expect all Transport implementations, e.g. TCP and QUIC, to close a malfunctioning connection?

If they can reliably detect a malfunctioning connection, then yes, absolutely.

@thomaseizinger
Contributor Author

> For example, another policy could be to disconnect all peers with a latency higher than 500ms or all that don't have a latency in the 95th percentile of all currently active connections.

I am happy to add some more docs to libp2p-ping to explain that.

Member

@mxinden mxinden left a comment

Fine with proceeding here. Thanks for the details. Just one thing on how the user should close a specific connection.

Review thread resolved on: protocols/ping/CHANGELOG.md
@thomaseizinger thomaseizinger requested a review from mxinden May 23, 2023 20:09
Member

@mxinden mxinden left a comment

Good to merge from my end.

Comment on lines +278 to +279
// Note: For backward-compatibility the first failure is always "free"
// and silent. This allows peers who use a new substream
Member

Is this backwards compatibility still relevant? This implementation should be compatible with any recent other implementation adhering to the specification, right?

Contributor Author

We could reword it, but the functionality still needs to be there. js-libp2p, for example, uses a new stream per ping, and we should not report that as an error.

@mergify mergify bot merged commit 25bc30f into master May 24, 2023
@mergify mergify bot deleted the feat/no-close-connection-ping-failures branch May 24, 2023 12:33
@youngjoon-lee

youngjoon-lee commented Jan 6, 2025

@thomaseizinger Thanks for your detailed PR description. I have a short question regarding the best practice of closing connections, as a developer implementing my own NetworkBehaviour.
If I understand correctly, prior to this PR, using ping alongside gossipsub in the same Swarm could result in the connection being closed by ping if a peer failed to respond to a ping. This, in turn, would make the connection unavailable to gossipsub as well, even if the peer was working well for gossipsub. Is this one of the reasons why this PR was merged?
(UPDATE: I found #3591 after writing this comment. It seems that what I said is one of the reasons why this PR was needed.)

I'm asking because I'm implementing my own NetworkBehaviour, which requires some connection management policy. For example, if I detect that a connection has closed, I want to establish a connection with a different peer. Likewise, if a connection is slow (times out), I want to close it and connect to a different peer. At first I was thinking about implementing this policy inside my NetworkBehaviour, but I ended up finding this PR and became curious whether that is a good approach.
If I want my NetworkBehaviour to be usable alongside other NetworkBehaviours, it seems better to move the connection management logic to the application level that uses the Swarm. However, at the same time, I'm not sure it's good to have logic related to my NetworkBehaviour in two places (inside and outside the NetworkBehaviour).
I'd like to ask your opinion, though this is not the best place to ask.

@thomaseizinger
Contributor Author

I am no longer a maintainer here so take my opinion with a grain of salt: I think it is entirely okay to implement connection management as a network behaviour as long as that is the only thing that this behaviour does.

I like to think of network behaviours as plugins and plugins can implement all kinds of functionality.

The reason this PR was created is that the ping protocol itself (i.e. the spec) doesn't say anything about connection management; it should just measure latency.

An alternative approach would have been to add more config options to ping::Behaviour and thus decouple the policy but that would have just created a long-term maintenance burden.
