Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - collect bandwidth metrics per transport #4805

Closed
wants to merge 5 commits into from

Conversation

jxs
Copy link
Member

@jxs jxs commented Oct 4, 2023

Issue Addressed

Following the conversation on libp2p/rust-libp2p#3666 the changes introduced in this PR will allow us to give more insights if the bandwidth limitations happen at the transport level, namely if quic helps vs yamux and it's window size limitation or if the bottleneck is at the gossipsub level.

Proposed Changes

introduce new quic and tcp bandwidth metric gauges.

cc @mxinden (turned out to be easier, Thomas gave me a hint)

@jxs jxs force-pushed the bandwidth-metrics branch from bce7071 to fe5a407 Compare October 4, 2023 14:25
@jxs jxs force-pushed the bandwidth-metrics branch from fe5a407 to f3ed696 Compare October 4, 2023 14:27
Copy link

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of a hack, but neat!

Long term, I would consider a proper solution in rust-libp2p the cleaner solution, especially in case you ever introduce a third transport.

That said, for now, ...

Comment on lines 221 to 225
pub static ref INBOUND_LIBP2P_QUIC_BYTES: Result<IntGauge> =
try_create_int_gauge("libp2p_inbound_quic_bytes", "The inbound quic bandwidth over libp2p");

pub static ref INBOUND_LIBP2P_TCP_BYTES: Result<IntGauge> =
try_create_int_gauge("libp2p_inbound_tcp_bytes", "The inbound tcp bandwidth over libp2p");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub static ref INBOUND_LIBP2P_QUIC_BYTES: Result<IntGauge> =
try_create_int_gauge("libp2p_inbound_quic_bytes", "The inbound quic bandwidth over libp2p");
pub static ref INBOUND_LIBP2P_TCP_BYTES: Result<IntGauge> =
try_create_int_gauge("libp2p_inbound_tcp_bytes", "The inbound tcp bandwidth over libp2p");
pub static ref INBOUND_LIBP2P_QUIC_BYTES: Result<IntGauge> =
try_create_int_gauge("libp2p_inbound_bytes", "The inbound bandwidth over libp2p");

Best practice would be to use a metric family for all bandwidth, differentiated by a Prometheus label, e.g. transport="tcp|quic".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah makes sense thanks, went ahead and also put the direction into the same metric

@@ -218,13 +218,30 @@ lazy_static! {
/*
* Bandwidth metrics
*/
pub static ref INBOUND_LIBP2P_QUIC_BYTES: Result<IntGauge> =
try_create_int_gauge("libp2p_inbound_quic_bytes", "The inbound quic bandwidth over libp2p");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This metrics only ever goes up, right? Shouldn't it be a counter instead of a gauge?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with counters we have to increase the counter, with gauges we can set the value. I think @jxs has to set the value based on the bandwidth at regular intervals. I'll let him comment tho.

Agree that a counter would be better if we can set it easily.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may use Counter::inner. Note that using a Gauge in a place where a Counter is correct might result in problems e.g. around Grafana's query suggestions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup it was because of @AgeManning's arguments ehe, managed to switch to a Counter and reset and inc_by_value thanks Max, ptal

Copy link
Member

@AgeManning AgeManning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is very helpful.

Agree with Max's comments

Copy link
Member

@jmcph4 jmcph4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typos

beacon_node/lighthouse_network/src/metrics.rs Outdated Show resolved Hide resolved
beacon_node/lighthouse_network/src/metrics.rs Outdated Show resolved Hide resolved
beacon_node/lighthouse_network/src/metrics.rs Outdated Show resolved Hide resolved
beacon_node/lighthouse_network/src/metrics.rs Outdated Show resolved Hide resolved
beacon_node/lighthouse_network/src/metrics.rs Outdated Show resolved Hide resolved
beacon_node/lighthouse_network/src/service/utils.rs Outdated Show resolved Hide resolved
beacon_node/lighthouse_network/src/service/utils.rs Outdated Show resolved Hide resolved
beacon_node/lighthouse_network/src/service/utils.rs Outdated Show resolved Hide resolved
beacon_node/network/src/metrics.rs Outdated Show resolved Hide resolved
beacon_node/network/src/metrics.rs Outdated Show resolved Hide resolved
Copy link

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Again, long term I think a solution in rust-libp2p would be much cleaner. That said, this should provide us with the numbers we need to evaluate QUIC. 👍

Copy link
Member

@AgeManning AgeManning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, looks good to me. Lets go

@AgeManning
Copy link
Member

Just waiting for imminent deneb merge

@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. Networking labels Oct 18, 2023
@jimmygchen
Copy link
Member

let's go!
bors r+

bors bot pushed a commit that referenced this pull request Oct 19, 2023
## Issue Addressed

Following the conversation on libp2p/rust-libp2p#3666 the changes introduced in this PR will allow us to give more insights if the bandwidth limitations happen at the transport level, namely if quic helps vs yamux and it's [window size limitation](libp2p/rust-yamux#162) or if the bottleneck is at the gossipsub level.
## Proposed Changes

introduce new quic and tcp bandwidth metric gauges.

cc @mxinden (turned out to be easier, Thomas gave me a hint)
@bors
Copy link

bors bot commented Oct 19, 2023

Pull request successfully merged into unstable.

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot changed the title collect bandwidth metrics per transport [Merged by Bors] - collect bandwidth metrics per transport Oct 19, 2023
@bors bors bot closed this Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Networking ready-for-merge This PR is ready to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants