
Add PeerDAS metrics #13

Open · wants to merge 5 commits into master
Conversation

@KatyaRyazantseva (Author) commented Sep 18, 2024

This PR introduces a recommended list of metrics for monitoring PeerDAS. The proposed changes attempt to standardize the naming convention for these metrics across client implementations. By adopting these metrics, clients can improve PeerDAS monitoring and compatibility, and simplify building summary dashboards for PeerDAS performance tracking and analysis.

The list is open for discussion. Each client has the opportunity to contribute to it by suggesting additions or disputing existing metrics.

Data column, KZG and custody metrics

  • Number of data column sidecars submitted for processing (counter)
  • Number of data column sidecars verified for gossip (counter)
  • Full runtime of data column sidecars gossip verification (histogram)
  • Total count of reconstructed columns (counter)
  • Time taken to reconstruct columns (histogram)
  • Time taken to compute data column sidecar, including cells, proofs and inclusion proof (histogram)
  • Time taken to verify data column sidecar inclusion proof (histogram)
  • Runtime of single data column KZG verification (histogram)
  • Runtime of batched data column KZG verification (histogram)
  • Total count of columns in custody within the data availability boundary (counter)
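To make the convention concrete, here is a minimal, self-contained Python sketch (not any client's actual implementation; the metric names and the placeholder reconstruction function are hypothetical) of how a client might record the reconstruction counter and timing histogram from the list above:

```python
import time
from collections import defaultdict

class Counter:
    """Minimal stand-in for a Prometheus counter (illustration only)."""
    def __init__(self, name: str):
        self.name, self.value = name, 0.0

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount

class Histogram:
    """Minimal stand-in for a Prometheus histogram: tracks sum, count,
    and cumulative bucket counts of observed values."""
    def __init__(self, name: str, buckets=(0.005, 0.05, 0.5, 5.0)):
        self.name, self.buckets = name, buckets
        self.bucket_counts = defaultdict(int)
        self.sum, self.count = 0.0, 0

    def observe(self, value: float) -> None:
        self.sum += value
        self.count += 1
        for b in self.buckets:
            if value <= b:
                self.bucket_counts[b] += 1

# Hypothetical metric names; clients would pick names matching this PR's list.
columns_reconstructed_total = Counter("beacon_data_column_reconstruction_total")
reconstruction_time_seconds = Histogram("beacon_data_column_reconstruction_time_seconds")

def reconstruct_columns(available_columns):
    """Placeholder reconstruction: times the operation and counts the columns."""
    start = time.perf_counter()
    columns = list(available_columns)  # real clients run erasure-coding recovery here
    reconstruction_time_seconds.observe(time.perf_counter() - start)
    columns_reconstructed_total.inc(len(columns))
    return columns
```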

Gossipsub metrics

Gossipsub domain metrics should contain the topic `data_column_sidecar_{subnet_id}` as a label.

  • Number of gossip messages sent to each topic (counter)
  • Number of bytes sent to each topic (counter)
  • Number of gossip messages received from each topic including duplicates (counter)
  • Number of bytes received from each topic including duplicates (counter)
  • Number of gossip messages received from each topic deduplicated (counter)
  • Number of bytes received from each topic deduplicated (counter)
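As an illustration of the topic-label convention, a minimal sketch (hypothetical counter names; plain dicts stand in for topic-labeled Prometheus counters):

```python
from collections import defaultdict

# Plain dicts stand in for Prometheus counters labeled by topic; the topic
# string data_column_sidecar_{subnet_id} is the label value, per the text above.
gossip_msgs_received_total = defaultdict(int)
gossip_bytes_received_total = defaultdict(int)

def on_gossip_message(subnet_id: int, payload: bytes) -> None:
    """Record one received message and its size under the topic label."""
    topic = f"data_column_sidecar_{subnet_id}"
    gossip_msgs_received_total[topic] += 1
    gossip_bytes_received_total[topic] += len(payload)
```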

Req/Resp metrics

Req/Resp domain metrics should contain the protocol IDs `/eth2/beacon_chain/req/data_column_sidecars_by_root/1/` and `/eth2/beacon_chain/req/data_column_sidecars_by_range/1/` as labels.

  • Number of requests sent (counter)
  • Number of requests bytes sent (counter)
  • Number of requests received (counter)
  • Number of requests bytes received (counter)
  • Number of responses sent (counter)
  • Number of responses bytes sent (counter)
  • Number of responses received (counter)
  • Number of responses bytes received (counter)
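A parallel sketch for the protocol-ID label (again hypothetical counter names; the protocol IDs are the ones named above):

```python
# The two protocol IDs named above, used as label values.
BY_ROOT = "/eth2/beacon_chain/req/data_column_sidecars_by_root/1/"
BY_RANGE = "/eth2/beacon_chain/req/data_column_sidecars_by_range/1/"

requests_sent_total = {BY_ROOT: 0, BY_RANGE: 0}
request_bytes_sent_total = {BY_ROOT: 0, BY_RANGE: 0}

def on_request_sent(protocol_id: str, payload: bytes) -> None:
    """Record one outgoing request and its size under the protocol_id label."""
    requests_sent_total[protocol_id] += 1
    request_bytes_sent_total[protocol_id] += len(payload)
```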

Comment on lines +72 to +79
| `libp2p_rpc_requests_sent_total` | Counter | Number of requests sent | On propagating an RPC request |
| `libp2p_rpc_requests_bytes_sent_total` | Counter | Number of requests bytes sent | On propagating an RPC request |
| `libp2p_rpc_requests_received_total` | Counter | Number of requests received | On receiving an RPC request |
| `libp2p_rpc_requests_bytes_received_total` | Counter | Number of requests bytes received | On receiving an RPC request |
| `libp2p_rpc_responses_sent_total` | Counter | Number of responses sent | On propagating an RPC response |
| `libp2p_rpc_responses_bytes_sent_total` | Counter | Number of responses bytes sent | On propagating an RPC response |
| `libp2p_rpc_responses_received_total` | Counter | Number of responses received | On receiving an RPC response |
| `libp2p_rpc_responses_bytes_received_total` | Counter | Number of responses bytes received | On receiving an RPC response |
@matthewkeil (Member) commented Sep 20, 2024

How do you feel about prefixing these with req_resp instead of libp2p_rpc?


Member

Is there a specific reason @matthewkeil?

Member

I suppose in case libp2p is swapped out? Sounds ok to me, but we do already have some libp2p_ metrics in here. What do you think of p2p_rpc @matthewkeil?

Member

Mostly because the domains prescribed in the spec are "gossipsub" and "req/resp", so I'm thinking that domain-specific metrics should follow that convention (similar to how the gossipsub metrics are labeled in this PR).

Both transmission methodologies are p2p, so I'm not sure that is a necessary differentiator. Also, at least within our tech stack, both "gossipsub" and "req/resp" use @libp2p, so qualifying one with that prefix and not the other doesn't seem correct (it also seems redundant and implementation-specific).

Author

As a newcomer, I found it confusing that specs refer to req/resp, while metrics use a libp2p prefix. Needed extra explanation.


The terms rpc & req/resp have been used interchangeably throughout the implementation and discussions. I feel that req/resp is a bit more precise, though rpc is more commonly used outside of the spec.

In Lighthouse, most of our metrics and code reference rpc, and I'd personally prefer to keep it consistent there and also avoid breaking changes to existing metrics. However, I think we can probably find ways around this, e.g. metric relabelling on export (which will use more storage) or configuring a mapping in Grafana, if standardising these metrics provides benefits.


#### Req/Resp metrics

Req/Resp domain metrics should contain protocol ID `/eth2/beacon_chain/req/data_column_sidecars_by_root/1/` and `/eth2/beacon_chain/req/data_column_sidecars_by_range/1/` as labels.
@matthewkeil (Member) commented Sep 20, 2024

Should we standardize the label name to protocolId or something similar? The only issue I see with that would be naming conventions across languages (i.e. JS uses protocolId while Rust would use protocol_id). But it would allow a common Grafana dashboard to work for all clients.

Same question for topic above in the gossipsub section.

Member

@matthewkeil Yup standardizing the label names makes sense - they should be in snake case

Prometheus metrics and label names are written in snake_case
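The normalization discussed here is mechanical; a minimal sketch (a hypothetical helper, only handling simple camelCase names like the `protocolId` example above):

```python
import re

def to_snake_case(name: str) -> str:
    """Normalize a simple camelCase label name (e.g. protocolId, as a JS
    client might use internally) to the snake_case form Prometheus
    conventions expect. Acronym runs (e.g. protocolID) are not handled."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()
```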

@matthewkeil (Member) commented Sep 23, 2024

Question @parithosh regarding implementation: should chunks be considered a "response" (i.e. each column/block in a request_by_range), or is the whole response, irrespective of the number of blocks/columns received, considered one response?

Should this be specified here for other implementers?

As a note, I tend to lean towards stream/req/resp being 1:1 (i.e. any number of columns/blocks sent over a single stream counts as a single response).

@parithosh (Member)

> Question @parithosh regarding implementation: should chunks be considered a "response" (i.e. each column/block in a request_by_range), or is the whole response, irrespective of the number of blocks/columns received, considered one response?
>
> Should this be specified here for other implementers?
>
> As a note, I tend to lean towards stream/req/resp being 1:1 (i.e. any number of columns/blocks sent over a single stream counts as a single response).

I think we're okay with either approach as long as all clients do it the same way. In general I'd agree that a single req/resp should count as "1", but thinking of a future where the number of columns/blocks is different, the metric would then also mean something else (especially if we're trying to draw some analysis on top of it)? Do other implementers have an opinion here?
cc @g11tech @KatyaRyazantseva @samcm @agnxsh @nalepae @jimmygchen

@matthewkeil (Member) commented Sep 24, 2024

Question regarding the "deduplicated" messages. We filter incoming messages for duplicates and then check for valid/invalid message format at the gossip layer. Should the deduplicated count include invalid messages (i.e. all non-duplicates), or only messages that pass the gossip validation process?

BTW this does not include full app-level verification, just gossip protocol validation. After speaking with @wemeetagain, we lean towards not including app-level validation, as it greatly simplifies the implementation to a single collection point instead of collecting after beacon-node validation, sprinkled across several handlers.

@matthewkeil (Member)
Would love to get the PR that implements these merged. Any thoughts on making the changes, or perhaps not making them? Either way is fine, but I wanted to pop this back up.

@tersec commented Oct 4, 2024

Some general points:

  • at least in Nimbus, handling of received duplicate gossip messages lives in a separate layer, the libp2p library, which does most of this deduplication and does not, and should not, know anything about PeerDAS specifically. So a Nimbus implementation of

    • Number of gossip messages received from each topic including duplicates (counter)
    • Number of bytes received from each topic including duplicates (counter)

    would be either misleading or require, for the purposes of a couple of metrics, fairly deep architectural changes that seem hard to justify in general; it's useful to have libp2p exist separately as a not-too-leaky abstraction which does not know about anything from specific Ethereum forks, and does have non-Ethereum users.

  • the Gossipsub metrics and Req/Resp metrics seem unrelated to PeerDAS-per-se; they can apply to PeerDAS, but there's no PeerDAS-specific content to them

  • typically, single/batched verification is an adaptive set of tactics used by clients that depends on the anticipated prevalence of failures (which require rerunning verifications individually), not per se something that's baked into how a protocol is set up. While the notional stated purpose of these metrics is precisely to achieve a unified UX of sorts across CLs, even the happy paths here involve different batching sizes, which again depend on specific trade-offs of latency, throughput, tolerance to single-verification fallback, and many other considerations. These might be difficult to interpret within one client over time, as well as across clients.

  • some of the "time taken to ..." metrics in general should be, within their categories, and within a single client [see above regarding batch sizes for example], fairly consistent. I'm not sure how useful having these as metrics rather than externally performed benchmarks is. One PeerDAS column isn't going to take much of a different amount of time than any other to verify, generally, except for environmental factors (load, etc) on the machine running it.

  • for the gossip metrics, there are fairly consistent relationships between number of messages and total bytes. Not sure how useful it is to have both.

@KatyaRyazantseva (Author)

> • at least in Nimbus, handling of received duplicate gossip messages lives in a separate layer, the libp2p library, which does most of this deduplication and does not, and should not, know anything about PeerDAS specifically. So a Nimbus implementation of
>
>   • Number of gossip messages received from each topic including duplicates (counter)
>   • Number of bytes received from each topic including duplicates (counter)
>
>   would be either misleading or require, for the purposes of a couple of metrics, fairly deep architectural changes that seem hard to justify in general; it's useful to have libp2p exist separately as a not-too-leaky abstraction which does not know about anything from specific Ethereum forks, and does have non-Ethereum users.

For gossipsub metrics we apply filters by topic and specifically choose "data_column_sidecar"-labeled topics. Is that possible in the Nimbus implementation?

@tersec commented Oct 5, 2024

>> • at least in Nimbus, handling of received duplicate gossip messages lives in a separate layer, the libp2p library, which does most of this deduplication and does not, and should not, know anything about PeerDAS specifically. So a Nimbus implementation of
>>
>>   • Number of gossip messages received from each topic including duplicates (counter)
>>   • Number of bytes received from each topic including duplicates (counter)
>>
>>   would be either misleading or require, for the purposes of a couple of metrics, fairly deep architectural changes that seem hard to justify in general; it's useful to have libp2p exist separately as a not-too-leaky abstraction which does not know about anything from specific Ethereum forks, and does have non-Ethereum users.
>
> For gossipsub metrics we apply filters by topic and specifically choose "data_column_sidecar"-labeled topics. Is that possible in the Nimbus implementation?

The question of which topics one wants duplicated-message statistics for is a separate question and doesn't affect the basic unavailability of this data to Nimbus. To be more precise, when Nimbus initializes libp2p, it sets:

      historyLength = 6,
      historyGossip = 3,
      fanoutTTL = chronos.seconds(60),
      # 2 epochs matching maximum valid attestation lifetime
      seenTTL = chronos.seconds(int(SECONDS_PER_SLOT * SLOTS_PER_EPOCH * 2)),

These parameters are ultimately translated in nim-libp2p into a message-seen-cache initialization.

This message seen cache, in nim-libp2p in a way intentionally not visible to the Ethereum client Nimbus, uses a salted hash (in nim-libp2p's case; techniques might vary, but this approach is one typical method) to detect and not send to Nimbus further identical, duplicate messages.

It doesn't matter what the topic is: if things are operating properly, Nimbus simply never sees them.

nim-libp2p does expose some metrics, but they might not line up with what you're looking for, and, to this point (see above), it's not a good design approach to require the libp2p library, which should not care per se about Ethereum and should not have to, because libp2p is not an Ethereum-specific protocol, to use anything PeerDAS-specific.
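The salted-hash seen cache described above can be sketched like this (a simplified illustration, not nim-libp2p's actual code; the class name, TTL eviction strategy, and SHA-256 choice are all assumptions for the sketch):

```python
import hashlib
import os
import time
from typing import Dict, Optional

class SeenCache:
    """Sketch of a salted-hash message-seen cache of the kind described
    above. The random salt prevents an attacker from precomputing message
    IDs that collide in the cache."""

    def __init__(self, ttl_seconds: float):
        self.salt = os.urandom(16)
        self.ttl = ttl_seconds
        self.seen: Dict[bytes, float] = {}  # digest -> first-seen time

    def check_and_add(self, msg: bytes, now: Optional[float] = None) -> bool:
        """Return True if the message is new; duplicates return False and,
        in a gossipsub stack, would never reach the application layer."""
        if now is None:
            now = time.monotonic()
        # Evict entries older than the TTL (e.g. two epochs for attestations).
        self.seen = {d: t for d, t in self.seen.items() if now - t < self.ttl}
        digest = hashlib.sha256(self.salt + msg).digest()
        if digest in self.seen:
            return False
        self.seen[digest] = now
        return True
```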

barnabasbusa pushed a commit to ethpandaops/ethereum-package that referenced this pull request Oct 7, 2024
This PeerDAS metrics dashboard is implemented according to the current
PeerDAS metrics specs. It can be used by all the CL clients which
implemented the set of [these
metrics](ethereum/beacon-metrics#13). The
metrics are still under discussion, so, the dashboard is currently in
development.
Filters: by client, job, supernode.
@matthewkeil (Member)

> This message seen cache, in nim-libp2p in a way intentionally not visible to the Ethereum client Nimbus, uses a salted hash (in nim-libp2p's case; techniques might vary, but this approach is one typical method) to detect and not send to Nimbus further identical, duplicate messages.
>
> It doesn't matter what the topic is: if things are operating properly, Nimbus simply never sees them.
>
> nim-libp2p does expose some metrics, but they might not line up with what you're looking for, and, to this point (see above), it's not a good design approach to require the libp2p library, which should not care per se about Ethereum, to use anything PeerDAS-specific.

Agreed. This is similar to what drove my question above about "fully validated" messages. Those filtered messages, and invalid ones, never make it to Lodestar. To simplify metric collection we did it at the libp2p layer, without domain-specific validation like BLS, etc.

https://github.com/ChainSafe/js-libp2p-gossipsub/pull/508/files

As a note @tersec, we are just collecting these in our libp2p-gossipsub layer and adding the topic as a label, so realistically we are collecting for every topic and it's not "ethereum specific" because you are correct, the libp2p layer should not have any knowledge of the application layer. It just reports metrics on what goes across the wire, agnostic to what it "actually is" other than a message.

This also guided my review of the gossipsub metric names so they were gossip-only and did not refer to application specifics.

@matthewkeil (Member) commented Oct 8, 2024

> • the Gossipsub metrics and Req/Resp metrics seem unrelated to PeerDAS-per-se; they can apply to PeerDAS, but there's no PeerDAS-specific content to them

You are correct @tersec. As additional color to the comment above, there is a goal to help standardize the way metrics are collected across clients (at several layers of the stack) to help operators and promote client diversity. Building muscle memory around metric naming paradigms, and even better, allowing for unified dashboards across clients, is an overarching dream/wish/goal. While the metrics are in the "ethereum/beacon-metrics" repo, there are some places where collection points will need to be agnostic, the libp2p-gossipsub layer being an excellent example. These libs across the languages should be application-agnostic but still usefully and methodically defined. Perhaps there will be a push outside of eth to standardize the metrics further, but for now this is a process in its infancy, and realistically most of these libs are maintained by the client teams from what I gather.

> • typically, single/batched verifications are an adaptive set of tactics used by the clients which depends on anticipated prevalence of failures which require rerunning verifications individually, for example, not per se something that's baked into how a protocol is set up. While the notional stated purpose of these is precisely to achieve a unified UX of sorts across CLs, even the happy paths here involve different batching sizes, which again depends on specific tradeoffs of latency/throughput/tolerance to single verification fallback/many other considerations. These might be difficult to interpret within one client over time, as well as across clients.

This is a great point and we do the same thing. We kept our collection at the libp2p-gossipsub layer for this exact reason, and we are not counting "full verifications" because of what you mentioned. We are only counting correctly formatted messages that got through deserialization, because collection after actual gossip handling was too burdensome. There would be collection points in each gossip handler, which is infeasible for this scope. See the note about "full verification" in the comment above.

> • some of the "time taken to ..." metrics in general should be, within their categories, and within a single client [see above regarding batch sizes for example], fairly consistent. I'm not sure how useful having these as metrics rather than externally performed benchmarks is. One PeerDAS column isn't going to take much of a different amount of time than any other to verify, generally, except for environmental factors (load, etc) on the machine running it.

We added this timing in the batch verification we run. Was a pretty easy change. We also do 0 single verifications so everything is run as a batch. As for runtime vs bench performance testing I tend to see what you are saying but think that the actual runtime collection is pertinent.

The performance of our node is not linear across the full slot/epoch, and understanding how and where the bottlenecks occur can inform protocol development. For instance (while not relevant for this specific topic but still pertinent to the discussion), adding additional work at epoch transition that is heavily memory-bound and triggers extensive garbage collection will affect slot-zero proposals and attestations for us. As we add more work to the gossip validation, will it potentially affect single-slot finality and, more importantly, how?

> • for the gossip metrics, there are fairly consistent relationships between number of messages and total bytes. Not sure how useful it is to have both.

Good point. We added it, but this may warrant further investigation of usefulness.

@matthewkeil (Member)

@KatyaRyazantseva was there a discussion or decision on the naming of the metric? If not, perhaps we go with our impressions and swap the name so it's less confusing? Would love to get this one landed so we can update our metrics in ChainSafe/lodestar#7095

@tersec commented Oct 15, 2024

> Agreed. This is similar to what drove my question above about "fully validated" messages. Those filtered messages, and invalid ones, never make it to Lodestar. To simplify metric collection we did it at the libp2p layer, without domain-specific validation like BLS, etc.
>
> https://github.com/ChainSafe/js-libp2p-gossipsub/pull/508/files
>
> As a note @tersec, we are just collecting these in our libp2p-gossipsub layer and adding the topic as a label, so realistically we are collecting for every topic and it's not "ethereum specific" because you are correct, the libp2p layer should not have any knowledge of the application layer. It just reports metrics on what goes across the wire, agnostic to what it "actually is" other than a message.
>
> This also guided my review of the gossipsub metric names so they were gossip-only and did not refer to application specifics.

If the only practical way to implement this is to add things to the libp2p library, it's very slightly ... misleading ... to specify:

> Gossipsub metrics
>
> Gossipsub domain metrics should contain topic data_column_sidecar_{subnet_id} as labels.

It's a way of asking for libp2p modifications for all messages without ever stating such. It should be explicit on this point if so, and that can be discussed in those terms, not specifying this as if it's all about PeerDAS.

Because, as a counterexample to

> Perhaps there will be a push outside of eth to standardize the metrics further but for now this is a process in its infancy and realistically most of these libs are maintained by the client teams from what I gather.

One of the main libp2p libraries is very definitely not an "Ethereum" library in disguise; neither are js-libp2p nor rust-libp2p.

In general, it turns this "realistically, most of these libs ..." into an effective requirement, which is, again, beyond what should be the scope of a PR that never says anything like this explicitly, in a part of one of the many Ethereum-specific spec processes.

I'd prefer to see some feedback from more libp2p people here. Maybe these metrics are reasonable, maybe not.

But there's a similar tension between

> Good point. We added it but this may warrant further investigation of usefulness

and

> Would love to get this one landed so we can update our metrics in ChainSafe/lodestar#7095

People can update metrics and create dashboards however they want, without this being standardized.

Proposal:

  • drop the "including duplicates" gossip metrics (libp2p-dependent, and yes, this is a real structural issue);
  • verify the utility of having both the byte and message count for gossip metrics, generally preferring the one with less libp2p-modification dependency;
  • clients and dashboards can implement and use whatever metrics they want, regardless;
  • clarify whether the counts need to include outright invalid messages, because of similar libp2p concerns.

Comment on lines +58 to +63
| `gossipsub_topic_msg_sent_counts_total` | Counter | Number of gossip messages sent to each topic | On sending a message over a topic |
| `gossipsub_topic_msg_sent_bytes_total` | Counter | Number of bytes sent to each topic | On sending a message over a topic |
| `gossipsub_topic_msg_recv_counts_unfiltered_total` | Counter | Number of gossip messages received from each topic (including duplicates) | On receiving a message from a topic including duplicates|
| `gossipsub_topic_msg_recv_bytes_unfiltered_total` | Counter | Number of bytes received from each topic (including duplicates) | On receiving a message from a topic including duplicates |
| `gossipsub_topic_msg_recv_counts_total` | Counter | Number of gossip messages received from each topic (deduplicated) | On receiving a message from a topic deduplicated |
| `gossipsub_topic_msg_recv_bytes_total` | Counter | Number of bytes received from each topic (deduplicated) | On receiving a message from a topic deduplicated |


I think these generic metrics would benefit multi-client monitoring - although it would be difficult for clients to change them if they already exist.

Most of these metrics already exist in rust-libp2p, I wonder if other clients have equivalent metric for these? If it's hard to standardise these, perhaps it would still be useful to find out the equivalent for each client, so we could make it possible to have a single multi-client dashboard without standardising the metric names?

@tersec @nalepae @zilm13 @matthewkeil


BTW, the duplicate-tracking metrics would be useful for analysing the effectiveness of IDONTWANT.

@matthewkeil (Member) commented Nov 7, 2024

We (Lodestar) are the owners of the js-libp2p-gossipsub repo and have updated those metric names to match this PR. That change is still in draft, though, because this PR has not landed.

I do completely agree that unifying metric names across clients will be huge for both ethpandaops and operators as a whole. By making dashboards universal, it will be trivial to switch clients (promoting client diversity) or to recover if a big bug is found in one.

6 participants