network/litep2p: Investigate beefy ran out of peers to request justif #4985
Comments
Might also be related to:
The issue happens when beefy submits a justification request for a block number. In this case, we only send the request to peers that have voted on a block higher than the requested block number. If no peer has voted on a higher block, beefy emits a debug log: Lines 147 to 158 in 1d1837c
If we submit a request and it later fails, beefy tries the next cached peer that voted on a higher block number. If no peers are available, this time a warning is emitted: Lines 246 to 263 in 1d1837c
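The selection and fallback described above can be sketched roughly as follows. This is a simplified illustration, not the code in `outgoing_requests_engine.rs`: `EngineSketch`, `Outcome`, `note_vote`, `request`, and `on_request_failed` are hypothetical names, peer ids are plain integers, and the strict `>` comparison is an assumption taken from the wording "higher than the block number".

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical stand-ins for the real peer id and block number types.
type PeerId = u32;
type BlockNumber = u64;

/// What the sketch decides to do next.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Send the request to this peer.
    Request(PeerId),
    /// No peer voted above the requested block (the debug-log case).
    NoPeersYet,
    /// A request failed and no cached peer is left (the warning case).
    RanOutOfPeers,
}

#[derive(Default)]
struct EngineSketch {
    /// Highest block each known peer has voted on.
    last_voted: HashMap<PeerId, BlockNumber>,
    /// Candidates cached for the request currently in flight.
    cache: VecDeque<PeerId>,
}

impl EngineSketch {
    /// Remember the highest block `peer` has voted on.
    fn note_vote(&mut self, peer: PeerId, block: BlockNumber) {
        let best = self.last_voted.entry(peer).or_insert(block);
        *best = (*best).max(block);
    }

    /// Start a request for `block`, caching every peer that voted higher.
    fn request(&mut self, block: BlockNumber) -> Outcome {
        let mut candidates: Vec<PeerId> = self
            .last_voted
            .iter()
            .filter(|(_, voted)| **voted > block)
            .map(|(peer, _)| *peer)
            .collect();
        candidates.sort_unstable(); // deterministic order for the example only
        self.cache = candidates.into();
        match self.cache.pop_front() {
            Some(peer) => Outcome::Request(peer),
            None => Outcome::NoPeersYet,
        }
    }

    /// The in-flight request failed: fall back to the next cached peer.
    fn on_request_failed(&mut self) -> Outcome {
        match self.cache.pop_front() {
            Some(peer) => Outcome::Request(peer),
            None => Outcome::RanOutOfPeers,
        }
    }
}
```

In this sketch the warning case can only be reached after at least one request was actually sent, which matches the distinction between the debug log and the warning.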
From inspecting the metrics of 2 nodes (green: litep2p, yellow: libp2p), the nodes perform similarly in terms of not finding peers for the requests. I don't believe this issue is related to the node's connectivity to the network; most likely we are connected to peers that cannot satisfy the justification query.
This PR increments the beefy metric for having no peers to query justifications from. The metric is incremented when we submit a request to a known peer, that peer fails to provide a valid response, and there are no further peers to query.

While at it, add a few extra details to identify the number of active peers and cached peers, together with the request error.

Part of:
- #4985
- #4925

---------

Signed-off-by: Alexandru Vasile <[email protected]>
This PR improves the metrics reported by litep2p on request-response errors.

Discovered while investigating:
- #4985

We are experiencing many requests that are `Refused` by litep2p in comparison with libp2p. The metric roughly approximates the sum of other reasons from libp2p. This PR aims to provide more insights.

```
{__name__="substrate_sub_libp2p_requests_out_failure_total", chain="ksmcc3", instance="localhost:9615", job="substrate_node", protocol="/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/sync/2", reason="Remote has closed the substream before answering, thereby signaling that it considers the request as valid, but refused to answer it."}
Last *: 3365  Min: 3363  Max: 3365  Mean: 3365

{__name__="substrate_sub_libp2p_requests_out_failure_total", chain="ksmcc3", instance="localhost:9615", job="substrate_node", protocol="/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/beefy/justifications/1", reason="Remote has closed the substream before answering, thereby signaling that it considers the request as valid, but refused to answer it."}
Last *: 3461  Min: 3461  Max: 3461  Mean: 3461
```

Part of:
- #4681

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
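As a rough illustration of why finer-grained reasons help, the sketch below counts outbound request failures per `(protocol, reason)` pair instead of under a single catch-all label. Everything here is hypothetical: `FailureKind`, its variants, the reason strings, and `FailureCounters` are made up for the example and are not the litep2p error type or the Substrate metrics API.

```rust
use std::collections::HashMap;

/// Hypothetical failure kinds, invented for this sketch.
#[derive(Debug, Clone, Copy)]
enum FailureKind {
    Refused,
    Timeout,
    ConnectionClosed,
}

impl FailureKind {
    /// Label used for the `reason` dimension of the counter.
    fn reason(self) -> &'static str {
        match self {
            FailureKind::Refused => "refused",
            FailureKind::Timeout => "timeout",
            FailureKind::ConnectionClosed => "connection-closed",
        }
    }
}

/// In-memory counter keyed by (protocol, reason).
#[derive(Default)]
struct FailureCounters {
    counts: HashMap<(String, String), u64>,
}

impl FailureCounters {
    /// Count one failed outbound request on `protocol`.
    fn record(&mut self, protocol: &str, kind: FailureKind) {
        let key = (protocol.to_string(), kind.reason().to_string());
        *self.counts.entry(key).or_insert(0) += 1;
    }

    /// Number of failures recorded for this protocol and reason.
    fn get(&self, protocol: &str, reason: &str) -> u64 {
        self.counts
            .get(&(protocol.to_string(), reason.to_string()))
            .copied()
            .unwrap_or(0)
    }
}
```

With separate labels, a dashboard can tell a peer that refused a request apart from one that timed out, instead of seeing a single large total.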
Looking at the performance triage from:
After inspecting the error details that were added in: We can deduce that the litep2p req-resp produced a
Next Steps
Detailed Logs
This PR adds a new beefy metric to monitor the number of live beefy peers. Part of investigation of litep2p request failures: #4985 cc @paritytech/networking --------- Signed-off-by: Alexandru Vasile <[email protected]>
Providing a bit more details here:
Line 245 in b9b7331
Dumb question: is it normal for beefy on-demand justifications to enter an idle state?
Metrics
Not sure what “enter idle state” means exactly, but it is normal for a BEEFY node to stop sending on-demand justification requests once it is synced up and following the latest consensus messages.
A long-running litep2p node generates this warning:
A similar libp2p node was started side by side, and it does not show this warning.
This indicates the issue is specific to the litep2p backend.
The warning is coming from:
polkadot-sdk/substrate/client/consensus/beefy/src/communication/request_response/outgoing_requests_engine.rs
Lines 248 to 256 in 01e0fc2
The `try_next_peer` uses a shared `known_peers` between the beefy `OnDemandJustificationsEngine` and the `GossipValidator`.
The known peers are added and removed via:
polkadot-sdk/substrate/client/consensus/beefy/src/communication/peers.rs
Lines 55 to 63 in 01e0fc2
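A minimal sketch of this sharing, assuming a mutex-protected map behind an `Arc`. The types are simplified stand-ins: `KnownPeersSketch`, `ValidatorSketch`, `EngineSketch`, and the methods `note_vote`, `remove`, `voted_above`, and `candidates` are hypothetical names for illustration, and only `validate_vote` mirrors a name from the issue.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the real peer id and block number types.
type PeerId = u32;
type BlockNumber = u64;

/// Simplified peer set: peer id -> highest block the peer voted on.
#[derive(Default)]
struct KnownPeersSketch {
    live: HashMap<PeerId, BlockNumber>,
}

impl KnownPeersSketch {
    fn note_vote(&mut self, peer: PeerId, block: BlockNumber) {
        let best = self.live.entry(peer).or_insert(block);
        *best = (*best).max(block);
    }

    fn remove(&mut self, peer: &PeerId) {
        self.live.remove(peer);
    }

    /// Peers that voted on a block strictly higher than `block`.
    fn voted_above(&self, block: BlockNumber) -> Vec<PeerId> {
        let mut peers: Vec<PeerId> = self
            .live
            .iter()
            .filter(|(_, voted)| **voted > block)
            .map(|(peer, _)| *peer)
            .collect();
        peers.sort_unstable(); // deterministic order for the example only
        peers
    }
}

/// Gossip side: the only writer of votes in this sketch.
struct ValidatorSketch {
    known_peers: Arc<Mutex<KnownPeersSketch>>,
}

impl ValidatorSketch {
    fn validate_vote(&self, peer: PeerId, block: BlockNumber) {
        self.known_peers.lock().unwrap().note_vote(peer, block);
    }
}

/// Request side: only reads what the gossip side recorded.
struct EngineSketch {
    known_peers: Arc<Mutex<KnownPeersSketch>>,
}

impl EngineSketch {
    fn candidates(&self, block: BlockNumber) -> Vec<PeerId> {
        self.known_peers.lock().unwrap().voted_above(block)
    }
}
```

Under these assumptions the request side has no candidates until the gossip side has recorded a vote from some peer, so a node can be well connected and still run out of peers to ask.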
The peers are added on `validate_vote` and `validate_finality_proof`.
Could be related to: