We have seen a few cases where we are unable to publish attestations due to a lack of peers on subnets. The failed attestations in the following graph indicate this.
However, the metrics indicate that we have peers on the subnet, so these failures shouldn't be occurring.
If only our reporting of the metrics is wrong, that's not too bad; however, if our metrics are accurate, then there is a bug. If our metrics are wrong and we are using these numbers to balance our peers per subnet, that also isn't great.
I suspect the metrics, but this needs investigation (which I'll attempt; I'm just opening this issue for visibility).
I've had more of a look into this.
I wrote a modified binary to run on the node that was exhibiting this behaviour. The modified code checked the metrics against the connected_peers mapping inside gossipsub, and it has not reported any mismatch.
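The kind of cross-check described above can be sketched as follows. This is a minimal, hypothetical model, not the code that ran on the node: `PeerId` and `Topic` are plain `String` aliases standing in for the real libp2p types, `connected_peers` is modelled as a map from each peer to the set of topics it is subscribed to, and `metric` stands in for the exported peers-per-subnet gauge.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real libp2p peer and topic types.
type PeerId = String;
type Topic = String;

/// Count, per topic, how many connected peers report a subscription to it.
fn peers_per_topic(connected_peers: &HashMap<PeerId, HashSet<Topic>>) -> HashMap<Topic, usize> {
    let mut counts: HashMap<Topic, usize> = HashMap::new();
    for topics in connected_peers.values() {
        for topic in topics {
            *counts.entry(topic.clone()).or_insert(0) += 1;
        }
    }
    counts
}

/// Return every topic where the metric disagrees with the count derived from
/// `connected_peers`, as (topic, metric_value, derived_value), sorted by topic.
fn find_mismatches(
    metric: &HashMap<Topic, usize>,
    connected_peers: &HashMap<PeerId, HashSet<Topic>>,
) -> Vec<(Topic, usize, usize)> {
    let derived = peers_per_topic(connected_peers);
    // Check the union of topics so a topic missing on either side is caught.
    let mut topics: Vec<&Topic> = metric.keys().chain(derived.keys()).collect();
    topics.sort();
    topics.dedup();
    topics
        .into_iter()
        .filter_map(|t| {
            let m = metric.get(t).copied().unwrap_or(0);
            let d = derived.get(t).copied().unwrap_or(0);
            (m != d).then(|| (t.clone(), m, d))
        })
        .collect()
}

fn main() {
    let mut connected: HashMap<PeerId, HashSet<Topic>> = HashMap::new();
    connected.insert("peer_a".into(), HashSet::from(["subnet_1".to_string()]));
    connected.insert(
        "peer_b".into(),
        HashSet::from(["subnet_1".to_string(), "subnet_2".to_string()]),
    );

    // A metric that agrees with connected_peers yields no mismatches.
    let metric = HashMap::from([("subnet_1".to_string(), 2), ("subnet_2".to_string(), 1)]);
    println!("mismatches: {:?}", find_mismatches(&metric, &connected));
}
```

An empty result here only says the metric and the subscription map agree; it says nothing about which of those peers the publish path actually selects.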
I think this means the metric is accurate, which is concerning, because it indicates that we are not sending messages to connected peers that are subscribed to the subnet.
I couldn't find out why; it requires more investigation. One legitimate reason would be that the peers we are connected to are scored so poorly that we don't publish to them. However, I find this highly unlikely: I checked the peer scoring metric and it didn't indicate this, so I think it's safe to assume this is not the case.
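To rule scoring in or out for a specific failed publish, one could split the peers subscribed to the topic by the publish threshold, as in the sketch below. It assumes, as in the gossipsub v1.1 scoring model, that peers scoring below a `publish_threshold` are skipped when publishing; the function name, the `String` peer type, the treatment of unscored peers as `0.0`, and the threshold value in `main` are all hypothetical and illustrative.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real libp2p peer type.
type PeerId = String;

/// Split the peers subscribed to a topic into those whose score is at or
/// above `publish_threshold` (eligible for our publishes) and those below it.
/// Returns (eligible, filtered_out), preserving input order.
fn split_by_publish_threshold(
    subscribed_peers: &[PeerId],
    scores: &HashMap<PeerId, f64>,
    publish_threshold: f64,
) -> (Vec<PeerId>, Vec<PeerId>) {
    subscribed_peers.iter().cloned().partition(|p| {
        // Assumption: a peer with no recorded score is treated as 0.0.
        scores.get(p).copied().unwrap_or(0.0) >= publish_threshold
    })
}

fn main() {
    let peers = vec!["peer_a".to_string(), "peer_b".to_string()];
    let scores = HashMap::from([("peer_a".to_string(), 5.0), ("peer_b".to_string(), -50.0)]);
    // Illustrative threshold; real values come from the node's scoring config.
    let (eligible, filtered) = split_by_publish_threshold(&peers, &scores, -10.0);
    println!("eligible: {:?}, filtered: {:?}", eligible, filtered);
}
```

If `filtered` holds every subscribed peer at the moment of an `InsufficientPeers` error, scoring would explain the failure; if `eligible` is non-empty, the cause lies elsewhere.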
The question is: why is recipient_peers empty (we receive an InsufficientPeers error) when we are relatively confident that connected_peers contains peers that are subscribed to the topic we want to publish on?
I didn't see any obvious bug on my first pass of this.
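One class of bug that would produce exactly this symptom is the publish path drawing recipients from a different structure than the one the metric counts. The sketch below is a hypothetical, simplified model of that situation and is not a description of how rust-libp2p's gossipsub actually selects recipients: here recipients come only from a per-topic `mesh` map, while the metric-style count comes from `connected_peers`. `PublishError`, `select_recipients`, and `subscribed_count` are illustrative names, and score filtering is deliberately left out.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real libp2p peer and topic types.
type PeerId = String;
type Topic = String;

#[derive(Debug, PartialEq)]
enum PublishError {
    InsufficientPeers,
}

/// Hypothetical recipient selection: only peers in the topic's mesh are used.
fn select_recipients(
    topic: &Topic,
    mesh: &HashMap<Topic, HashSet<PeerId>>,
) -> Result<HashSet<PeerId>, PublishError> {
    let recipients = mesh.get(topic).cloned().unwrap_or_default();
    if recipients.is_empty() {
        Err(PublishError::InsufficientPeers)
    } else {
        Ok(recipients)
    }
}

/// Metric-style count: connected peers that are subscribed to `topic`.
fn subscribed_count(topic: &Topic, connected_peers: &HashMap<PeerId, HashSet<Topic>>) -> usize {
    connected_peers.values().filter(|topics| topics.contains(topic)).count()
}

fn main() {
    let topic: Topic = "subnet_1".into();
    let mut connected: HashMap<PeerId, HashSet<Topic>> = HashMap::new();
    connected.insert("peer_a".into(), HashSet::from([topic.clone()]));
    // The mesh for this topic is empty, e.g. no peer has been grafted yet.
    let mesh: HashMap<Topic, HashSet<PeerId>> = HashMap::new();

    println!("subscribed peers: {}", subscribed_count(&topic, &connected));
    println!("publish result: {:?}", select_recipients(&topic, &mesh));
}
```

In this model the metric reports one subscribed peer while the publish fails, so comparing the actual recipient set against connected_peers at the moment of each failure may be a useful next diagnostic.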