Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent VC from sending expired subscriptions #5296

Closed

Conversation

michaelsproul
Copy link
Member

Issue Addressed

The Lighthouse BN frequently logs:

Feb 26 00:42:11.220 WARN Not enough time for a discovery search subnet_id: ExactSubnet { subnet_id: SubnetId(9), slot: Slot(8507009) }, service: attestation_service

Proposed Changes

I believe the issue is due to how the new already_sent mechanism works. It was introduced in:

When we initialise duties we don't make any attempt to avoid creating SubscriptionSlots entries for past slots. As a result, after a restart or a duties re-org we will try to send subscriptions for expired slots. Additionally, the VC will only set already_sent to true once the subscription attempt succeeds, meaning that it will keep trying (and get stuck) when the subscriptions for past slots are rejected by the BN.

There are multiple ways of fixing this. The fix I've implemented (for now) is two-fold:

  • Check whether the last scheduled subscription slot is already in the past in should_send_subscription_at. This means that we should never try sending a subscription if the slot is > duty.slot - 3, i.e. at duty.slot - 2 or later slots. This aligns with the conditions under which the BN will reject subscriptions.
  • Always set already_sent to true after attempting to subscribe. This changes the behaviour in all failure cases, including the failure case for sending a past subscription that the BN rejects. As a result, we rely on the next scheduled subscription to trigger a retry in cases where all BNs errored for some other reason (e.g. connectivity issue). I think this is OK, as subscriptions still have a lot of redundancy built in, and the extra spamming when the BNs are erroring is probably not helpful. If we miss the last scheduled subscription slot for a duty then we won't retry (due to the first change), and this is OK because the BN would likely reject a retry attempt anyway.

Additional Info

Alternatives I considered but haven't tried implementing:

  • We could pass in the current_slot when initialising SubscriptionSlots and exclude duties that are in the past. IMO this is pretty much equivalent to putting the condition in should_send_subscription_at, and the memory savings from not storing a few slots & bools in exceptional cases (after startup, reorgs) are minimal.

@michaelsproul michaelsproul added bug Something isn't working val-client Relates to the validator client binary ready-for-review The code is ready for review v5.1.0 Q2 2024 labels Feb 26, 2024
@michaelsproul michaelsproul added work-in-progress PR is a work-in-progress and removed ready-for-review The code is ready for review labels Feb 26, 2024
Copy link
Member

@pawanjay176 pawanjay176 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find! 🎉 Just a logging suggestion

// Prevent subscriptions from being sent for duties which are in the past.
// If there are no subscription slots (should be impossible currently) then considering them
// expired is a safe fallback.
let expired = self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can log a warn (or have a metric) here for expired == true. If we are bundling this PR with #5305 , we should ideally be getting should_send_subscription_at calls for expired slots only after a restart.
So getting a barrage of warns that are not right after a restart might alert us to a potential issue in duties calculation logic?

@michaelsproul
Copy link
Member Author

Closing in favour of #5305

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working v5.1.0 Q2 2024 val-client Relates to the validator client binary work-in-progress PR is a work-in-progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants