[Merged by Bors] - Fix ATX syncer hangs #6137
Closed
Motivation
The ATX syncer was observed to hang when a peer serves an invalid ATX. It counted only specific errors as failed requests rather than every failed request, so it never reached the configured request limit.
Description
Change atxsyncer to count every failure to get an ATX as a failed request. It will then give up on the invalid ATX once it reaches the limit.
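To illustrate the idea, here is a minimal sketch of a retry loop that counts every error against a per-ATX limit. It is not the actual atxsyncer code: downloadATXs, fetchFunc, errInvalidATX and the string IDs are hypothetical names introduced only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidATX is a hypothetical error standing in for a validation failure.
var errInvalidATX = errors.New("invalid ATX")

// fetchFunc is a hypothetical stand-in for a call that fetches a single ATX.
type fetchFunc func(id string) error

// downloadATXs retries every pending ID until it succeeds or has failed
// maxFailures times. Any error counts as a failure, regardless of its kind,
// so the loop always terminates. It returns the IDs it gave up on.
func downloadATXs(ids []string, fetch fetchFunc, maxFailures int) []string {
	failures := make(map[string]int)
	var abandoned []string
	pending := append([]string(nil), ids...)
	for len(pending) > 0 {
		var retry []string
		for _, id := range pending {
			if err := fetch(id); err == nil {
				continue // fetched successfully, nothing more to do
			}
			failures[id]++
			if failures[id] >= maxFailures {
				abandoned = append(abandoned, id) // limit reached, give up
				continue
			}
			retry = append(retry, id)
		}
		pending = retry
	}
	return abandoned
}

func main() {
	calls := 0
	fetch := func(id string) error {
		if id == "bad" {
			calls++
			return errInvalidATX
		}
		return nil
	}
	abandoned := downloadATXs([]string{"good", "bad"}, fetch, 10)
	fmt.Println("abandoned:", abandoned, "attempts on bad ATX:", calls)
}
```

If the loop counted only some error kinds, an ID that always fails with an uncounted error would stay in the pending set forever, which matches the hang described in the motivation.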
💡 Additionally, decreased the default number of retries from 20 to 10 for the ATX syncer.
💡 There is a related problem: the ATX handler should return a pubsub.ErrValidationReject for invalid ATXs to drop the malicious peer. It should also be smart enough to bubble up such an error from fetcher.GetAtxs(<deps>) if a dependency of an ATX (e.g. the previous or positioning ATX) turns out to be malicious and cannot be fetched.
Test Plan
I updated a test to check retries on any error.
TODO