Fix/super catchup stuck at the tip #9132

Merged
10 commits merged into release/1.1.6 from fix/super-catchup-stuck-at-the-tip
Jun 29, 2021

Conversation

ghost-not-in-the-shell
Contributor

This PR should fix the bug on devnet2 where the snark-coordinator gets stuck at catchup.

What happened is that super-catchup didn't invalidate the cache of catchup blocks when it failed to download the state hashes for the missing blocks. Subsequent catchup runs would then get stuck on those blocks: they were still in the cache, and the system treats cached blocks as blocks that are being processed, so it would wait for them forever.
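
To make the failure mode concrete, here is a minimal sketch of the idea, not the actual change in this PR. It uses only the OCaml standard library, and every name in it (`cache`, `download_state_hashes`, `start_catchup`, `is_being_processed`) is hypothetical. Blocks are registered in a cache before the download; if the download fails, the entries are removed again so that a later catchup does not mistake them for blocks under processing.

```ocaml
(* Hypothetical model of the catchup cache; none of these names come from
   the real codebase. A key in [cache] means "this block is being processed". *)
let cache : (string, unit) Hashtbl.t = Hashtbl.create 16

(* Stand-in for the network request that fetches state hashes.
   [fail] simulates a failed download. *)
let download_state_hashes ~fail hashes =
  if fail then Error "download failed" else Ok hashes

(* Register the blocks, attempt the download, and on failure remove the
   entries again so a subsequent catchup can retry them. *)
let start_catchup ~fail hashes =
  List.iter (fun h -> Hashtbl.replace cache h ()) hashes ;
  match download_state_hashes ~fail hashes with
  | Ok hs -> Ok hs
  | Error e ->
      (* Without this cleanup the blocks would look "in progress" forever. *)
      List.iter (fun h -> Hashtbl.remove cache h) hashes ;
      Error e

let is_being_processed h = Hashtbl.mem cache h
```

With the cleanup in the `Error` branch, a failed `start_catchup` leaves the cache as it found it, so the next attempt starts from a clean state.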

@ghost-not-in-the-shell ghost-not-in-the-shell requested review from a team as code owners June 24, 2021 19:41
@ghost-not-in-the-shell ghost-not-in-the-shell changed the base branch from develop to release/1.1.6 June 24, 2021 19:41
@@ -137,9 +137,17 @@ let tear_down {nodes; states; _} =
Hashtbl.clear states

let set_state t (node : Node.t) s =
Hashtbl.decr t.states (Node.State.enum node.state) ;
Hashtbl.update t.states (Node.State.enum node.state) ~f:(function
Member

Maybe refactor these two out into their own functions
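
One possible shape for that refactor, as a sketch only: the diff above is truncated, so the real update logic is not known here. The helpers `decr_count` and `incr_count` are hypothetical names, and the sketch uses the standard library `Hashtbl` rather than the Core `Hashtbl` that appears in the diff.

```ocaml
(* Hypothetical helpers for a table that counts nodes per state.
   Assumption: a count that reaches zero is removed from the table. *)
let decr_count tbl key =
  match Hashtbl.find_opt tbl key with
  | Some n when n > 1 -> Hashtbl.replace tbl key (n - 1)
  | Some _ -> Hashtbl.remove tbl key
  | None -> ()

let incr_count tbl key =
  let n = Option.value (Hashtbl.find_opt tbl key) ~default:0 in
  Hashtbl.replace tbl key (n + 1)

(* Moving a node between states then reads as two named steps. *)
let set_state counts ~old_state ~new_state =
  decr_count counts old_state ;
  incr_count counts new_state
```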

@lk86 lk86 added the ci-build-me label Jun 24, 2021
@lk86 lk86 merged commit c856692 into release/1.1.6 Jun 29, 2021
@lk86 lk86 deleted the fix/super-catchup-stuck-at-the-tip branch June 29, 2021 03:11
Member

@imeckler imeckler left a comment

Approved as this should prevent getting stuck forever, but we should have some kind of retry logic around getting the transition chain proof for a transition that fails (see #9142)
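
For illustration, retry logic of that kind could look roughly like the sketch below. It is synchronous and standard-library only, whereas the real code would presumably run inside the project's async framework; `with_retries` and its `max_attempts` parameter are hypothetical and not taken from #9142.

```ocaml
(* Call [f] up to [max_attempts] times. Returns the first [Ok] result,
   or the last [Error] once the attempts are used up. *)
let rec with_retries ~max_attempts f =
  match f () with
  | Ok v -> Ok v
  | Error e ->
      if max_attempts <= 1 then Error e
      else with_retries ~max_attempts:(max_attempts - 1) f
```

A caller would wrap the fetch of the transition chain proof in a thunk and pass it as `f`; a production version would likely also add a delay or backoff between attempts.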
