Fix/super catchup stuck at the tip #9132
@@ -137,9 +137,17 @@ let tear_down {nodes; states; _} =
  Hashtbl.clear states

let set_state t (node : Node.t) s =
  Hashtbl.decr t.states (Node.State.enum node.state) ;
  Hashtbl.update t.states (Node.State.enum node.state) ~f:(function
Maybe refactor these two out into their own functions
Approved, as this should prevent getting stuck forever, but we should have some kind of retry logic around getting the transition chain proof when that fails for a transition (see #9142).
This PR should fix the bug on devnet2 where the snark-coordinator gets stuck at catchup.
What happened is that super-catchup didn't invalidate the cache of catchup blocks when it failed to download the state hashes for the missing blocks. Subsequent catchup runs would then get stuck on those blocks: their entries were still in the cache, and our system treats any block in the cache as a block that is currently being processed. So it would wait for those blocks forever.
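The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `Catchup_cache`, `download_state_hashes`, `catchup`, and the `invalidate_on_failure` flag are all hypothetical names made up for illustration, and the real catchup logic is considerably more involved. The sketch only assumes what the description states, namely that a block present in the cache is treated as "in progress".

```ocaml
(* Hypothetical sketch; none of these names come from the actual codebase. *)
module Catchup_cache = struct
  type t = (string, unit) Hashtbl.t

  let create () : t = Hashtbl.create 16

  (* A hash present in the cache is treated as "already being processed". *)
  let in_progress (t : t) hash = Hashtbl.mem t hash

  let add (t : t) hash = Hashtbl.replace t hash ()

  let invalidate (t : t) hash = Hashtbl.remove t hash
end

(* Stand-in for the network request; fails whenever [peer_ok] is false. *)
let download_state_hashes ~peer_ok hash =
  if peer_ok then Ok [hash] else Error "download failed"

(* One catchup attempt for the block identified by [hash]. *)
let catchup ~invalidate_on_failure ~peer_ok cache hash =
  if Catchup_cache.in_progress cache hash then `Waiting
  else begin
    Catchup_cache.add cache hash ;
    match download_state_hashes ~peer_ok hash with
    | Ok _ ->
        (* Processing finished, so the entry is no longer needed. *)
        Catchup_cache.invalidate cache hash ;
        `Done
    | Error _ ->
        (* Without this cleanup the stale entry blocks every later attempt. *)
        if invalidate_on_failure then Catchup_cache.invalidate cache hash ;
        `Failed
  end
```

With `~invalidate_on_failure:false`, a failed attempt leaves the hash in the cache, so a later call for the same hash returns `` `Waiting `` even if the peer is now reachable. With `~invalidate_on_failure:true`, the later call can proceed and return `` `Done ``.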