
op-node: Improve shutdown behavior #10936

Closed
sebastianst opened this issue Nov 9, 2023 · 10 comments · Fixed by #11455
Labels
A-op-node Area: op-node T-protocol Team: changes to node components (op-node, op-reth, etc.) implemented by go/rust/etc. devs

Comments

@sebastianst
Member

sebastianst commented Nov 9, 2023

@protolambda
Contributor

protolambda commented Nov 9, 2023

op-node (and the indexer / batcher) intentionally no longer use BlockOnInterrupts.

Instead, there's ctx := opio.WithInterruptBlocker(context.Background()) in main.go to register a single interrupt receiver (attached as a retrievable value on the ctx, not immediately affecting any context cancellation) that can then be shared between all CLI functions that run.

The sub-/main-commands can then use the context either with ctx := opio.CancelOnInterrupt(ctx) (for very basic single-threaded termination) or, for larger lifecycles like op-node, let cliapp.LifecycleCmd handle it.

The idea here is that instead of attaching a system-signal-based blocker to the context, you can also attach an artificial blocker to fake an interrupt on a CLI app without having to run sub-processes; great for testing and potentially for app composition (with mocktimism wrapping CLI apps).

Note that this is also better than what we had previously, because the interrupt receiver is registered on the main thread, almost as early as possible, before the CLI app.RunContext is called: we don't risk receiving an unhandled system signal and panicking. And since it's the same consistent interrupt channel, we don't miss any signal when running one command after another, nor when doing things like a second interrupt to request a faster, less graceful shutdown (which op-node and op-batcher support).

@sebastianst
Member Author

Ah nice, thanks for clearing that up, TIL! Will edit to remove the opio suggestion.

@sebastianst
Member Author

Attempt to fix #8086 by @ajsutton at #8128

@mslipper
Collaborator

Will be closed in #8169 once Adrian updates to the latest op-geth version.

@sebastianst
Member Author

We should still address the first two bullets independently, which are not fixed by #8169. E.g. we're still sequencing after p2p is shut down, and we're missing shutdown logs.

@BlocksOnAChain BlocksOnAChain transferred this issue from ethereum-optimism/Node-temp_repo-for-public-issue-migration Jun 18, 2024
@BlocksOnAChain BlocksOnAChain added T-protocol Team: changes to node components (op-node, op-reth, etc.) implemented by go/rust/etc. devs A-op-node Area: op-node labels Jun 21, 2024
@sebastianst
Member Author

After going over the code with @anacrolix, we identified one minor improvement that would mostly solve this: in the shutdown path, if sequencing is active, stop sequencing before shutting down the p2p stack.

@anacrolix
Contributor

I'm currently working on that. I have a potential PR from poking around the interrupt handling too.

@anacrolix
Contributor

I found a bunch of bugs in the peer discovery while testing this. I'll have a PR for that.

I modified op-node to stop sequencing before closing p2p. However, the driver still tries to fetch L2 blocks from p2p after sequencing is stopped, so I'm not sure this will work. Is it possible to just stop the driver completely instead? Or even the sequencer, then the driver, then the p2p?

@sebastianst
Member Author

Wouldn't it still be an improvement to just stop sequencing before shutting down the p2p stack? That way, a sequencer would behave like a normal node during shutdown from that point on. More reordering improvements can probably be made, as you suggested, e.g. shutting down p2p later.

@anacrolix
Contributor

I don't quite follow. My testing shows p2p is still used until the driver is shut down. So if it's okay to stop sequencing, then the driver, then p2p, that should work in its current form. But do you really gain anything from stopping sequencing first when you could just shut down the driver and get both in one?

5 participants