
op-node: Improve shutdown behavior #10936

Closed
sebastianst opened this issue Nov 9, 2023 · 10 comments · Fixed by #11455
Labels
A-op-node Area: op-node T-protocol Team: changes to node components (op-node, op-reth, etc.) implemented by go/rust/etc. devs

Comments

@sebastianst
Member

sebastianst commented Nov 9, 2023

@protolambda
Contributor

protolambda commented Nov 9, 2023

op-node (and the indexer / batcher) intentionally no longer use BlockOnInterrupts.

Instead, there's ctx := opio.WithInterruptBlocker(context.Background()) in main.go to register a single interrupt receiver (attached as a retrievable value on the ctx, not immediately affecting any context cancellation) that can then be shared between all CLI functions that run.

The sub-/main-commands can then use the context either with ctx := opio.CancelOnInterrupt(ctx) (for very basic single-threaded termination) or, for larger lifecycles like op-node, let cliapp.LifecycleCmd handle it.

The idea here is that instead of attaching a system-signal-based blocker to the context, you can also attach an artificial blocker to fake an interrupt on a CLI app without having to run sub-processes; great for testing and potentially for app composition (with mocktimism wrapping CLI apps).

Note that this is also better than what we had previously, because the interrupt receiver is registered on the main thread, almost as early as possible, before the CLI app.RunContext is called: we don't risk receiving an unhandled system signal and panicking. And since it's the same consistent interrupt channel, we don't miss any signal when running one command after another, nor when doing things like a second interrupt to request a faster, less graceful shutdown (which op-node and op-batcher support).

@sebastianst
Member Author

Ah nice, thanks for clearing that up, TIL! Will edit to remove the opio suggestion.

@sebastianst
Member Author

Attempt to fix #8086 by @ajsutton at #8128

@mslipper
Collaborator

Will be closed in #8169 once Adrian updates to the latest op-geth version.

@sebastianst
Member Author

We should still address the first two bullets independently, which are not fixed by #8169. E.g. we're still sequencing after p2p is shut down, and we're missing shutdown logs.

@BlocksOnAChain BlocksOnAChain transferred this issue from ethereum-optimism/Node-temp_repo-for-public-issue-migration Jun 18, 2024
@BlocksOnAChain BlocksOnAChain added T-protocol Team: changes to node components (op-node, op-reth, etc.) implemented by go/rust/etc. devs A-op-node Area: op-node labels Jun 21, 2024
@sebastianst
Member Author

After going over the code with @anacrolix, we identified one minor improvement that would mostly solve this: in the shutdown path, if sequencing is active, stop sequencing before shutting down the p2p stack.

@anacrolix
Contributor

I'm currently working on that. I have a potential PR from poking around the interrupt handling too.

@anacrolix
Contributor

I found a bunch of bugs in the peer discovery while testing this. I'll have a PR for that.

I modified op-node to stop sequencing before closing p2p. However, the driver still tries to fetch L2 blocks from p2p after sequencing is stopped, so I'm not sure this will work. Is it possible to just stop the driver completely instead? Or even the sequencer, then the driver, then the p2p?

@sebastianst
Member Author

Wouldn't it still be an improvement to just stop sequencing before shutting down the p2p stack? That way, a sequencer would behave like a normal node during shutdown from that point on. More reordering improvements can probably be made, as you suggested, e.g. shutting down p2p later.

@anacrolix
Contributor

I don't quite follow. My testing shows p2p is still used until the driver is shut down. So if it's okay to stop sequencing, then the driver, then p2p, that should work in its current form. But do you really gain anything from stopping sequencing first when you could just shut down the driver and get both in one?

5 participants