[1.0] Normally process blocks from the forkdb on startup #572

heifner · 2024-08-17T19:30:01Z

In Spring 1.0.0, unlike Leap, we process blocks into the fork database immediately. This can cause the fork database to grow very large when syncing, and the node to shut down after hitting the new max-reversible-blocks limit (before #545). In Leap, it was assumed that a restarted node that did not receive any blocks from the network would be at the same height as when it was shut down. The existing tests/nodeos_read_terminate_at_block_test.py integration test verifies this behavior.

However, if the node shut down because of max-reversible-blocks, you would like the node to process blocks out of the fork database on restart, potentially allowing LIB to progress and the reversible blocks to be consumed so that the fork database shrinks. This is at odds with the expected behavior of a node starting up after a terminate-at-block. A user might find it very odd to terminate a node at a block, but find on restart that the node is actually beyond that block.

To fix the issue reported in #565, we would like to process blocks out of the fork database on restart. However, we want to maintain the existing expectation that a node shut down via terminate-at-block is at the same block height on restart.

Therefore, this PR modifies nodeos to normally process blocks out of the fork database on startup unless the node has no configured peers. The idea is that if a user has terminated a node with terminate-at-block and wants the node to remain at that block, they will restart the node without any p2p-peer-address configured. This is a bit of a hack until #570 can be added. Since #570 is a new feature, it will not come until a future release and will not be part of Spring 1.0.0.
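As an illustration only, the startup rule described above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the controller.cpp implementation: `StartupConfig` and `should_process_forkdb_on_startup` are hypothetical names, and the only behavior taken from this PR is "process fork database blocks on startup unless no peers are configured".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StartupConfig:
    """Hypothetical stand-in for the relevant nodeos options (not a real API)."""
    # Values that would be passed via p2p-peer-address; empty means no peers configured.
    p2p_peer_addresses: List[str] = field(default_factory=list)


def should_process_forkdb_on_startup(cfg: StartupConfig) -> bool:
    """Return True when blocks already in the fork database should be applied at startup.

    Models the rule from the PR description: process normally, unless the
    node has no configured peers.
    """
    return len(cfg.p2p_peer_addresses) > 0
```

Under this model, a node restarted with at least one peer configured may advance past the block it stopped at, while a node restarted with no peers stays at the height where terminate-at-block stopped it.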

Resolves #565

heifner · 2024-08-17T19:30:49Z

libraries/chain/controller.cpp

+            // terminate-at-block, the current expectation is that the node can be restarted to examine the state at
+            // which it was shutdown. For now, we will only process these blocks if there are peers configured. This
+            // is a bit of a hack for Spring 1.0.0 until we can add a proper pause-at-block (issue #570) which could
+            // be used to explicitly request a node to not process beyond a specified block.


This compromise came after a long conversation with @arhag on potential alternatives.

heifner · 2024-08-17T19:31:41Z

plugins/net_plugin/net_plugin.cpp

@@ -2224,6 +2224,8 @@ namespace eosio {
         set_state( lib_catchup );
         sync_last_requested_num = 0;
         sync_next_expected_num = chain_info.lib_num + 1;
+      } else if (sync_next_expected_num >= sync_last_requested_num) {
+         // break


This change causes the net_plugin to request more blocks if possible instead of remaining in a mode where it thinks it is already syncing.
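The added branch can be read as a small check on two counters. The Python sketch below is a simplified model, not the net_plugin code: the function name and the returned strings are hypothetical, and only the two counter names and the `>=` comparison come from the diff above. It assumes the comparison means that every block of the currently requested range has already arrived.

```python
def on_start_sync(sync_next_expected_num: int, sync_last_requested_num: int) -> str:
    """Model the decision made when a sync start is requested.

    Returns "request_next_range" when the comparison from the diff holds
    (the current range is treated as fully received), otherwise
    "already_syncing" (the request is ignored because a range is still
    outstanding).
    """
    if sync_next_expected_num >= sync_last_requested_num:
        # Nothing left outstanding in the current range: ask for more blocks.
        return "request_next_range"
    # Blocks from the current range are still expected: keep waiting.
    return "already_syncing"
```

For example, with a last requested block of 100, a next expected block of 101 leads to a new request, while a next expected block of 50 leaves the node waiting on the outstanding range.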

heifner · 2024-08-17T19:34:23Z

tests/nodeos_read_terminate_at_block_test.py

+    if success:
+        for nodeId, nodeArgs in {**regularNodeosArgs, **replayNodeosArgs}.items():
+            assert cluster.getNode(nodeId).relaunch(), f"Unable to relaunch {nodeId}"
+            assert cluster.getNode(nodeId).waitForLibToAdvance(), f"LIB did not advance for {nodeId}"


This extra section on restarting nodes failed before the fixes in this PR.

heifner · 2024-08-17T19:56:06Z

tests/nodeos_read_terminate_at_block_test.py

+        1 : "--read-mode irreversible --terminate-at-block 100",
+        2 : "--read-mode head --terminate-at-block 125",
+        3 : "--read-mode speculative --terminate-at-block 150",
+        4 : "--read-mode irreversible --terminate-at-block 180"


These block values don't really matter; I could revert to the old values if desired. They were changed while I was doing some initial testing to try to reproduce the issue in this test. The changes below are what is needed.

greg7mdp

this PR modifies nodeos to normally process blocks out of the fork database on startup unless the node has no configured peers

Only in irreversible mode, right?

heifner · 2024-08-19T10:31:46Z

Only in irreversible mode, right?

Also non-irreversible mode:

https://github.com/AntelopeIO/spring/pull/572/files#diff-42e9f97eed543dd784b1b77a536c50ff8e2403d5bbd3b1fb7f5dd90201391061R1988

ericpassmore · 2024-08-19T12:52:21Z

Note:start
group: STABILITY
category: INTERNALS
summary: Until pause at block height is available, this fix enables use of the forkdb to advance LIB on startup when peers are configured, but does not process blocks from the forkdb on startup when no peers are configured.
Note:end

heifner added 3 commits August 17, 2024 08:24

GH-565 Add test that trips issue of 565, sync start after restart of …

5a6f76c

…a node that was terminated with many blocks in the forkdb

GH-565 if on start sync the current sync has recieved all its blocks,…

1eb8318

… attempt to request the next range of block instead of ignoring the request.

GH-565 On startup if no peers configured, then attempt to apply block…

61ebd0f

…s from the fork database. This allows a block stuck in a condition where it has too many blocks in the forkdb to process new blocks to attempt to apply those blocks on startup.

heifner requested review from linh2931 and greg7mdp August 17, 2024 19:34

heifner added the OCI Work exclusive to OCI team label Aug 17, 2024

heifner linked an issue Aug 17, 2024 that may be closed by this pull request

P2P: Sync mode stuck after hitting max-reversible-blocks, after restart #565

Closed

GH-565 Always log state of fork database on startup.

a48dc06

Base automatically changed from GH-528-limit-sync to release/1.0 August 18, 2024 02:34

Merge branch 'release/1.0' into GH-565-sync-stuck

03589b9

greg7mdp approved these changes Aug 18, 2024

heifner requested a review from spoonincode August 19, 2024 14:34

arhag approved these changes Aug 19, 2024

heifner merged commit 792e19d into release/1.0 Aug 19, 2024
36 checks passed

heifner deleted the GH-565-sync-stuck branch August 19, 2024 21:49

This was referenced Aug 19, 2024

[1.0 -> main] Normally process blocks from the forkdb on startup #591

Merged

Node requesting blocks way ahead of forkdb head #635

Closed

[1.0] P2P: Prevent node from syncing too far ahead #638

Merged

heifner mentioned this pull request Aug 27, 2024

[1.0 -> main] P2P: Prevent node from syncing too far ahead #654

Merged
