Backward syncing issues when restarting Nimbus #5411
Some details: If Nimbus is restarted while Besu is already in the process of syncing, Besu goes into backward sync, which never completes and returns negative values for percentage complete while Nimbus reports receiving blocks from an unviable fork. Stopping both clients, starting Besu, and allowing it to reach the point where it is waiting for a connection from the CL prior to starting Nimbus allows sync to resume normally. Here's what Besu returns when it is stopped in the middle of the failed backward sync:
and here is an example of the unviable fork messages that repeat in Nimbus:
Generally, this is because of incorrect `INVALID` responses: no descendant of an `INVALID` block can be viable, so Nimbus treats the whole branch as an unviable fork. Whether that's the actual cause or it's something else, the fallout of that is that it just stalls Nimbus syncing (again, correctly, per spec).
Updated the description with some targeted debug logging from some test runs.
Examples of incorrect `INVALID` responses:

1:
Besu:
Block is valid: https://sepolia.beaconcha.in/slot/2235058

2:
Besu:
All of these supposedly `INVALID` blocks are in fact valid.

3:
Which are https://sepolia.beaconcha.in/slot/2235032 https://sepolia.beaconcha.in/slot/2235033 https://sepolia.beaconcha.in/slot/2235034 https://sepolia.beaconcha.in/slot/2235035 https://sepolia.beaconcha.in/slot/2235037 https://sepolia.beaconcha.in/slot/2235038 https://sepolia.beaconcha.in/slot/2235039 https://sepolia.beaconcha.in/slot/2235040 https://sepolia.beaconcha.in/slot/2235041 https://sepolia.beaconcha.in/slot/2235042 https://sepolia.beaconcha.in/slot/2235043 https://sepolia.beaconcha.in/slot/2235044 https://sepolia.beaconcha.in/slot/2235045 https://sepolia.beaconcha.in/slot/2235046 https://sepolia.beaconcha.in/slot/2235047 https://sepolia.beaconcha.in/slot/2235048 https://sepolia.beaconcha.in/slot/2235049 https://sepolia.beaconcha.in/slot/2235050 https://sepolia.beaconcha.in/slot/2235051 https://sepolia.beaconcha.in/slot/2235052 https://sepolia.beaconcha.in/slot/2235053 https://sepolia.beaconcha.in/slot/2235054 https://sepolia.beaconcha.in/slot/2235055 https://sepolia.beaconcha.in/slot/2235056 all of which are valid, despite Besu claiming otherwise in these logs.

4:
which are https://sepolia.beaconcha.in/slot/2235059 https://sepolia.beaconcha.in/slot/2235060 https://sepolia.beaconcha.in/slot/2235061 also all valid.

5:
which is valid: https://sepolia.beaconcha.in/slot/2235090

6:
which corresponds to https://sepolia.beaconcha.in/block/3368146

In all of these cases, Besu is still effectively closing off the rest of the actually-valid/finalized chain from Nimbus via https://github.com/ethereum/consensus-specs/blob/dev/sync/optimistic.md#transitioning-from-valid---invalidated-or-invalidated---valid by claiming these blocks are `INVALID`. Nimbus forgets these (not actually...) "unviable" forks every restart, as it does not persist this particular cache, so restarting Nimbus appears to help briefly, but soon enough these logs show Besu finding some other block to falsely label as `INVALID`.
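To make the knock-on effect concrete, here is a minimal sketch (hypothetical code, not Nimbus's actual implementation) of how the optimistic-sync rule linked above turns a single spurious `INVALID` verdict into a whole unviable branch:

```python
# Minimal sketch (hypothetical, not Nimbus's actual code) of the optimistic-sync
# rule referenced above: once a block is invalidated by the EL, every known
# descendant becomes unviable for fork choice as well.

parents = {}      # block_root -> parent_root, filled in as blocks arrive
unviable = set()  # in-memory only; per the comment, Nimbus does not persist this cache

def on_payload_status(block_root: str, status: str) -> None:
    """Record the EL's verdict for a block."""
    if status == "INVALID":
        unviable.add(block_root)

def is_viable(block_root: str) -> bool:
    """A block is unviable if it, or any known ancestor, was invalidated."""
    root = block_root
    while root is not None:
        if root in unviable:
            return False
        root = parents.get(root)
    return True
```

Because a cache like this lives only in memory, a restart clears it, which lines up with the observation that restarting Nimbus helps briefly until Besu labels another block `INVALID`.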
Neither is a spec-required behavior according to https://github.com/ethereum/execution-apis/blob/main/src/engine/shanghai.md#engine_newpayloadv2 and https://github.com/ethereum/execution-apis/blob/main/src/engine/shanghai.md#engine_forkchoiceupdatedv2
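For reference, a rough sketch of the two engine_newPayloadV2 outcomes being contrasted (field names follow the linked Shanghai spec; the concrete values here are hypothetical). An EL that cannot fully validate a payload yet may answer SYNCING instead of committing to INVALID:

```python
# Hypothetical examples of the payload-status shapes discussed above
# (field names per the linked engine API spec; values are made up).

syncing_response = {
    "status": "SYNCING",         # EL cannot validate yet (e.g. missing ancestors)
    "latestValidHash": None,
    "validationError": None,
}

invalid_response = {
    "status": "INVALID",         # EL asserts the payload is definitively invalid
    "latestValidHash": "0x...",  # last ancestor the EL considers valid
    "validationError": "e.g. transaction nonce below sender account nonce",
}
```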
Separating replies somewhat for easier readability between the example-enumerating one and this one: when Besu repeatedly tells Nimbus, by way of `INVALID` responses, that the (CL) blocks the rest of the Sepolia chain claims are correct actually aren't
Looking through the v23.3.2-based logs:
https://sepolia.beaconcha.in/block/0xb499a29588fe47f551212a1470c2c93440c81bdaaedd7f5cf075f70e8230e7c1 is the block and https://sepolia.beaconcha.in/tx/0xd0bc8ce38682992a4989135e805e6c7dad1a853e996517d63f0a00dd058724a5 is the transaction identified. Apparently the nonce was not in fact below the sender account nonce after all. See the usual remarks about what happens with
which are https://sepolia.beaconcha.in/block/0x468ae2859df900ef9827b76f0b4ed7c163d055d6d41d44288fe21dd7fb238514, https://sepolia.beaconcha.in/block/0x3f9ad73497e9c8e65377ac2401009349ae8765e6699e21aec86573c3a6ec418b (tx is https://sepolia.beaconcha.in/tx/0x8a909a3ab8c35f1322c1954e47e3d06b1ed4ec6379c138ff8db4067c3af54422), and https://sepolia.beaconcha.in/block/3429549, all of which are evidently
is allowed by the engine API but kind of strange -- https://sepolia.beaconcha.in/block/3429434 had been by For
The (in actuality valid) block involved is https://sepolia.beaconcha.in/block/0xb499a29588fe47f551212a1470c2c93440c81bdaaedd7f5cf075f70e8230e7c1 and the transaction is https://sepolia.beaconcha.in/tx/0xd0bc8ce38682992a4989135e805e6c7dad1a853e996517d63f0a00dd058724a5. Nimbus has some more interesting/informative logs here this time about how it responded (apparently via timestamp) to this batch of spurious-`INVALID` responses
Nimbus kept trying to find a valid fork choice head, and to do so, it checked with the EL each time, but each time Besu told it the block it asked about was invalid, so it simply didn't update the head, and had no viable way to proceed. Also as a result of this, it lost peers: Besu effectively told it that its (CL/libp2p) peers were giving it garbage (
with 2 peers to
with 0 peers, which hinders recovery. This is not required by spec, but it's a reasonable heuristic -- in general, staying connected to a CL peer providing
The block is https://sepolia.beaconcha.in/block/3429536, and it's slightly interesting specifically because it notes this is why that backwards sync failed.
As usual, https://sepolia.beaconcha.in/block/0x3f9ad73497e9c8e65377ac2401009349ae8765e6699e21aec86573c3a6ec418b is the block and https://sepolia.beaconcha.in/tx/0x8a909a3ab8c35f1322c1954e47e3d06b1ed4ec6379c138ff8db4067c3af54422 is the transaction.
The spurious-invalid block here is https://sepolia.beaconcha.in/block/3429549, even though Besu had explicitly marked its parent as
Hi @tersec, I agree with your analysis of besu's INVALID responses. Besu marking blocks as INVALID due to

For the tests linked in this issue, I can't explain the exact cause that led to INVALID, but after performing tests with my PR, I have not seen the "invalid transaction nonce" error and have only seen one instance of the "World State Root does not match" error, so an improvement has been made with the PR (which should be released this week). My PR focused on fixing the "reorg" warnings during backwards sync (Observation (3) above), which indicated Besu was moving the head by a large number of blocks. This bug was due to an errant block hash that we were holding onto in our state and trying to move the head to. I have created an issue for the remaining Besu error: #5622
@tersec While testing my PR, I have been collecting engine api logs via a proxy and have some analysis to share.

Summary
The most important finding shows that nimbus sometimes sends batches of out-of-order newPayloads before sending a FCU. Next steps for the original Observations raised:

Besu Backwards Sync
For clarity I'll start with defining "backwards sync" from a besu point of view... The backward sync session will attempt to fill the gap in the chain between the latest imported EL block and the hash it has just received from the CL. It fills this gap by querying other EL peers first for the headers, "Backwards sync phase 1 of 2" (the actual backwards part, using the parent hash to traverse up the chain). Once Besu has received this first hash, Besu does not require intervention from the CL in order to fill this gap; however, it may ignore subsequent newPayloads/FCUs while it is syncing if it already has that block imported, or it has already started a backwards sync session because of it.

Observation (1) - backwards sync triggered instead of lock-step
As stated in the summary, we receive batches of newPayload (with 1 FCU per batch) in reverse order by block number.
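For illustration, the pattern looks roughly like this (block numbers are hypothetical, not taken from the logs):

```python
# Hypothetical illustration of the observed engine-API traffic pattern:
# a batch of newPayloads arrives newest-first, followed by a single
# forkchoiceUpdated for the *oldest* block of that batch.
observed_sequence = [
    ("engine_newPayloadV2", "block_10"),
    ("engine_newPayloadV2", "block_9"),
    ("engine_newPayloadV2", "block_8"),
    # ... down to ...
    ("engine_newPayloadV2", "block_1"),
    ("engine_forkchoiceUpdatedV2", "head = block_1"),
]
```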
We don't have the parent of block_10 in our chain so we start a backwards sync session upon receipt of that first newPayload. I had previously stated on Discord that other clients send besu a single hash and wait for it to sync; this appears to only be the case during initial sync, so I was mistaken w.r.t. the scenario being discussed here: client restarts/outages/recovery rather than initial sync behaviour. In fact, when either EL or CL is restarted, other clients have a range of strategies that are some variant of "lock-step".

Observation (2) - overlapping backwards sync sessions
Sometimes, before besu completes its current batch of backwards sync, nimbus gives it a different hash to sync. I observe that nimbus first sends hash_n, then after some time sends hash_n-1, then hash_n-2, etc. Since besu's backward sync algorithm is multithreaded, we end up with multiple threads of work with overlapping batches of blocks. I think this explains the odd and out-of-order "percentage completed" logs. For example, let's say besu is on block 5 out of 10; I've seen behaviour like this...
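To show why overlapping sessions scramble the progress output, here is a toy illustration (made-up numbers and a made-up progress formula, not Besu's real logging):

```python
# Toy illustration (not Besu code): two overlapping backward-sync sessions,
# each reporting progress against its own target, produce interleaved,
# non-monotonic "percentage completed" output.

def percent_complete(current_block: int, start_block: int, target_block: int) -> float:
    return 100.0 * (current_block - start_block) / (target_block - start_block)

# Session A was started for target block 10; session B later for target block 8
# (an older hash sent afterwards by the CL).
for current in range(5, 9):
    print(f"session A: {percent_complete(current, 0, 10):5.1f}% complete")
    print(f"session B: {percent_complete(current, 0, 8):5.1f}% complete")
```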
So nimbus isn't giving besu a chance to complete the backwards sync session before it requests a new one. Couple of ways besu would behave better in this scenario: (i) sending newPayloads in order may avoid the majority of these cases. (ii) If nimbus resends newPayload[blockHash_8] then besu will disregard it, since it can detect it has already started a backwards sync session to cover all blocks up to this hash. Not sure there's any merit to doing that from a nimbus POV. (iii) If nimbus checks which block besu is on by using eth_blockNumber/eth_getBlockByHash or similar, then maybe it can send a more appropriate hash for besu to sync to. I know you'd mentioned a safety issue with this before, but if nimbus knows it has sent an FCU to besu then isn't it reasonable to wait until besu has at least reached that hash before sending another? A sketch of this idea follows below.

eth_getLogs
Finally, there's the adjacent issue of lots of eth_getLogs requests, which appeared to coincide with the issues observed. Out of all the CL clients, nimbus sends us the most of these, I think because it rebuilds them from scratch every time either besu or nimbus is restarted (which is maybe a knock-on effect of besu being in "syncing" mode when it receives out of order newPayloads).
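Coming back to (iii), here is a minimal sketch of that lock-step check. The JSON-RPC methods are standard eth_ namespace calls; the endpoint and the waiting policy are assumptions for illustration, not Nimbus's actual behaviour:

```python
import time
import requests  # assumes plain HTTP JSON-RPC access to the EL

EL_RPC = "http://localhost:8545"  # hypothetical endpoint

def el_call(method: str, params: list):
    resp = requests.post(EL_RPC, json={"jsonrpc": "2.0", "id": 1,
                                       "method": method, "params": params})
    return resp.json()["result"]

def wait_until_el_has_block(block_hash: str, poll_seconds: float = 2.0) -> None:
    """After sending an FCU for block_hash, poll until the EL has actually
    imported that block before sending it another head to sync to."""
    while el_call("eth_getBlockByHash", [block_hash, False]) is None:
        time.sleep(poll_seconds)
```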
This is true for Nimbus, yeah. It's possible that https://eips.ethereum.org/EIPS/eip-4881 will change this at some point, but that's how it currently works.
Something else we could try in besu is not triggering backwards sync when we receive a newPayload (only when receiving FCU).
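As a rough sketch of that idea (pseudo-logic for discussion, not Besu's actual code): a newPayload whose parent is unknown would just be answered with SYNCING and remembered, and only a forkchoiceUpdated for an unknown head would start backward sync.

```python
# Rough sketch (not Besu's actual code) of the proposed behaviour change.

def on_new_payload(payload, chain, pending_payloads) -> str:
    if chain.has_block(payload.parent_hash):
        return chain.import_block(payload)   # normal in-order import
    pending_payloads.add(payload)            # remember it, but don't start syncing
    return "SYNCING"

def on_forkchoice_updated(head_hash, chain, backward_sync) -> str:
    if not chain.has_block(head_hash):
        backward_sync.start(target_hash=head_hash)  # backward sync triggered only here
        return "SYNCING"
    return "VALID"
```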
Hi @tersec what do you think about this ^? I understand the order of newPayloads isn't mandated in the spec, but it seems like a bug to me to send a batch and then the FCU for the oldest rather than the latest one of that batch.
Is this issue causing what I experience here: |
@kiss81 IIUC your issue is with nimbus reporting corrupted deposit logs? I have not experienced that issue, so I'm not sure it's related. The "backward syncing" referred to in this issue is about nimbus helping besu get its blocks back into sync after an outage. The deposit logs (eth_getLogs) are relevant because nimbus requests them every time either client is restarted, but I don't believe they are the cause of any backward syncing issues. Feel free to post more of your besu logs in a new issue and we can see if we can spot anything that might help.
I tested this out in #5666 and
Correction, I think my first test was invalid because I have realised it actually wasn't properly in sync before I started the test. |
Reopening this since it is still an issue, at least on Sepolia. This is one solution that will solve it, however: #5687
There's also been some more recent testing #5699 (comment)
status-im/nimbus-eth2#5248 should help here, in response to status-im/nimbus-eth2#5242 |
cc @ahamlat |
status-im/nimbus-eth2#5635 avoids some duplicate |
Have we been able to reproduce on besu latest?
I have not noticed this issue for some time - e.g. testnet nodes seem to recover OK after a restart/update, so suspect it's resolved. Can reopen if necessary |
While testing out an unrelated fix with the unstable branch of nimbus, I observed at least three separate backward sync issues on our sepolia validating node. Note, I've only seen this issue on Sepolia, and it was probably triggered by nimbus not getting peers before falling > 64 blocks behind the beacon chain, which switches it into "far wall" sync mode rather than lock-step sync.
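As a tiny illustration of the sync-mode switch just described (the 64-block figure comes from the observation above; the function itself is only a sketch, not nimbus's actual code):

```python
# Sketch of the threshold described above: once the local head falls more than
# 64 blocks behind the wall clock, the client switches from lock-step to
# "far wall" sync. Purely illustrative.
FAR_WALL_THRESHOLD = 64

def choose_sync_mode(wall_clock_block: int, head_block: int) -> str:
    behind = wall_clock_block - head_block
    return "far wall sync" if behind > FAR_WALL_THRESHOLD else "lock-step sync"
```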
It's worth noting that when nimbus is restarted, it requests a lot of eth_getLogs until it refills its deposit cache, which may be having some impact. When I enabled DEBUG during the backward sync issues, the eth_getLogs calls were ongoing, so it seems related. There's a thread discussing it with tersec from nimbus, starting about here: https://discord.com/channels/595666850260713488/1093963476948496455/1101258732190572676
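For context, the deposit-cache refill amounts to a series of range-limited eth_getLogs queries like the sketch below (the contract address, block range and chunk size are placeholders, not the values nimbus actually uses):

```python
# Sketch of the eth_getLogs traffic described above: the CL walks the chain in
# fixed-size block ranges asking for deposit-contract logs. Address, range and
# chunk size are placeholders/assumptions, purely for illustration.

DEPOSIT_CONTRACT = "0x0000000000000000000000000000000000000000"  # placeholder address
CHUNK = 5_000  # assumed range size per request

def deposit_log_requests(start_block: int, end_block: int):
    """Yield one eth_getLogs JSON-RPC request body per block-range chunk."""
    for lo in range(start_block, end_block + 1, CHUNK):
        hi = min(lo + CHUNK - 1, end_block)
        yield {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "eth_getLogs",
            "params": [{
                "address": DEPOSIT_CONTRACT,
                "fromBlock": hex(lo),
                "toBlock": hex(hi),
            }],
        }
```

If the cache is rebuilt from scratch after a restart of either client, this whole walk starts again, which would explain the request volume right after restarts.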
Observations
(1) Node offline during upgrade so I'd expect a small backwards sync (or lock-step sync depending on client). During this expected backwards sync though, we got lots of messages like this...
Need to confirm, but I believe teku will give us the latest hash they have once they are in sync and we get a consistent set of % complete logs for phase 2 of 2 with one completed message at the end. Something about the pattern of requests nimbus is sending is causing multiple completed logs.
(2) Besu is "in sync", as evidenced by being able to propose, but then is frequently switched to backwards sync mode, presumably by being given a hash that we don't have a parent for.
(3) The reorg logs, which suggest it's potentially being thrashed between forks...
(2) and (3) could both have the same root cause, or maybe all three share the same cause.
Relevant logs for (2) and (3)...
Full logs
besu-23.1.2.log
nimbus-ea8ef5-2.log
Nimbus versions: differing behaviour
Update 06/05/2023
Further testing with targeted besu debug logging enabled, using the following log RPC:
Besu was version 23.4.0 for all three nimbus versions.
Ran the same test on the three different nimbus versions on a non-validating sepolia node; the test was just to restart nimbus:
Logs attached here: https://we.tl/t-RZMmscEG1A
In all cases, there was less noise than on the validating node which suggests validating (with lots of validators) may exacerbate the problems.