Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Parachain state root mismatch #573

Closed
h4x3rotab opened this issue Aug 14, 2021 · 2 comments
Closed

Parachain state root mismatch #573

h4x3rotab opened this issue Aug 14, 2021 · 2 comments

Comments

@h4x3rotab
Copy link

h4x3rotab commented Aug 14, 2021

We are running our parachain testnet on ChaChaCha. About once a few days, we see a large portion of the network nodes got stall with Storage root must match that calculated error while importing the same block, but there's one collator (collator1) still producing blocks and can grow the parachain.

This probably caused a network split. On telemetry, we used to see the only running collator still producing blocks with a txpool size of 0, while all the other nodes had thousands of pending extrinsics.

telegram-cloud-photo-size-5-6105144004237700228-y

We see all the nodes besides collator1 accepted 63240, but rejected 63241 with the error below:

2021-08-14 07:51:36 [Parachain] Starting collation. relay_parent=0xbfd74c0ba9452a51d109c1c7b640da305b7febbdc3e66bacd918bdc34e5fee91 at=0xd91398c618bd01a87d8b8f32a121014edead0075cd79b6da7c458fa8980001b1
2021-08-14 07:51:36 [Relaychain] 💤 Idle (34 peers), best: #315881 (0xbfd7…ee91), finalized #315877 (0x4cae…6476), ⬇ 16.2kiB/s ⬆ 11.1kiB/s
2021-08-14 07:51:36 [Parachain] panicked at 'Storage root must match that calculated.', /home/h4x/.cargo/git/checkouts/substrate-7e08433d4c370a21/74101dc/frame/executive/src/lib.rs:469:9
2021-08-14 07:51:36 [Parachain] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x4c6f - <unknown>!rust_begin_unwind\n    1: 0x1d57 - <unknown>!core::panicking::panic_fmt::h0dfe153eb0ef456a\n    2: 0x13f5 - <unknown>!core::panicking::panic::ha2f229d777a180c9\n    3: 0x10d34d - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hfd62dc069f40dabd\n    4: 0x151635 - <unknown>!Core_execute_block\nnote: run with `WASMTIME_BACKTRACE_DETAILS=1` environment variable to display more information\n"))))
2021-08-14 07:51:36 [Parachain] 💔 Error importing block 0x6669d15cf5c0cfc912d75419501f191469b5828053ed3ca134f88c99383d8d26: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x4c6f - <unknown>!rust_begin_unwind\\n    1: 0x1d57 - <unknown>!core::panicking::panic_fmt::h0dfe153eb0ef456a\\n    2: 0x13f5 - <unknown>!core::panicking::panic::ha2f229d777a180c9\\n    3: 0x10d34d - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hfd62dc069f40dabd\\n    4: 0x151635 - <unknown>!Core_execute_block\\nnote: run with `WASMTIME_BACKTRACE_DETAILS=1` environment variable to display more information\\n\")")))
2021-08-14 07:51:36 [Parachain] panicked at 'Storage root must match that calculated.', /home/h4x/.cargo/git/checkouts/substrate-7e08433d4c370a21/74101dc/frame/executive/src/lib.rs:469:9
2021-08-14 07:51:36 [Parachain] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x4c6f - <unknown>!rust_begin_unwind\n    1: 0x1d57 - <unknown>!core::panicking::panic_fmt::h0dfe153eb0ef456a\n    2: 0x13f5 - <unknown>!core::panicking::panic::ha2f229d777a180c9\n    3: 0x10d34d - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hfd62dc069f40dabd\n    4: 0x151635 - <unknown>!Core_execute_block\nnote: run with `WASMTIME_BACKTRACE_DETAILS=1` environment variable to display more information\n"))))
2021-08-14 07:51:36 [Parachain] 💔 Error importing block 0x6669d15cf5c0cfc912d75419501f191469b5828053ed3ca134f88c99383d8d26: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x4c6f - <unknown>!rust_begin_unwind\\n    1: 0x1d57 - <unknown>!core::panicking::panic_fmt::h0dfe153eb0ef456a\\n    2: 0x13f5 - <unknown>!core::panicking::panic::ha2f229d777a180c9\\n    3: 0x10d34d - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hfd62dc069f40dabd\\n    4: 0x151635 - <unknown>!Core_execute_block\\nnote: run with `WASMTIME_BACKTRACE_DETAILS=1` environment variable to display more information\\n\")")))
2021-08-14 07:51:37 [Parachain] 💤 Idle (13 peers), best: #63240 (0xd913…01b1), finalized #63238 (0x963b…e139), ⬇ 12.6kiB/s ⬆ 1.9kiB/s
2021-08-14 07:51:41 [Relaychain] 💤 Idle (34 peers), best: #315881 (0xbfd7…ee91), finalized #315878 (0x1774…bfc4), ⬇ 4.1kiB/s ⬆ 97.2kiB/s
2021-08-14 07:51:42 [Relaychain] ✨ Imported #315882 (0x3f27…76ef)
2021-08-14 07:51:42 [Parachain] 💤 Idle (13 peers), best: #63240 (0xd913…01b1), finalized #63238 (0x963b…e139), ⬇ 4.3kiB/s ⬆ 1.0kiB/s
2021-08-14 07:51:43 [Parachain] panicked at 'Storage root must match that calculated.', /home/h4x/.cargo/git/checkouts/substrate-7e08433d4c370a21/74101dc/frame/executive/src/lib.rs:469:9
2021-08-14 07:51:43 [Parachain] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x4c6f - <unknown>!rust_begin_unwind\n    1: 0x1d57 - <unknown>!core::panicking::panic_fmt::h0dfe153eb0ef456a\n    2: 0x13f5 - <unknown>!core::panicking::panic::ha2f229d777a180c9\n    3: 0x10d34d - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hfd62dc069f40dabd\n    4: 0x151635 - <unknown>!Core_execute_block\nnote: run with `WASMTIME_BACKTRACE_DETAILS=1` environment variable to display more information\n"))))
2021-08-14 07:51:43 [Parachain] 💔 Error importing block 0x6669d15cf5c0cfc912d75419501f191469b5828053ed3ca134f88c99383d8d26: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x4c6f - <unknown>!rust_begin_unwind\\n    1: 0x1d57 - <unknown>!core::panicking::panic_fmt::h0dfe153eb0ef456a\\n    2: 0x13f5 - <unknown>!core::panicking::panic::ha2f229d777a180c9\\n    3: 0x10d34d - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hfd62dc069f40dabd\\n    4: 0x151635 - <unknown>!Core_execute_block\\nnote: run with `WASMTIME_BACKTRACE_DETAILS=1` environment variable to display more information\\n\")")))

I've checked block 63241. Everything looks normal (polkadotjs). The only extrinsic sync_offchain_message is to report a message to our pallet. It doesn't really do much except checking the signature and increment a storage map value by one (message id). 300+ similar messages had been successfully executed before that block, and we cannot tell any difference from it to the others.

Walkaround: We have deleted the parachain database of all the nodes except collator1, and then all the nodes got synced well.

(Could be a different problem:) There are still a few nodes untouched, but they got stuck with a different symptom. They were stuck earlier than 63240. I can see new blocks got imported, but best is always stuck at 63217:

2021-08-14 09:44:48 [Relaychain] ✨ Imported #317007 (0xeb76…143d)
2021-08-14 09:44:48 [Relaychain] ✨ Imported #317007 (0xb9e8…d87d)
2021-08-14 09:44:48 [Parachain] ✨ Imported #63559 (0x55f3…c5e1)
2021-08-14 09:44:49 [Parachain] ✨ Imported #63559 (0x5979…a8a4)
2021-08-14 09:44:51 [Parachain] 💤 Idle (11 peers), best: #63217 (0x9d3a…3516), finalized #15256 (0xa486…9c14), ⬇ 10.9kiB/s ⬆ 0.7kiB/s
2021-08-14 09:44:52 [Relaychain] 💤 Idle (20 peers), best: #317007 (0xeb76…143d), finalized #316992 (0xda5c…daf2), ⬇ 12.1kiB/s ⬆ 1.3kiB/s

For this node, ctrl-c doesn't kill the process. I have to send SIGKILL. After restarting this node, it shows a Reorg on #63217 and then it got synced to the latest block.

Other information

  • The parachain runtime and the collator node is based on v0.9.8; polkadotjs
  • The relay chain is ChaChaCha, based on v0.9.7; polkadotjs
  • All the nodes never lose peers
  • ChaChaCha has been always producing blocks correctly
  • Dumped logs for the stuck collators (collator2/3): link (Search "Storage root must match that calculated" to locate the problem)
  • Dumped logs for the stuck full node: link

Related issue

#532

@h4x3rotab
Copy link
Author

Future tests shows that if we close the stall nodes and restart it after a while, they can recover as well.

@h4x3rotab
Copy link
Author

Duplicated with #607

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant