This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Westmint stalled after westend update to 0.9.8 #532

Open
lovelaced opened this issue Jul 12, 2021 · 6 comments

Comments

@lovelaced
Contributor

These are the logs from the collator at the time of restarting various nodes in the network.

https://drive.google.com/file/d/1MgAjy2LxAA6w7PsGZz6dkryyhL9VlPk2/view?usp=sharing

@bkchr
Member

bkchr commented Jul 23, 2021

I assume that this error here is related:

We are seeing storage root mismatch errors on our test parachain. The issue is that the blocks are accepted by the relay chain, but only the block author can import the block. The other collators are failing with panicked at 'Storage root must match that calculated.'.

The problematic block doesn't contain any unusual extrinsic except our block reward payout which happens on every block.

We are running everything with version 0.9.8.

Here are the logs. Collator 5 authored the block and collator 1 was unable to import it.

@weichweich I copied your stuff here.

@ntn-x2

ntn-x2 commented Jul 23, 2021

In our case, a closer look at the log shows the following for the storage key 26aa394eea5630e07c48ae0c9558cef7b99d880ec681799c0cf30e8886371da973981a896fa26c597a76ff803b2a6e1ce8ed0c2a40fb5a0bbb24c38f5c8cd83d79498ac029ac9f87497677f5701e3d2c:

  • at time 2021-07-08T15:10:13.033 collator-1 reads A twice and then writes B
  • at time 2021-07-08T15:10:12.574 collator-2 reads B twice (the same that collator-1 wrote) and then writes C

A, B, and C are almost identical. They all have the following structure:

0100000000000000020000000100000000000000<VALUE>332b0200000000000000000000000000000000000000000000000000b89d0d6955a001000000000000000000b89d0d6955a00100000000000000

where <VALUE> is:

  • for A -> e78d7b42827c
  • for B -> 7970079957e
  • for C -> e7a085afa880

Not sure if this helps in any way, but these are our findings.
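
For anyone comparing these values by hand, here is a minimal sketch of how the <VALUE> fragments could be turned into integers. It assumes the fragments are the low-order bytes of a little-endian integer (SCALE encodes fixed-width integers little-endian); the helper name le_hex_to_int is made up for this example. Note that B as pasted above has 11 hex digits, so it cannot be decoded as whole bytes and is rejected rather than guessed at.

```python
def le_hex_to_int(fragment: str) -> int:
    """Interpret a hex string as little-endian bytes and return the integer."""
    if len(fragment) % 2 != 0:
        # An odd number of hex digits cannot be split into whole bytes.
        raise ValueError(f"odd-length hex fragment: {fragment!r}")
    return int.from_bytes(bytes.fromhex(fragment), "little")


# Fragments copied from the comment above.
VALUE_A = "e78d7b42827c"
VALUE_B = "7970079957e"   # 11 digits as pasted, so probably truncated
VALUE_C = "e7a085afa880"

a = le_hex_to_int(VALUE_A)
c = le_hex_to_int(VALUE_C)
print(hex(a), hex(c), hex(c - a))

try:
    le_hex_to_int(VALUE_B)
except ValueError as err:
    print("cannot decode B:", err)
```

Only the varying bytes are decoded here; the constant bytes that follow <VALUE> belong to the same field but cancel out when two values are subtracted.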

@ntn-x2

ntn-x2 commented Jul 23, 2021

Small update: after re-hashing all pallet and storage names, we found out that, in our case, the issue is for the System pallet and its Account storage.
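
A rough sketch of how the key from the earlier comment breaks down, in case it helps others repeat the check. It assumes the usual Substrate layout for a map with the Blake2_128Concat hasher: twox128(pallet name), then twox128(storage item name), then a 16-byte blake2b hash of the key, then the raw key itself. The two prefix constants are the commonly quoted values for "System" and "Account"; they are written down here as an assumption and not recomputed, because xxhash is not in the Python standard library. The function names are invented for this example.

```python
import hashlib

# Assumed prefixes (commonly quoted, not recomputed here).
SYSTEM_PREFIX = "26aa394eea5630e07c48ae0c9558cef7"   # twox128("System")
ACCOUNT_PREFIX = "b99d880ec681799c0cf30e8886371da9"  # twox128("Account")

# Storage key copied from the comment above.
KEY = (
    "26aa394eea5630e07c48ae0c9558cef7"
    "b99d880ec681799c0cf30e8886371da9"
    "73981a896fa26c597a76ff803b2a6e1c"
    "e8ed0c2a40fb5a0bbb24c38f5c8cd83d79498ac029ac9f87497677f5701e3d2c"
)


def split_account_key(key_hex: str) -> dict:
    """Split a 160-digit hex key into pallet, item, key hash and account id."""
    key_hex = key_hex.lower()
    if key_hex.startswith("0x"):
        key_hex = key_hex[2:]
    if len(key_hex) != 160:
        raise ValueError(f"expected 160 hex digits, got {len(key_hex)}")
    return {
        "pallet": key_hex[:32],
        "item": key_hex[32:64],
        "key_hash": key_hex[64:96],
        "account_id": key_hex[96:],
    }


def hash_matches_account(key_hash: str, account_id: str) -> bool:
    """Check that key_hash is the 16-byte blake2b digest of the account id."""
    digest = hashlib.blake2b(bytes.fromhex(account_id), digest_size=16)
    return digest.hexdigest() == key_hash


parts = split_account_key(KEY)
print("System pallet: ", parts["pallet"] == SYSTEM_PREFIX)
print("Account item:  ", parts["item"] == ACCOUNT_PREFIX)
print("hash consistent:", hash_matches_account(parts["key_hash"], parts["account_id"]))
```

The last 32 bytes are the raw AccountId, which is what makes it possible to tell whose account the mismatching key belongs to.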

@wischli
Contributor

wischli commented Jul 26, 2021

Small update: after re-hashing all pallet and storage names, we found out that, in our case, the issue is for the System pallet and its Account storage.

To add some more context: In each block, we reward the block author via note_author. After triple checking, I still don't see any non-determinism in our reward function.

Also, I checked the decoded AccountInfo from the storage mismatches and concluded that for the balance

  • B - A = reward
  • C - B = reward

This holds even though both collators read the same storage key in the same block. The mismatch happens in the free balance, due to the reward.
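
As a sanity check on the two equalities above, here is a small sketch using the <VALUE> fragments quoted earlier in this thread (A = e78d7b42827c, C = e7a085afa880). If B - A and C - B are both equal to the reward, then B must lie exactly halfway between A and C. The sketch computes that midpoint and compares it with B as it was pasted (7970079957e, which is one hex digit short). It assumes the fragments are little-endian; the helper names are made up.

```python
def le_hex_to_int(fragment: str) -> int:
    """Interpret a hex string as little-endian bytes."""
    return int.from_bytes(bytes.fromhex(fragment), "little")


def int_to_le_hex(value: int, length: int) -> str:
    """Encode an integer as little-endian hex of the given byte length."""
    return value.to_bytes(length, "little").hex()


a = le_hex_to_int("e78d7b42827c")
c = le_hex_to_int("e7a085afa880")

# If B - A == C - B == reward, the reward is half the total difference.
total = c - a
assert total % 2 == 0, "difference is odd, so the two steps cannot be equal"
implied_reward = total // 2
implied_b = int_to_le_hex(a + implied_reward, 6)

pasted_b = "7970079957e"  # as it appears earlier in the thread
print("implied B:", implied_b)
print("pasted B matches after the first digit:", implied_b[1:] == pasted_b)
```

Under these assumptions the midpoint agrees with the pasted B apart from a missing leading digit, which would fit a copy-paste truncation. That is an inference from the arithmetic only, not something confirmed by the logs.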

@h4x3rotab

Probably the same problem: #573

@weichweich

weichweich commented Aug 19, 2021

We further investigated the log and it looks like the balance is not properly stored. It's written to storage but the next time it's read, the old value is returned.

When the storage error happens, all other collators (except the author) have an outdated balance in their store. Looking at the log, the last time a reward was paid for 4smcAoiTiCLaNrGhrAM4wZvt5cMKEGm8f3Cu9aFrpsh5EiNV (the block author), their balance was put into storage, but the next time they produce a block, the old balance is returned. No writes to their account happened between their two blocks (blocks 172042 and 172054).
The storage key is: System_Account_53777d8555707da482713be2b7bb190fd6b97b1a02bcd80972b221b818c2e64ac507519af46efb8c6d4a90e151b2056f (I replaced parts of the key with their plain-text representation in the logs).

root-mismatch.log
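
The kind of check described above can be sketched as a scan over storage events: remember the last value written per key and flag any later read that returns something else. This is only an illustration. The StorageEvent record and its fields are hypothetical and do not reflect the actual trace log format, so the log lines would first need to be parsed into this shape. The values "A" and "B" below are placeholders, not real balances.

```python
from typing import Iterable, List, NamedTuple


class StorageEvent(NamedTuple):
    # Hypothetical, simplified record of one storage access.
    block: int
    op: str      # "read" or "write"
    key: str
    value: str   # hex-encoded value


def find_stale_reads(events: Iterable[StorageEvent]) -> List[StorageEvent]:
    """Return reads whose value differs from the last write to the same key."""
    last_written = {}
    stale = []
    for ev in events:
        if ev.op == "write":
            last_written[ev.key] = ev.value
        elif ev.op == "read":
            expected = last_written.get(ev.key)
            if expected is not None and ev.value != expected:
                stale.append(ev)
    return stale


# Placeholder events mirroring the situation described above.
events = [
    StorageEvent(172042, "read", "System_Account_X", "A"),
    StorageEvent(172042, "write", "System_Account_X", "B"),
    StorageEvent(172054, "read", "System_Account_X", "A"),  # old value again
]
for ev in find_stale_reads(events):
    print(f"block {ev.block}: stale read of {ev.key}, got {ev.value}")
```

Reads of keys that were never written within the scanned window are ignored, since there is nothing to compare them against.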


6 participants