Skip to content
This repository was archived by the owner on Jul 1, 2021. It is now read-only.

Consensus failure (full-sync, all chains) on contract create/destruct within same block #1478

Closed
veox opened this issue Jan 16, 2020 · 12 comments · Fixed by #1544
Closed

Consensus failure (full-sync, all chains) on contract create/destruct within same block #1478

veox opened this issue Jan 16, 2020 · 12 comments · Fixed by #1544

Comments

@veox
Copy link
Contributor

veox commented Jan 16, 2020

  • Trinity version: v0.1.0-alpha.34-79-gc386742f (commit c386742, recent master)
  • OS: Arch Linux
  • Python version (output of python --version): CPython 3.7
  • Environment (output of pip freeze): gist.

What is wrong?

Running:

trinity --goerli --sync-mode=full --disable-upnp --max-peers=1 --disable-discovery

... and adding node as peer from remote geth node.

Results in state root mismatch at block #226323 - see run log:

eth_utils.exceptions.ValidationError: Mismatch between block and imported block on 1 fields:
 - header.state_root:
    (actual)  : b"X.\x0e,\xc2xA\xed\xb4Y\x15w\x93\xa3\xc1\xe1./\x15\xbf\x11'\x12\xd6\xa8\x85q\xdf\xb2\xd9\x1a0"
    (expected): b'\x05\xb3<\xb4\xb3sr\xb6d\xc0\xa3\xdb|\x1a\xef\xf40Hh\xa9yg7"\xc8Ao\x99\x18UX\x08'

Note: Goerli starts from PetersburgVM, that's the VM at given block.

Extra:

How can it be fixed

This turned out to be a long-time regression in py-evm; see #1478 (comment) and ensuing.

🔭

@veox veox mentioned this issue Jan 16, 2020
2 tasks
@veox
Copy link
Contributor Author

veox commented Jan 16, 2020

In [2]: web3.Web3.toHex(b"X.\x0e,\xc2xA\xed\xb4Y\x15w\x93\xa3\xc1\xe1./\x15\xbf\x11'\x12\xd6\xa8\x85q\xdf\xb2\xd9\x1a0")                                       
Out[2]: '0x582e0e2cc27841edb459157793a3c1e12e2f15bf112712d6a88571dfb2d91a30'

In [3]: web3.Web3.toHex(b'\x05\xb3<\xb4\xb3sr\xb6d\xc0\xa3\xdb|\x1a\xef\xf40Hh\xa9yg7"\xc8Ao\x99\x18UX\x08')                                                   
Out[3]: '0x05b33cb4b37372b664c0a3db7c1aeff4304868a979673722c8416f9918555808'

Confirming that the actual state root in ValidationError above is the one in @cburgdorf's trace.

@veox
Copy link
Contributor Author

veox commented Jan 16, 2020

Neither Etherscan nor Trinity's trace show the contract's code after transaction 0 of the block; geth's trace shows the following was RETURNed:

6080604052348015600f57600080fd5b5060043610604e577c0100000000000000000000000000000000000000000000000000000000600035046341c0e1b581146053578063f1eae25c14605b575b600080fd5b60596061565b005b6059609d565b60005473ffffffffffffffffffffffffffffffffffffffff16331415609b5760005473ffffffffffffffffffffffffffffffffffffffff16ff5b565b6000805473ffffffffffffffffffffffffffffffffffffffff19163317905556fea165627a7a723058203230f97067f2d9f26cc43a9945289dd758190b45efa590bf0889d8f768e2fed40029

Raw disassembly in this gist.

The contract has only 2 4-byte function checks:

  • 0x41c0e1b5 - kill();
  • 0xf1eae25c mortal().

According to Etherscan, the 10 transactions in the block do the following:

  1. Create the contract.
  2. Call mortal().
  3. Call mortal().
  4. Call mortal().
  5. Call mortal().
  6. Call mortal().
  7. Call mortal().
  8. Call kill().
  9. Call mortal().
  10. Call kill().

Without going into too much detail: it seems at a glance that gas usages between trinity and geth match.

@veox
Copy link
Contributor Author

veox commented Jan 16, 2020

First ear-mark ("maybe later") is that in the trace (towards the very end), there are three occurrences of

Deleting all storage in account (...), hashed (...)

This comes from py-evm, once for each of the last three txs in the block.

This might be simply logging noise, but from the link one can see that the following is also hit on each invocation:

        self._journal_storage.clear()
        self._storage_cache.reset_cache()

OT: I'm also unsure of the utility of logging keccak(self._address).hex() in addition to self._address.hex()... Perhaps the latter is a typo?..


Then there's

Updating account (...) to storage root 0x6da071a8604d3de15dc4dc9b1fbcab39a3523094e94ca025a89ca9811dd72ffc

I'm having a bad memory day; is that what an empty deleted account's storage_root should look like?..

@veox
Copy link
Contributor Author

veox commented Jan 16, 2020

Just checked - am getting a similar error on mainnet block #116525. 🤦‍♂️ Except this time header.receipt_root is also incorrect.


Rolling back versions, v0.1.0-alpha.23 "Theodora" from March 1, 2019 managed to sync past. v0.1.0-alpha.25 "Betty"-fix from June 6, 2019 didn't ("mix hash mismatch").

That's a long time we haven't tried full-sync, apparently... Or there's something seriously wrong with my machine.

@carver
Copy link
Contributor

carver commented Jan 16, 2020

alpha.25 is unlikely to be the one that has the problem (it's a pretty minor patch). The offending bug is much more likely to be in alpha.24, since the set of changes is huge.

@veox
Copy link
Contributor Author

veox commented Jan 16, 2020

git bisect shows it's 7ee60c5 - the upgrade from py-evm==0.2.0a42 to py-evm==0.2.0a43:

7ee60c54f57d25fd1c86c828a2c342089e68b0d8 is the first bad commit
commit 7ee60c54f57d25fd1c86c828a2c342089e68b0d8
Author: Jason Carver <[email protected]>
Date:   Mon May 20 12:28:56 2019 -0700

    Upgrade to py-evm v0.2.0 alpha 43

 setup.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Here's the diff, release notes.

@veox
Copy link
Contributor Author

veox commented Jan 16, 2020

Cross-project bisect result, with trinity at 7ee60c5 (with minor setup.py modification so it can run), py-evm between v0.2.0a42 and v0.2.0a43:

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
8e0f6e20fcd9726381a6bbd6875480482321e1ca
e75a22aad0bffd4a797246c2d16de96e9ff05c33
f283a86810e45a9bdccbec680f1c6cde1d1ed30d
ddefa2fbad378aad6562239eb087e8eb27794fc1
We cannot bisect more!

The first three are from ethereum/py-evm#1735 (commits), the last one is the branch-to-master merge commit itself.

@veox veox changed the title Likely consensus failure (detected on Goerli full-sync) Likely consensus failure (full-sync, Goerli/Mainnet) Jan 16, 2020
@carver
Copy link
Contributor

carver commented Jan 23, 2020

Except this time header.receipt_root is also incorrect.

This makes sense, since the receipts included the state root back then. It should even help track down which of the 83 transactions is the broken one.

In [10]: block                                                                                                                                                                                                                                
Out[10]: <FrontierBlock(#Block #116525)>

In [11]: chain.import_block(block)

...

ValidationError: Mismatch between block and imported block on 2 fields:
 - header.state_root  :
    (actual)  : b'c\xbf\x86\xfe\xd5\x13\x90O|P*\xea\xc2A\x9e\x8fA\xdd\x94arfoo\xf4hR\xccN\x94R\xb2'
    (expected): b'\x1d\xd6;\xfc\xf5\x82\x96\xf3P7\xe2vc8\x18\xdb\x1dC5B\xc1\xe4I\xb52\x15\x1ep\x94\x86\xac\xd0'
 - header.receipt_root:
    (actual)  : b'\xe6\xa2d\xb4"\x98\xaf\x99\x91o\xe4\x99z\x8fW\xc2\xf0\x0b\xd0\x9d0o\xc5s$\xd3\xc9\x93\xb34\xb0\x98'
    (expected): b'KE\x1c\xefpE\xa0\xcd\xbc\xb5\xc4\x93f\x82\x0b\xe4\x07x\xe9\x89\xc2\x1b\x0f\xdc\xb2\xea>\x83\x12h\xbf\xdd'

Next up (probably tomorrow) is to generate the proper receipts, to tell which txn is busted. If anyone already did this and knows which txn is broken, let me know! :)

@carver
Copy link
Contributor

carver commented Jan 25, 2020

Ah, forgot to post this last night. I found which transaction it is:

receipt 72 (0-indexed) should have been {'state_root': b'\xaa\x9d\x1c\xea\xd6\xba\xc1\x9a;\xe3\xa24\xe6\xa0\xf6\x95\xd4\xb8\xecFb\xa8Z\xbe\xa3\xf6c\xf5H\xe3\xe6#', 'gas_used': 2122041, 'bloom': 0, 'logs': ()} but was actually {'state_root': b'\xe5\x87L\xd2U\n\x9e\xdcHZ\xc8\xdd\xe2J\xa6\x85\xd4kLo\x9a\xa8\xc8p\x1e\xc1[&;N\t\xc4', 'gas_used': 2122041, 'bloom': 0, 'logs': ()}

So next step is to diff the VM steps in the good and bad VMs. Interesting that the gas used is the same!

@veox
Copy link
Contributor Author

veox commented Jan 29, 2020

I'm not amazed it's that transaction: it's a call with 0x41c0e1b5 (kill()).

I'll try to hack up a way to trace that tx... Do post if you know a trick.


Note that the destroyed contract was created in the same block, just one transaction earlier.


There are two SELFDESTRUCT calls in the block, the earlier one being at index 66. That particular contract was created in an earlier block.

If I understand correctly, this particular tx did not cause a state root mismatch; if true, then the issue we have is with contracts being created and destroyed within the same block.

@veox
Copy link
Contributor Author

veox commented Jan 29, 2020

Transaction traces for txs at index 71 and 72: in this gist.

As above, this is for trinity v0.1.0-alpha.23-689-g7ee60c54 (i.e. at commit 7ee60c5), with py-evm at v0.2.0a42 ("good") and v0.2.0a43 ("bad").


The VM steps in traces seem identical at a glance.

I've edited the timestamps so the files can be diffed. The diff is also in the gist. The VM steps themselves are identical.

So, as mentioned previously, this probably has something to do with AccountDB management itself, and perhaps the fact that the

Updating account (...) to storage root (...)

lines are printed twice.


Hmm, is the diff saying that we're deleting the SELFDESTRUCTed contract's storage, and then re-creating it as empty?..

EDIT: Yup! I've checked with that earlier tx at index 66, and its storage is deleted, and left that way!


TL;DR: seems that if a contract is created and deleted within the same block, then we're erroneously re-creating its storage as "empty".


Maybe this has something to do with "dirty accounts" and "dirty account stores"... This is a py-evm issue, though, so not sure if debugging should continue here I'll finally make an issue there.

@veox veox changed the title Likely consensus failure (full-sync, Goerli/Mainnet) Consensus failure (full-sync, all chains) on contract create/destruct within same block Jan 29, 2020
@veox
Copy link
Contributor Author

veox commented Feb 1, 2020

Confirming that the issue is no longer present when using py-evm from PR ethereum/py-evm#1913.

Main-net blocks #116525 and #336508 imported successfully (I ran it up to #347457).

So has Goerli block #226323 (still running, #472783 ATM).

carver added a commit to carver/trinity that referenced this issue Feb 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants