
E23 commitment #4947

Merged: 3 commits merged into devel from e22-commitment on Sep 26, 2022

Conversation

@awskii (Member) commented Aug 5, 2022

  • added commitment to aggregation run in erigon22
  • fixed allSnapshot database reading in erigon2
  • fixed genesis initialization

Currently, commitment doesn't provide correct hashes after 3 blocks, due to reading from state or history.

@awskii awskii changed the title WIP E22 commitment [WIP] E22 commitment Aug 5, 2022
@awskii awskii marked this pull request as draft August 5, 2022 15:30
@awskii awskii force-pushed the e22-commitment branch 3 times, most recently from 4996b52 to 21e77ef on August 12, 2022 12:16
@awskii (Member Author) commented Aug 16, 2022

Got through a bunch of bugs with EF merging and with updating the ReaderWrapper23 aggregator context. Currently I'm stumbling on an issue with commitment evaluation after merge, probably some issue with merging the commitment domain (the root mismatch happens only after the aggregator merges, and if I increase the number of transactions before the merge, the root mismatch happens later). Investigating it.

For EF merging I added a min heap which ensures that the merged EF will contain unique elements from both EFs.
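
To illustrate the idea, here is a minimal Go sketch, not the actual erigon-lib code: values from several ascending sequences are pulled through a min heap and duplicates are dropped, so the merged result contains each offset exactly once (the real code would then re-encode the result into a new Elias-Fano structure; mergeUnique and mergeHeap are hypothetical names).

package main

import (
	"container/heap"
	"fmt"
)

// mergeItem tracks the next value to consume from one source sequence.
type mergeItem struct {
	val uint64
	src int // index of the source sequence
	idx int // index of the next element within that sequence
}

type mergeHeap []mergeItem

func (h mergeHeap) Len() int           { return len(h) }
func (h mergeHeap) Less(i, j int) bool { return h[i].val < h[j].val }
func (h mergeHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *mergeHeap) Push(x any)        { *h = append(*h, x.(mergeItem)) }
func (h *mergeHeap) Pop() any {
	old := *h
	it := old[len(old)-1]
	*h = old[:len(old)-1]
	return it
}

// mergeUnique merges ascending sequences into one ascending sequence without
// duplicates, which is the property the merged EF has to preserve.
func mergeUnique(seqs ...[]uint64) []uint64 {
	h := &mergeHeap{}
	for i, s := range seqs {
		if len(s) > 0 {
			heap.Push(h, mergeItem{val: s[0], src: i, idx: 1})
		}
	}
	var out []uint64
	for h.Len() > 0 {
		it := heap.Pop(h).(mergeItem)
		// skip values already emitted (the same offset may exist in both sources)
		if len(out) == 0 || out[len(out)-1] != it.val {
			out = append(out, it.val)
		}
		if it.idx < len(seqs[it.src]) {
			heap.Push(h, mergeItem{val: seqs[it.src][it.idx], src: it.src, idx: it.idx + 1})
		}
	}
	return out
}

func main() {
	fmt.Println(mergeUnique([]uint64{1, 3, 5, 7}, []uint64{3, 4, 7, 9})) // [1 3 4 5 7 9]
}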

@AskAlexSharov (Collaborator) commented

what is EF?

@awskii (Member Author) commented Aug 17, 2022

@AskAlexSharov Elias-Fano encoded offsets
Today I finally localized the issue. The commitment root hash mismatch occurred after the domains merge due to reading obsolete data from the Domain. I traced all writes to state and found that for a specific address (which had been touched before the merge) Domain.Get() returns an account with nonce=1 while the actual nonce is 5. Spent some time going over the merge code: no issues there. Finally got to the code of Domain.prune.

If I disable pruning, the issue is gone: the actual value with nonce=5 was deleted during prune while the nonce=1 value was not removed. prune takes as arguments the current aggregation step, txFrom and txTo. In my case step=25; the account value with nonce=5 has step 25 and the value with nonce=1 has step 22. I'm not sure how pruning was designed to work, but probably, if there are several values for that key with different invertedStep values, we should keep the value with the largest invertedStep. Is that correct? It slightly hurts the complexity of the prune operation, since we can no longer aggregate and decide on deletion within one full iteration.
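
The rule being asked about could be sketched roughly like this in Go (the helper name, the map layout and the direction of the comparison are illustrative only; whether "largest invertedStep" is the right ordering is exactly the open question, and this is not the real Domain.prune code):

package main

import "fmt"

// pruneEntry identifies one stored value of a key at a particular inverted step.
type pruneEntry struct {
	key          string
	invertedStep uint64
}

// selectPruneVictims keeps, for every key, only the value with the largest
// invertedStep and marks all other values of that key for deletion. The
// nested map stands in for iterating the composite key+invertedStep records
// in the database.
func selectPruneVictims(values map[string]map[uint64][]byte) []pruneEntry {
	var victims []pruneEntry
	for key, byStep := range values {
		keep, first := uint64(0), true
		for invStep := range byStep {
			if first || invStep > keep {
				keep, first = invStep, false
			}
		}
		for invStep := range byStep {
			if invStep != keep {
				victims = append(victims, pruneEntry{key, invStep})
			}
		}
	}
	return victims
}

func main() {
	// two versions of one key at different inverted steps; only the one that
	// loses the comparison is returned for deletion
	vals := map[string]map[uint64][]byte{
		"acct": {7: []byte("older value"), 9: []byte("newer value")},
	}
	fmt.Println(selectPruneVictims(vals)) // [{acct 7}]
}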

@awskii awskii changed the title [WIP] E22 commitment [WIP] E23 commitment Aug 19, 2022
@awskii (Member Author) commented Aug 19, 2022

Branch cleaned up and rebased onto the current devel branch.

@awskii (Member Author) commented Aug 23, 2022

Current state:

  • Goerli commitment and merges are processed correctly, but after several merges there is a panic while the Domain index (Elias-Fano) is accessed; there might be some issue during the merge.
  • mainnet genesis block rootHash mismatch

Example of crash log:

INFO[08-20|10:24:28.643] Progress                                 block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=3.6GB sys=22.9GB
INFO[08-20|10:24:58.644] Progress                                 block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=3.6GB sys=22.9GB
INFO[08-20|10:25:28.644] Progress                                 block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=3.6GB sys=22.9GB
CRIT[08-20|10:25:36.626] [index] calculating                      file=accounts.4-6.efi
CRIT[08-20|10:25:39.531] [index] write                            file=accounts.4-6.efi
INFO[08-20|10:25:56.458] [merge] Compressed                       millions=10
INFO[08-20|10:25:58.643] Progress                                 block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=5.2GB sys=22.9GB
CRIT[08-20|10:26:17.458] [index] calculating                      file=accounts.4-6.vi
INFO[08-20|10:26:28.644] Progress                                 block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=5.3GB sys=22.9GB
INFO[08-20|10:26:58.644] Progress                                 block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=6.1GB sys=22.9GB
CRIT[08-20|10:27:25.186] [index] write                            file=accounts.4-6.vi
CRIT[08-20|10:27:27.119] [index] calculating                      file=accounts.4-6.kvi
INFO[08-20|10:27:28.652] Progress                                 block=4521174 blk/s=0.000 state files=0 total dat=0B total idx=0B hit ratio=0.000 hits+misses=0 alloc=7.0GB sys=22.9GB
CRIT[08-20|10:27:30.222] [index] write                            file=accounts.4-6.kvi
findMergeRange(18750000, 100000000)={accounts:{valuesStartTxNum:0 valuesEndTxNum:0 values:false historyStartTxNum:0 historyEndTxNum:0 history:false indexStartTxNum:0 indexEndTxNum:0 index:false} storage:{valuesStartTxNum:0 valuesEndTxNum:0 values:false historyStartTxNum:0 historyEndTxNum:0 history:false indexStartTxNum:0 indexEndTxNum:0 index:false} code:{valuesStartTxNum:0 valuesEndTxNum:0 values:false historyStartTxNum:0 historyEndTxNum:0 history:false indexStartTxNum:0 indexEndTxNum:0 index:false} commitment:{valuesStartTxNum:0 valuesEndTxNum:0 values:false historyStartTxNum:0 historyEndTxNum:0 history:false indexStartTxNum:0 indexEndTxNum:0 index:false} logAddrsStartTxNum:0 logAddrsEndTxNum:0 logAddrs:false logTopicsStartTxNum:0 logTopicsEndTxNum:0 logTopics:false tracesFromStartTxNum:0 tracesFromEndTxNum:0 tracesFrom:false tracesToStartTxNum:0 tracesToEndTxNum:0 tracesTo:false}
unexpected fault address 0x79635d657e0b
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x79635d657e0b pc=0xa5e436]

goroutine 1 [running, locked to thread]:
runtime.throw({0x15729a5?, 0xa25228568b845ee7?})
        runtime/panic.go:992 +0x71 fp=0xc06a1a56d8 sp=0xc06a1a56a8 pc=0x45b911
runtime.sigpanic()
        runtime/signal_unix.go:825 +0x305 fp=0xc06a1a5728 sp=0xc06a1a56d8 pc=0x471cc5
github.com/ledgerwatch/erigon-lib/recsplit/eliasfano16.(*DoubleEliasFano).get2(0x0?, 0x4?)
        github.com/ledgerwatch/[email protected]/recsplit/eliasfano16/elias_fano.go:447 +0x56 fp=0xc06a1a57b8 sp=0xc06a1a5728 pc=0xa5e436
github.com/ledgerwatch/erigon-lib/recsplit/eliasfano16.(*DoubleEliasFano).Get3(0xc023b46240, 0x5f7)
        github.com/ledgerwatch/[email protected]/recsplit/eliasfano16/elias_fano.go:512 +0x27 fp=0xc06a1a57d8 sp=0xc06a1a57b8 pc=0xa5e9e7
github.com/ledgerwatch/erigon-lib/recsplit.(*Index).Lookup(0xc023b461c0, 0xc0e20607c0?, 0x2f3458eade8d3e0d)
        github.com/ledgerwatch/[email protected]/recsplit/index.go:196 +0xa5 fp=0xc06a1a5878 sp=0xc06a1a57d8 pc=0xa628e5
github.com/ledgerwatch/erigon-lib/recsplit.(*IndexReader).Lookup(0xc066650270, {0xc0e20607c0?, 0x10?, 0x25139e0?})
        github.com/ledgerwatch/[email protected]/recsplit/index_reader.go:61 +0x45 fp=0xc06a1a58a8 sp=0xc06a1a5878 pc=0xa63ae5
github.com/ledgerwatch/erigon-lib/state.(*DomainContext).readFromFiles.func1(0xc0564f6820)
        github.com/ledgerwatch/[email protected]/state/domain.go:853 +0x65 fp=0xc06a1a5908 sp=0xc06a1a58a8 pc=0xa82365
github.com/google/btree.(*node[...]).iterate(0xc056fd1980, 0xffffffffffffffff, {0x0, 0xe0?}, {0x0?, 0xe0?}, 0x0?, 0x0, 0xc06a1a5a18)
        github.com/google/[email protected]/btree_generic.go:555 +0x66a fp=0xc06a1a5988 sp=0xc06a1a5908 pc=0x968c6a
github.com/google/btree.(*BTreeG[...]).Descend(0x1b77de0?, 0xc062a221c0?)
        github.com/google/[email protected]/btree_generic.go:815 +0x45 fp=0xc06a1a59e0 sp=0xc06a1a5988 pc=0x9698a5
github.com/ledgerwatch/erigon-lib/state.(*DomainContext).readFromFiles(0x203014?, {0xc0e20607c0?, 0xc0500e1c00?, 0xc0e2060734?})
        github.com/ledgerwatch/[email protected]/state/domain.go:849 +0x8a fp=0xc06a1a5a58 sp=0xc06a1a59e0 pc=0xa822aa
github.com/ledgerwatch/erigon-lib/state.(*DomainContext).get(0xc047eaa480, {0xc0e20607c0, 0x34, 0x34}, {0x1b91d20, 0xc000472060})
        github.com/ledgerwatch/[email protected]/state/domain.go:225 +0x350 fp=0xc06a1a5b18 sp=0xc06a1a5a58 pc=0xa7bd90
github.com/ledgerwatch/erigon-lib/state.(*DomainContext).Get(0x500e1c00?, {0xc06a1a5bbc?, 0x14, 0xadff9c?}, {0xc06a1a5bd0, 0x20, 0xc06a1a5be0?}, {0x1b91d20, 0xc000472060})
        github.com/ledgerwatch/[email protected]/state/domain.go:242 +0xcf fp=0xc06a1a5b70 sp=0xc06a1a5b18 pc=0xa7c06f
github.com/ledgerwatch/erigon-lib/state.(*AggregatorContext).ReadAccountStorage(...)
        github.com/ledgerwatch/[email protected]/state/aggregator.go:676
github.com/ledgerwatch/erigon/cmd/state/commands.(*ReaderWrapper23).ReadAccountStorage(0x449518f8f40bf996?, {0x54, 0xbf, 0x39, 0xed, 0x7d, 0xf, 0x44, 0x86, 0xf3, ...}, ...)
        github.com/ledgerwatch/erigon/cmd/state/commands/erigon23.go:466 +0x85 fp=0xc06a1a5c00 sp=0xc06a1a5b70 pc=0x121a445
github.com/ledgerwatch/erigon/core/state.(*stateObject).GetCommittedState(0xc000246f20, 0xc0001d8a70, 0xc0da6d1240)
        github.com/ledgerwatch/erigon/core/state/state_object.go:186 +0xf2 fp=0xc06a1a5c98 sp=0xc06a1a5c00 pc=0xaed932
github.com/ledgerwatch/erigon/core/state.(*stateObject).GetState(0xc000246f20, 0xc0001d8a70, 0xc0da6d1240)
        github.com/ledgerwatch/erigon/core/state/state_object.go:163 +0xaf fp=0xc06a1a5d00 sp=0xc06a1a5c98 pc=0xaed7cf
github.com/ledgerwatch/erigon/core/state.(*IntraBlockState).GetState(0x7ec0219dfb6c68f9?, {0x54, 0xbf, 0x39, 0xed, 0x7d, 0xf, 0x44, 0x86, 0xf3, ...}, ...)
        github.com/ledgerwatch/erigon/core/state/intra_block_state.go:306 +0x53 fp=0xc06a1a5d38 sp=0xc06a1a5d00 pc=0xadc073
github.com/ledgerwatch/erigon/core/vm.opSload(0xc060d5aec0?, 0xc0ae698f30, 0x20?)
        github.com/ledgerwatch/erigon/core/vm/instructions.go:559 +0x187 fp=0xc06a1a5e00 sp=0xc06a1a5d38 pc=0xd03387
github.com/ledgerwatch/erigon/core/vm.(*EVMInterpreter).Run(0xc0ae698f30, 0xc05efc2820, {0xc047e8cc30, 0xe4, 0xe4}, 0x0)

@awskii (Member Author) commented Aug 29, 2022

Currently, both mainnet and Goerli commitment work, but the merge issue mentioned above still happens. Depending on the aggregation step, it takes several merges before the crash. For aggstep=10k it took 17 merges on Goerli; for 100k it is at block=1857594 and still running. Haven't hit the issue on mainnet yet.

@awskii (Member Author) commented Sep 3, 2022

Added the ability to restart after a successful merge. I decided not to leave unmerged data in the db; it's better to merge everything at once and be sure that the db and history files are coherent.

Fixed the Elias-Fano panic (as far as I could tell during testing).

@awskii awskii changed the title [WIP] E23 commitment E23 commitment Sep 7, 2022
@awskii awskii marked this pull request as ready for review September 7, 2022 17:26
@awskii (Member Author) commented Sep 7, 2022

Fixed db pruning by verifying, before deletion, that the pruned step is not the latest step in the db. This probably introduces another problem: there may be abandoned steps left in the db. With that, block processing works when the database is one step ahead of the written files.

Added two keys to the commitment domain: latesttx and roothash{txNum}. Both are inserted when ComputeCommitment is called. latesttx is used to seek the latest committed tx number during aggregator restart; roothash stores the encoded state of HexPatriciaHash right after commitment evaluation.
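
A rough sketch of what writing those two keys could look like (the helper and key layout here are illustrative assumptions, not the exact PR code; put stands in for the write into the commitment domain):

package main

import (
	"encoding/binary"
	"fmt"
)

// storeCommitmentState writes the two keys described above: "latesttx" points
// at the most recent committed txNum so the aggregator can find it on restart,
// and "roothash"+txNum keeps the encoded trie state taken right after the
// commitment was evaluated.
func storeCommitmentState(put func(key, val []byte), txNum uint64, encodedTrieState []byte) {
	txNumBytes := make([]byte, 8)
	binary.BigEndian.PutUint64(txNumBytes, txNum)

	put([]byte("latesttx"), txNumBytes)
	put(append([]byte("roothash"), txNumBytes...), encodedTrieState)
}

func main() {
	// toy in-memory sink instead of the commitment domain
	kv := map[string][]byte{}
	put := func(key, val []byte) { kv[string(key)] = val }

	storeCommitmentState(put, 123456, []byte("encoded-trie-state"))
	fmt.Printf("stored %d keys\n", len(kv)) // stored 2 keys
}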

For commitment, both approaches are used: reading directly by keys from state, and accumulating state updates before evaluation. Both are currently enabled, and there is a check that both methods produce the same hashes.
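
A minimal sketch of that cross-check (function and variable names are illustrative, not the PR's actual code):

package main

import (
	"bytes"
	"fmt"
)

// checkRoots compares the root hash obtained by reading keys directly from
// state with the one obtained from accumulated state updates; the two
// evaluation paths described above must agree.
func checkRoots(directRoot, updatesRoot []byte) error {
	if !bytes.Equal(directRoot, updatesRoot) {
		return fmt.Errorf("commitment root mismatch: direct=%x updates=%x", directRoot, updatesRoot)
	}
	return nil
}

func main() {
	fmt.Println(checkRoots([]byte{0xab, 0xcd}, []byte{0xab, 0xcd})) // <nil>
}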

fixed mainnet genesis roothash, get back with update processing

erigon23 replay after restart, index lookup fix

bumped erigon-lib
@awskii (Member Author) commented Sep 23, 2022

I can resolve the merge conflict only after ledgerwatch/erigon-lib#647 is merged.

@AskAlexSharov (Collaborator) commented

ledgerwatch/erigon-lib#647

It's OK to refer to a non-merged erigon-lib branch from this PR.
We usually do it this way, because it allows merging erigon-lib's PR only if Erigon's CI is green.

@awskii (Member Author) commented Sep 26, 2022

ledgerwatch/erigon-lib#647

It's OK to refer to a non-merged erigon-lib branch from this PR. We usually do it this way, because it allows merging erigon-lib's PR only if Erigon's CI is green.

Yes, but it doesn't work for the current situation: I'm trying to keep the erigon-lib commitment branch up to date with main, but it takes time to verify the build after rebasing, so the branch is inevitably not as fresh as trunk.

@AlexeyAkhunov AlexeyAkhunov merged commit 82d0dcf into devel Sep 26, 2022
@AlexeyAkhunov AlexeyAkhunov deleted the e22-commitment branch September 26, 2022 18:09