Christmas trie #20481
Conversation
Here's the cut-off when switching from #20465 to this PR, showing the times. Something is not right, though: this PR seems to have somehow broken the mem gc.
daily update, after about 3 x
After five days (around
Mon08 (this PR) finished full sync at block 9188701 at 2019-12-31 03:05:02. Here are some comparison tables.
Added a commit to prevent loading tries when not needed. At the moment, before this commit, we load the (non-existing) storage trie even when it isn't needed. Later on, these tries are all committed, and although the non-modified storage trie commits exit early, this PR also spins up a goroutine for them, which is now skipped as well.
Force-pushed from 539071a to 4ce3930
func (s *stateObject) updateTrie(db Database) Trie {
    // Make sure all dirty slots are finalized into the pending storage area
    s.finalise()

    if len(s.pendingStorage) == 0 {
        return s.trie
Curious question. What happens if, pre-Byzantium, one tx modifies the trie but a second one does not? Then pending will be empty, I think, but `s.trie` is not nil any more, since a previous tx already loaded it. Shouldn't we explicitly return nil here?
I guess it will enter the commit, and quickly find that the trie root is not dirty. Since we don't provide a leaf callback for storage tries, there will not be a lot of overhead on that operation
trie/committer.go (Outdated)
        return hash, nil
    }
    if db == nil {
        return nil, errors.New("no db provided")
Wondering about this. Should we error or panic? If it's a committer, should it allow a nil db? Isn't that a programming error?
It also feels a bit weird to explicitly check this for every single trie node. Shouldn't this be checked way above, with the code here just assuming it's correct?
If `db` is nil, it means that the `trie.db` is nil. I can't say if that's ever done on purpose, e.g. in tests.
panic("commit did not hash down to a hashnode!") | ||
} | ||
return hn, nil | ||
} |
Myeah, I don't think there's much reason for this to exist. You can move the db check into `commit`, and I would not care for the root hash node check (I mean, that's what commit is meant to do, no point in sanity-checking that it works).
    if common.BytesToHash(newRoot) != rootHash {
        panic(fmt.Sprintf("Committed root %x != roothash %x", newRoot, rootHash))
    }
    t.root = newRoot
Same here, I don't think this level of paranoia is needed :D
Re: panics, perhaps we can run this PR as-is on a benchmarker, then nuke the panics and merge afterwards. Then you'll have one last sanity check, but it should really be fine.
Dedicated hasher/committer
The previous `hasher.go` was a generic implementation which was used by both `Commit` and `Hash`. It does a couple of things:

- it produces `cached`, which will replace the old `root`,
- it produces the `collapsed` trie containing the hashing-preimages, which is incrementally used to create higher order hash values,
- it produces the `store`, if a commit is being performed, to send nodes to the database, and also to the `onleaf` handler (for `Commit`).

This means that for every `Hash` or `Commit`, every processed node was duplicated twice, using `copy`. When I split this in two, it was possible to remove a lot of redundant allocs.
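Conceptually, the split looks something like the sketch below. The type names and signatures are illustrative assumptions, not the actual code in trie/hasher.go and trie/committer.go; the point is the separation of concerns: the hasher only folds nodes into hashes, while the committer only walks an already-hashed trie and ships dirty nodes out, so neither needs to duplicate nodes for the other's sake.

```go
package triesketch

// Illustrative sketch only; the real hasher/committer differ in detail.

// node stands in for the trie node interface (an assumption for this sketch).
type node interface{}

// leaf is a unit of work handed from the committer to the database writer.
type leaf struct {
    hash []byte // hash of the node
    blob []byte // rlp-encoded node data
}

// hasher folds a (sub)trie into its hash, memoizing results in the nodes.
// It needs no database handle and no callbacks.
type hasher struct {
    buf []byte // reusable encoding buffer, cuts per-node allocations
}

func (h *hasher) hash(n node, force bool) node {
    // collapse children, rlp-encode into h.buf, keccak, cache the result
    return n
}

// committer walks an already-hashed trie and ships dirty nodes out; hashing
// is assumed to have happened before commit is called.
type committer struct {
    leafCh chan<- leaf // consumed by a separate db-insertion goroutine
}

func (c *committer) commit(n node) error {
    // walk n, skip clean subtrees, send dirty nodes on c.leafCh
    return nil
}
```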
Commit
The `Commit` operation is basically read-only, with the exception of setting the `dirty`-flag to `false`.

- Since we normally don't need to hash nodes during `Commit` (but sometimes still do), we can skip rlp-encoding most of the time.
- The rlp-encoding was previously also used to determine the size of the `node` for tracking purposes, but that can be estimated instead of calculated strictly.
- The interaction with `onleaf` and the `db` has been split off into a separate thread, and instead this PR sends those tasks via a channel (a rough sketch of this follows below).
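A rough sketch of that hand-off is shown here; the names (commitTask, commitLoop, nodeStore) are assumptions for illustration, not identifiers from this PR. The committer stays on the hot path and only sends on a channel, while a single goroutine drains it and does the slower work of database insertion and onleaf callbacks.

```go
package triesketch

import "sync"

// Sketch of handing db/onleaf work to a separate goroutine; all names here
// are assumptions for illustration, not the PR's actual API.

type commitTask struct {
    hash []byte // node hash, used as the database key
    blob []byte // rlp-encoded node
    leaf bool   // whether the onleaf callback should fire for this node
}

type nodeStore interface {
    insert(hash, blob []byte)
}

type leafCallback func(blob []byte)

// commitLoop drains tasks produced by the committer and performs the slow
// work (db insertion, leaf callbacks) off the commit hot path.
func commitLoop(db nodeStore, onleaf leafCallback, tasks <-chan commitTask, wg *sync.WaitGroup) {
    defer wg.Done()
    for t := range tasks {
        db.insert(t.hash, t.blob)
        if t.leaf && onleaf != nil {
            onleaf(t.blob)
        }
    }
}

// commitWithWorker wires it together: the producer walks the trie and sends
// dirty nodes, then the channel is closed and we wait for the worker.
func commitWithWorker(db nodeStore, onleaf leafCallback, produce func(chan<- commitTask)) {
    tasks := make(chan commitTask, 128) // small buffer keeps the producer moving
    var wg sync.WaitGroup
    wg.Add(1)
    go commitLoop(db, onleaf, tasks, &wg)

    produce(tasks)
    close(tasks)
    wg.Wait()
}
```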
Hash
The `Hash` operation has been optimized mainly to decrease the number of allocations.
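The description doesn't spell out how the allocations were cut. One common Go pattern for this kind of hot path is to pool the hasher state with sync.Pool so the keccak instance is reused across calls; the sketch below is purely illustrative and not necessarily how this PR achieves the reduction.

```go
package triesketch

import (
    "hash"
    "sync"

    "golang.org/x/crypto/sha3"
)

// Hypothetical illustration: reuse keccak-256 state across Hash calls via a
// sync.Pool, so the hot path avoids allocating a fresh hasher per node.

type pooledHasher struct {
    sha hash.Hash // keccak-256 state, reset and reused between calls
}

var hasherPool = sync.Pool{
    New: func() interface{} {
        return &pooledHasher{sha: sha3.NewLegacyKeccak256()}
    },
}

// hashBlob hashes an already rlp-encoded node without allocating a new
// keccak instance each time.
func hashBlob(enc []byte) []byte {
    h := hasherPool.Get().(*pooledHasher)
    defer hasherPool.Put(h)

    h.sha.Reset()
    h.sha.Write(enc)
    return h.sha.Sum(nil) // the 32-byte digest itself is still allocated
}
```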
Benchmark tests
The trie benchmark tests were not accurate. The trie benchmarks used `b.N` to set the number of leafs in the trie. However, the correlation between 'work performed' and 'leafs' is not linear, so this caused the benchmark results to be incomparable. On a faster codebase, the test framework will increase the `N` factor, which will increase the work factor by `N * f(N)`, and depending on whichever final `N` the go test framework decides to stop at, it might show a slower average (`N`/`time`) than the slower codebase.

I added new `Fixed` benchmarks, which benchmark the times to `Commit`/`Hash` fixed-sized tries instead.
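The shape of such a fixed-size benchmark is roughly as follows. This is a sketch assuming the v1.9-era trie API (trie.New, trie.NewDatabase, memorydb.New); the actual Fixed benchmarks added in trie/ differ in their details. The key point is that the trie size is a constant and b.N only controls how many times the measured operation runs, so the per-op numbers stay comparable across codebases.

```go
package triebench

import (
    "crypto/rand"
    "testing"

    "github.com/ethereum/go-ethereum/common"
    "github.com/ethereum/go-ethereum/ethdb/memorydb"
    "github.com/ethereum/go-ethereum/trie"
)

const fixedLeafs = 1000 // constant trie size; b.N does not influence it

// makeFixedTrie builds a trie with a fixed number of random leafs.
func makeFixedTrie(b *testing.B, leafs int) *trie.Trie {
    tr, err := trie.New(common.Hash{}, trie.NewDatabase(memorydb.New()))
    if err != nil {
        b.Fatal(err)
    }
    for i := 0; i < leafs; i++ {
        key, val := make([]byte, 32), make([]byte, 32)
        rand.Read(key)
        rand.Read(val)
        tr.Update(key, val)
    }
    return tr
}

func BenchmarkHashFixedSize(b *testing.B) {
    for i := 0; i < b.N; i++ {
        b.StopTimer()
        tr := makeFixedTrie(b, fixedLeafs) // setup excluded from timing
        b.StartTimer()

        tr.Hash() // measure hashing a 1000-leaf trie, repeated b.N times
    }
}
```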
Further improvements
There are a couple of further improvements that can be made:

- Parallel hashing, spinning up goroutines to hash each subpath. This should be an improvement, but only for tries of suitable size (perhaps 1000 new leafs or so). Most likely (imo), the old attempt had too much overhead on the storage trie hashes, since the storage tries are too small to gain any real benefit from the change. It might be a good idea to add an update counter to the trie, and only use parallel hashing when the (estimated) number of updates is above a certain threshold.
- A dedicated database-inserter goroutine (sketched after this list). This PR spins up a goroutine for db insertion at every trie commit. Although it's fairly fast, it's likely not a speed boost for the storage tries, for the same reason as above. So a better approach might be to have one dedicated goroutine (per trie, or per database) where we can send data from the committer to the database + `onleaf` callbacks. A separate benefit of doing this is that during the `statedb.go` commit, we don't have to wait for each individual trie to finish before starting to process the next one. Pipelining ftw! I actually did an implementation of this, but it got kind of messy since my PoC exposed the `node` type and caused cyclic dependencies, so I've held it off so far, until I figure out how best to structure it.
Real life tests
I'm doing a full-sync on `mon08`/`mon09`, and it appears that this PR is significantly faster than `master`.
https://geth-bench.ethdevops.io/d/Jpk-Be5Wk/dual-geth?orgId=1&var-exp=mon08&var-master=mon09&var-percentile=50&from=1577128018418&to=now