-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature]: save iavl tree asynchronously to improve throughput #16173
Comments
This SGTM! There me be a bit more complexity in replay, and knowing which block to start from on restart! (And tracking which blocks are fully committed -- this becomes more important / harder when we make the write size smaller) (And we may need to guarantee that you've succesfully committed to a disk a block at least k heights back for TM. Should generally be fine to just store the state diffs quickly, and re-merkelize on the fly for resync though) This feels like a great idea to me! |
we can rely on the status of latest commit info of the rootmulti for the last fully committed block, which is always written after all the stores. |
Thread-safty is not the concern, but the mutex is, it won't be able to achieve better throughput if the reading is blocked by the writing. I think we can make the changes in sdk first, @cool-develope can optimize the iavl later, memiavl can take full advantage of this. |
I feel this could be similar to what we'll do in the optimistic execution, where the commit is not "actual" commit. |
Closes: cosmos#16173
Love this idea @yihuang -- thanks for opening a PR too! |
Unforturntely it don't work with current IAVL I think, the main issue is the |
while we cant do multiple iavl tree commits, we could do 1, where we return commit to comet right away and save asynchronous. This would help in a single case and would still have performance benefits |
yeah, that is possible, something like this? // both channels are unbuffered
commitChan := make(chan int64)
commitResultChan := make(chan error)
func Commit() {
// make sure the last async commit finished
err := <- commitResultChan
// calculate root hash for the new commit
commitID := store.WorkingCommitID()
// submit new commit request
commitChan <- commitID.Version
return commitID
}
// async commit worker
go func() {
for v := range commitChan {
_, _, err := store.SaveVersion()
commitResultChan <- err
}
}() |
yea that seems right, it would be interesting to see what sort of speed up that would give. I dont forsee this helping too much on sync but at the head of the chain it could help |
I think on return to comet, we need to be sure we could replay to this block on our own. Which means we need to synchronously commit to the state diff, but not the merkelization |
@tac0turtle are you imagining that during sync, your not committing to disk at all? (Or only committing at "wide" intervals, e.g. snapshots) otherwise not sure I follow how this wouldn't help? Your letting there be multiple db txs in parallel for writes to disk, so committing is happening in parallel to CPU processing of nodes (whereas today you don't use CPU & commit at once) |
I think we don't need to make sure that, as long as the cometbft can replay the missing blocks from block store on abci handshake.
this won't be an archive node though, some node just need to provide full service at every historical versions, I guess. |
yeah, IMO we should separate the iavl into three parts, |
this is very similar to the work aditya did on iavl a few years ago which was reverted, we should look at that work to see what we can reuse. A good time to dive into that is after we get a decent portion of the way through store v2 rewrite |
not sure that prior IAVL work is helpful, this all becomes much cleaner write-side with the new IAVL storage format. (Every version is independent writes!) |
iavlv2 takes advantage of this as well. Ill close this as the vision is to adopt that version in the coming months |
Summary
In recent development of memiavl, we find that the "async commit" mode increase the block replaying throughput tremendously(40%), I think it'll have even bigger impact in normal iavl, because the db overhead is larger here.
The idea is in abci commit, we only need to calculate the root hash using the in memory nodes (working nodes) immediately, but the db saving part can run concurrently without blocking the consensus state machine. because of the nature of the blockchain architecture, we can lose a few blocks state changes, cometbft will just replay them on startup, as long as the atomicity is handled right.
One can also consider this as an upgrade of the previous no-db-sync proposal: #14966, they share similar side-effects, this proposal wants to run the whole
SaveVersion
at background.Problem Definition
The IAVL tree commit contributes a lot to the total block execution time (TODO profiling numbers), we can achieve much higher throughput by commit the iavl tree to db in background, it's especially important when the node is catching up with the chain, or the chain want to push the TPS to the limit.
Proposal
Commit
, return theWorkingHash
instead as theCommitID
, but do theSaveVersion
at background.SaveVersion
runs concurrently.The text was updated successfully, but these errors were encountered: