Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: save iavl tree asynchronously to improve throughput #16173

Closed
yihuang opened this issue May 16, 2023 · 16 comments
Closed

[Feature]: save iavl tree asynchronously to improve throughput #16173

yihuang opened this issue May 16, 2023 · 16 comments

Comments

@yihuang
Copy link
Collaborator

yihuang commented May 16, 2023

Summary

In recent development of memiavl, we find that the "async commit" mode increase the block replaying throughput tremendously(40%), I think it'll have even bigger impact in normal iavl, because the db overhead is larger here.

The idea is in abci commit, we only need to calculate the root hash using the in memory nodes (working nodes) immediately, but the db saving part can run concurrently without blocking the consensus state machine. because of the nature of the blockchain architecture, we can lose a few blocks state changes, cometbft will just replay them on startup, as long as the atomicity is handled right.

One can also consider this as an upgrade of the previous no-db-sync proposal: #14966, they share similar side-effects, this proposal wants to run the whole SaveVersion at background.

Problem Definition

The IAVL tree commit contributes a lot to the total block execution time (TODO profiling numbers), we can achieve much higher throughput by commit the iavl tree to db in background, it's especially important when the node is catching up with the chain, or the chain want to push the TPS to the limit.

Proposal

  • In iavl store's Commit, return the WorkingHash instead as the CommitID, but do the SaveVersion at background.
  • In upgrade module, before the panic of the upgrade msg, call a new API to wait for the asynchronous commit to finish, so the upgrade can success after restart. (do similar thing in other places that need graceful restart if there's any)
  • make sure the iavl implementation is ok to have SaveVersion runs concurrently.
@ValarDragon
Copy link
Contributor

This SGTM! There me be a bit more complexity in replay, and knowing which block to start from on restart! (And tracking which blocks are fully committed -- this becomes more important / harder when we make the write size smaller)

(And we may need to guarantee that you've succesfully committed to a disk a block at least k heights back for TM. Should generally be fine to just store the state diffs quickly, and re-merkelize on the fly for resync though)

This feels like a great idea to me!

@yihuang
Copy link
Collaborator Author

yihuang commented May 16, 2023

There me be a bit more complexity in replay, and knowing which block to start from on restart! (And tracking which blocks are fully committed -- this becomes more important / harder when we make the write size smaller)

we can rely on the status of latest commit info of the rootmulti for the last fully committed block, which is always written after all the stores.

@yihuang
Copy link
Collaborator Author

yihuang commented May 16, 2023

The main concern for now is if the current IAVL implementation concurrency safe for this use case.

Thread-safty is not the concern, but the mutex is, it won't be able to achieve better throughput if the reading is blocked by the writing. I think we can make the changes in sdk first, @cool-develope can optimize the iavl later, memiavl can take full advantage of this.

@yihuang
Copy link
Collaborator Author

yihuang commented May 16, 2023

I feel this could be similar to what we'll do in the optimistic execution, where the commit is not "actual" commit.

yihuang added a commit to yihuang/cosmos-sdk that referenced this issue May 16, 2023
@yihuang yihuang mentioned this issue May 16, 2023
19 tasks
@yihuang yihuang changed the title [Feature]: save iavl tree asynchronously to improve throughput greatly [Feature]: save iavl tree asynchronously to improve throughput May 16, 2023
@alexanderbez
Copy link
Contributor

Love this idea @yihuang -- thanks for opening a PR too!

@yihuang
Copy link
Collaborator Author

yihuang commented May 17, 2023

Unforturntely it don't work with current IAVL I think, the main issue is the WorkingHash don't modify current version, so if the next Commit happens before the background SaveVersion, it'll use the old version.
We'll need some refactoring in iavl to take advantage of this. @cool-develope

@tac0turtle
Copy link
Member

while we cant do multiple iavl tree commits, we could do 1, where we return commit to comet right away and save asynchronous. This would help in a single case and would still have performance benefits

@yihuang
Copy link
Collaborator Author

yihuang commented May 17, 2023

while we cant do multiple iavl tree commits, we could do 1, where we return commit to comet right away and save asynchronous. This would help in a single case and would still have performance benefits

yeah, that is possible, something like this?

// both channels are unbuffered
commitChan := make(chan int64)
commitResultChan := make(chan error)

func Commit() {
  // make sure the last async commit finished
  err := <- commitResultChan
  // calculate root hash for the new commit
  commitID := store.WorkingCommitID()
  // submit new commit request
  commitChan <- commitID.Version
  return commitID
}

// async commit worker
go func() {
  for v := range commitChan {
    _, _, err := store.SaveVersion()
    commitResultChan <- err
  }
}()

@tac0turtle
Copy link
Member

yea that seems right, it would be interesting to see what sort of speed up that would give. I dont forsee this helping too much on sync but at the head of the chain it could help

@ValarDragon
Copy link
Contributor

I think on return to comet, we need to be sure we could replay to this block on our own. Which means we need to synchronously commit to the state diff, but not the merkelization

@ValarDragon
Copy link
Contributor

@tac0turtle are you imagining that during sync, your not committing to disk at all? (Or only committing at "wide" intervals, e.g. snapshots)

otherwise not sure I follow how this wouldn't help? Your letting there be multiple db txs in parallel for writes to disk, so committing is happening in parallel to CPU processing of nodes (whereas today you don't use CPU & commit at once)

@yihuang
Copy link
Collaborator Author

yihuang commented May 17, 2023

I think on return to comet, we need to be sure we could replay to this block on our own

I think we don't need to make sure that, as long as the cometbft can replay the missing blocks from block store on abci handshake.

not committing to disk at all? (Or only committing at "wide" intervals, e.g. snapshots

this won't be an archive node though, some node just need to provide full service at every historical versions, I guess.

@cool-develope
Copy link
Contributor

cool-develope commented May 17, 2023

Unforturntely it don't work with current IAVL I think, the main issue is the WorkingHash don't modify current version, so if the next Commit happens before the background SaveVersion, it'll use the old version. We'll need some refactoring in iavl to take advantage of this. @cool-develope

yeah, IMO we should separate the iavl into three parts, WorkingHash, Commit , SaveStorage
in Commit stage, just finalize the iavl version in memory and nodes will be stored background in SaveStorage.
it won't break any APIs of SDK

@tac0turtle
Copy link
Member

this is very similar to the work aditya did on iavl a few years ago which was reverted, we should look at that work to see what we can reuse. A good time to dive into that is after we get a decent portion of the way through store v2 rewrite

@ValarDragon
Copy link
Contributor

not sure that prior IAVL work is helpful, this all becomes much cleaner write-side with the new IAVL storage format. (Every version is independent writes!)

@tac0turtle
Copy link
Member

iavlv2 takes advantage of this as well. Ill close this as the vision is to adopt that version in the coming months

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants