[Feature]: save iavl tree asynchronously to improve throughput #16173

yihuang · 2023-05-16T07:33:00Z

Summary

In recent development of memiavl, we find that the "async commit" mode increase the block replaying throughput tremendously(40%), I think it'll have even bigger impact in normal iavl, because the db overhead is larger here.

The idea is in abci commit, we only need to calculate the root hash using the in memory nodes (working nodes) immediately, but the db saving part can run concurrently without blocking the consensus state machine. because of the nature of the blockchain architecture, we can lose a few blocks state changes, cometbft will just replay them on startup, as long as the atomicity is handled right.

One can also consider this as an upgrade of the previous no-db-sync proposal: #14966, they share similar side-effects, this proposal wants to run the whole SaveVersion at background.

Problem Definition

The IAVL tree commit contributes a lot to the total block execution time (TODO profiling numbers), we can achieve much higher throughput by commit the iavl tree to db in background, it's especially important when the node is catching up with the chain, or the chain want to push the TPS to the limit.

Proposal

In iavl store's Commit, return the WorkingHash instead as the CommitID, but do the SaveVersion at background.
In upgrade module, before the panic of the upgrade msg, call a new API to wait for the asynchronous commit to finish, so the upgrade can success after restart. (do similar thing in other places that need graceful restart if there's any)
make sure the iavl implementation is ok to have SaveVersion runs concurrently.

The text was updated successfully, but these errors were encountered:

ValarDragon · 2023-05-16T07:50:56Z

This SGTM! There me be a bit more complexity in replay, and knowing which block to start from on restart! (And tracking which blocks are fully committed -- this becomes more important / harder when we make the write size smaller)

(And we may need to guarantee that you've succesfully committed to a disk a block at least k heights back for TM. Should generally be fine to just store the state diffs quickly, and re-merkelize on the fly for resync though)

This feels like a great idea to me!

yihuang · 2023-05-16T07:53:20Z

There me be a bit more complexity in replay, and knowing which block to start from on restart! (And tracking which blocks are fully committed -- this becomes more important / harder when we make the write size smaller)

we can rely on the status of latest commit info of the rootmulti for the last fully committed block, which is always written after all the stores.

yihuang · 2023-05-16T07:55:44Z

~~The main concern for now is if the current IAVL implementation concurrency safe for this use case.~~

Thread-safty is not the concern, but the mutex is, it won't be able to achieve better throughput if the reading is blocked by the writing. I think we can make the changes in sdk first, @cool-develope can optimize the iavl later, memiavl can take full advantage of this.

yihuang · 2023-05-16T08:34:15Z

I feel this could be similar to what we'll do in the optimistic execution, where the commit is not "actual" commit.

Closes: cosmos#16173

alexanderbez · 2023-05-16T13:50:01Z

Love this idea @yihuang -- thanks for opening a PR too!

yihuang · 2023-05-17T03:04:36Z

Unforturntely it don't work with current IAVL I think, the main issue is the WorkingHash don't modify current version, so if the next Commit happens before the background SaveVersion, it'll use the old version.
We'll need some refactoring in iavl to take advantage of this. @cool-develope

tac0turtle · 2023-05-17T07:05:17Z

while we cant do multiple iavl tree commits, we could do 1, where we return commit to comet right away and save asynchronous. This would help in a single case and would still have performance benefits

yihuang · 2023-05-17T08:34:13Z

while we cant do multiple iavl tree commits, we could do 1, where we return commit to comet right away and save asynchronous. This would help in a single case and would still have performance benefits

yeah, that is possible, something like this?

// both channels are unbuffered
commitChan := make(chan int64)
commitResultChan := make(chan error)

func Commit() {
  // make sure the last async commit finished
  err := <- commitResultChan
  // calculate root hash for the new commit
  commitID := store.WorkingCommitID()
  // submit new commit request
  commitChan <- commitID.Version
  return commitID
}

// async commit worker
go func() {
  for v := range commitChan {
    _, _, err := store.SaveVersion()
    commitResultChan <- err
  }
}()

tac0turtle · 2023-05-17T08:45:36Z

yea that seems right, it would be interesting to see what sort of speed up that would give. I dont forsee this helping too much on sync but at the head of the chain it could help

ValarDragon · 2023-05-17T09:06:54Z

I think on return to comet, we need to be sure we could replay to this block on our own. Which means we need to synchronously commit to the state diff, but not the merkelization

ValarDragon · 2023-05-17T09:08:24Z

@tac0turtle are you imagining that during sync, your not committing to disk at all? (Or only committing at "wide" intervals, e.g. snapshots)

otherwise not sure I follow how this wouldn't help? Your letting there be multiple db txs in parallel for writes to disk, so committing is happening in parallel to CPU processing of nodes (whereas today you don't use CPU & commit at once)

yihuang · 2023-05-17T09:12:15Z

I think on return to comet, we need to be sure we could replay to this block on our own

I think we don't need to make sure that, as long as the cometbft can replay the missing blocks from block store on abci handshake.

not committing to disk at all? (Or only committing at "wide" intervals, e.g. snapshots

this won't be an archive node though, some node just need to provide full service at every historical versions, I guess.

cool-develope · 2023-05-17T14:16:16Z

Unforturntely it don't work with current IAVL I think, the main issue is the WorkingHash don't modify current version, so if the next Commit happens before the background SaveVersion, it'll use the old version. We'll need some refactoring in iavl to take advantage of this. @cool-develope

yeah, IMO we should separate the iavl into three parts, WorkingHash, Commit , SaveStorage
in Commit stage, just finalize the iavl version in memory and nodes will be stored background in SaveStorage.
it won't break any APIs of SDK

tac0turtle · 2023-05-17T20:26:49Z

this is very similar to the work aditya did on iavl a few years ago which was reverted, we should look at that work to see what we can reuse. A good time to dive into that is after we get a decent portion of the way through store v2 rewrite

ValarDragon · 2023-05-18T10:21:32Z

not sure that prior IAVL work is helpful, this all becomes much cleaner write-side with the new IAVL storage format. (Every version is independent writes!)

tac0turtle · 2024-10-01T09:01:52Z

iavlv2 takes advantage of this as well. Ill close this as the vision is to adopt that version in the coming months

yihuang added the T:feature-request label May 16, 2023

yihuang mentioned this issue May 16, 2023

cosmos-sdk don't have to sync db for each block #14966

Closed

yihuang added a commit to yihuang/cosmos-sdk that referenced this issue May 16, 2023

feat: async commit

1e9d871

Closes: cosmos#16173

yihuang mentioned this issue May 16, 2023

feat: async commit #16175

Closed

19 tasks

yihuang changed the title ~~[Feature]: save iavl tree asynchronously to improve throughput greatly~~ [Feature]: save iavl tree asynchronously to improve throughput May 16, 2023

yihuang mentioned this issue May 17, 2023

[Feature]: add a Close method to App for graceful quit #16192

Closed

tac0turtle added this to Cosmos-SDK Nov 13, 2023

github-project-automation bot moved this to 👀 To Do in Cosmos-SDK Nov 13, 2023

tac0turtle added the C:Store label Feb 20, 2024

tac0turtle removed this from Cosmos-SDK Apr 21, 2024

tac0turtle closed this as completed Oct 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature]: save iavl tree asynchronously to improve throughput #16173

[Feature]: save iavl tree asynchronously to improve throughput #16173

yihuang commented May 16, 2023 •

edited

Loading

ValarDragon commented May 16, 2023

yihuang commented May 16, 2023 •

edited

Loading

yihuang commented May 16, 2023 •

edited

Loading

yihuang commented May 16, 2023 •

edited

Loading

alexanderbez commented May 16, 2023

yihuang commented May 17, 2023 •

edited

Loading

tac0turtle commented May 17, 2023

yihuang commented May 17, 2023 •

edited

Loading

tac0turtle commented May 17, 2023

ValarDragon commented May 17, 2023

ValarDragon commented May 17, 2023

yihuang commented May 17, 2023

cool-develope commented May 17, 2023 •

edited

Loading

tac0turtle commented May 17, 2023

ValarDragon commented May 18, 2023

tac0turtle commented Oct 1, 2024

[Feature]: save iavl tree asynchronously to improve throughput #16173

[Feature]: save iavl tree asynchronously to improve throughput #16173

Comments

yihuang commented May 16, 2023 • edited Loading

Summary

Problem Definition

Proposal

ValarDragon commented May 16, 2023

yihuang commented May 16, 2023 • edited Loading

yihuang commented May 16, 2023 • edited Loading

yihuang commented May 16, 2023 • edited Loading

alexanderbez commented May 16, 2023

yihuang commented May 17, 2023 • edited Loading

tac0turtle commented May 17, 2023

yihuang commented May 17, 2023 • edited Loading

tac0turtle commented May 17, 2023

ValarDragon commented May 17, 2023

ValarDragon commented May 17, 2023

yihuang commented May 17, 2023

cool-develope commented May 17, 2023 • edited Loading

tac0turtle commented May 17, 2023

ValarDragon commented May 18, 2023

tac0turtle commented Oct 1, 2024

yihuang commented May 16, 2023 •

edited

Loading

yihuang commented May 16, 2023 •

edited

Loading

yihuang commented May 16, 2023 •

edited

Loading

yihuang commented May 16, 2023 •

edited

Loading

yihuang commented May 17, 2023 •

edited

Loading

yihuang commented May 17, 2023 •

edited

Loading

cool-develope commented May 17, 2023 •

edited

Loading