Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Implement a UTXO cache #1373

Closed
wants to merge 21 commits into from
Closed

Conversation

Roasbeef
Copy link
Member

This PR picks off where #1168 left off. I've back ported fixes from the gcash repo, ensuring all commits have their authors properly attributed. I've started to test this PR as is on both mainnet and testnet. We also need to examine the set of fixes and ensure they're absolutely necessary, as many of the back-ported gcash fixes didn't have accompanying unit tests that failed before the fix was applied.

Additionally, I plan to fix what I view as two design flaws in the current UTXO set cache:

  1. When entries are added, we don't evict them in real time. As a result, the cache can grow very large beyond the targeted constrained sized. In the future, we'll likely also implement more advanced eviction logic, such as evicting the oldest present UTXO, as empirically, UTXOs die young.
  2. When we flush the cache, we also remove all in memory entries from the cache. This causes additional G/C pressure, as the very next block, we then need to re-allocate the map. Additionally, this causes more cache hits, as we start with a cold cache for the next block which guarantees that we'll have a large set of cache misses. Instead, we should only seek to sync our in memory view with that what's on disk, only syncing the entires that have been modified.

Thanks to everyone that had a part in this including @stevenroose, @cfromknecht and @cpacia.

stevenroose and others added 16 commits January 16, 2019 16:54
Ideally we would only rollback the state when we had a hard shutdown in
the middle of a flush but since the underlying db also has a cache that
might not write our flushing state if there's a hard shutdown, we might
not be able to detect that we shutdown in the middle of a flush. So to
make sure we always restore properly we will always rollback the
blockchain when the state is inconsistent.
We hit a bug with the following setup. Block 559307(a) was mined
creating a new utxo (u). An alternate block 559307(b) was mined which
did not create u. Subsequently 559308 was mined on top of b which did
create u again. So during the reorg u needed to go from unspent to
spent(removed) when block a was disconnected, then
back to unspent again when block 559308 was connected. However, the
utxocache commit() function did not have the ability to override a utxo
previously marked as spent back into an unspent state and thus u was
left marked as spent. When 559309 was mined spending u, it failed to
validate because u was erroneously marked as spent already.

This commit patches the commit() function to allow overriding from spent
to unspent.
This fixes a couple bugs in the UTXO set reconstruction code.
Specifically the roll forward starts one block too early resulting in a
UTXO missing error.

Also it was missing a break from the final rollforward loop which caused
it to remain stuck forever.
@Roasbeef Roasbeef mentioned this pull request Jan 17, 2019
2 tasks
@Roasbeef
Copy link
Member Author

Tests fail as is, they also failed before of the cherry-picked "fix" commits. The set of extra commits actually seem to introduce even more test failures than without them.

@Roasbeef
Copy link
Member Author

Tests should be passing now after that lastest commit. I don't like the fix in ee0b302 though, it should be much more precise.

@Roasbeef Roasbeef changed the title WIP] Implement a UTXO cache [WIP] Implement a UTXO cache Jan 17, 2019
@Roasbeef
Copy link
Member Author

Started the mainnet sync, ton of optimization to do, most of the time is spent in GC:
screen shot 2019-01-16 at 9 22 12 pm

@Roasbeef
Copy link
Member Author

Zooming in on the other hot area, we see the current bottle necks in block processing:
screen shot 2019-01-16 at 9 22 04 pm

In this commit, we pre-allocate our cache to hold the maximum number of
entries possible. We do this for two reasons:

  1. Go maps do not shrink when items are deleted from there, so we'll
  end up with a map of this size anyway.

  2. By doing our large allocation, we avoid doing many smaller
  allocations causing more GC pressure as we ramp up to our max cache
  size.

To estimate the max size of the cache, we use the length of a p2pkh
script since they're larger than p2wkh scripts, and p2wkh isn't as
widespread yet.

const (
// FlushRequired is the flush mode that means a flush must be performed
// regardless of the cache state. For example right before shutting down.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// regardless of the cache state. For example right before shutting down.
// regardless of the cache state. For example right before shutting down.

FlushRequired FlushMode = iota

// FlushPeriodic is the flush mode that means a flush can be performed
// when it would be almost needed. This is used to periodically signal when
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// when it would be almost needed. This is used to periodically signal when
// when it would be almost needed. This is used to periodically signal when


// FlushIfNeeded is the flush mode that means a flush must be performed only
// if the cache is exceeding a safety threshold very close to its maximum
// size. This is used mostly internally in between operations that can
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// size. This is used mostly internally in between operations that can
// size. This is used mostly internally in between operations that can

@@ -848,7 +848,7 @@ func getWitnessSigOps(pkScript []byte, witness wire.TxWitness) int {
}

// IsUnspendable returns whether the passed public key script is unspendable, or
// guaranteed to fail at execution. This allows inputs to be pruned instantly
// guaranteed to fail at execution. This allows outputs to be pruned instantly
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// guaranteed to fail at execution. This allows outputs to be pruned instantly
// guaranteed to fail at execution. This allows outputs to be pruned instantly

@maguayo
Copy link

maguayo commented Jan 24, 2019

Any reason why the cache is limited to 250MB, just curious :)

Copy link
Contributor

@yashbhutwala yashbhutwala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

}

// The search strategy used is to exponentially look back until the ancestor
// matches (or they are nil, in which case we reached the genesis block.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// matches (or they are nil, in which case we reached the genesis block.
// matches (or they are nil, in which case we reached the genesis block).

// ancestors of the two given nodes.
func findHighestCommonAncestor(node1, node2 *blockNode) *blockNode {
// Since the common ancestor cannot be higher than the lowest node, put
// the higher one to it's ancestor at the lower one's height.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the higher one to it's ancestor at the lower one's height.
// the higher one to its ancestor at the lower one's height.

// Commit all modifications made to the view into the utxo state. This also
// prunes these changes from the view.
b.stateLock.Lock()
b.utxoCache.Commit(view)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not catch the error here as well? 😃

Suggested change
b.utxoCache.Commit(view)
if err := b.utxoCache.Commit(view); err != nil {
log.Errorf("error committing block %s(%d) to utxo cache: %s", block.Hash(), block.Height(), err.Error())
}

@@ -1794,10 +1832,26 @@ func New(config *Config) (*BlockChain, error) {
return nil, err
}

bestNode := b.bestChain.Tip()
bestNode = b.bestChain.Tip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this redundant after addition of line 1816?

// cached in the process.
//
// This method should be called with the state lock held.
func (s *utxoCache) seekAndCacheEntries(ops ...wire.OutPoint) (
Copy link
Contributor

@yashbhutwala yashbhutwala Jun 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reduce code duplication between seekAndCacheEntries and fetchAndCacheEntries? Actually, I'm not exactly sure where fetchAndCacheEntries is used.

@jakesylvestre
Copy link
Collaborator

@jcvernaleo (as per #1530)

  • Low priority
  • Outdated

@jakesylvestre
Copy link
Collaborator

@Roasbeef can we make this a draft?

@Roasbeef
Copy link
Member Author

Roasbeef commented Mar 6, 2020

Wouldn't say the PR is outdated since it runs as is, and delivers a massive speed up in IBD. As mentioned in the recently closed (older) version of this PR, in combination with some of the other optimizations, btcd sync the chain in under 24 hours. This includes full verification after the past checkpoint, unlike bitcoind's assumevalid which only does UTXO ingestion primarily.

@jcvernaleo
Copy link
Member

@Roasbeef I think we may have been using outdated to mean more like a catch-all for either not up to date or just not ready. Not a great way to do it for sure, but we had a lot of triage to do :)

This PR is definitely super important so I'm really looking forward to when it can go in.

@jakesylvestre
Copy link
Collaborator

Yep, that was the case. Will go ahead and run this on a full sync today to benchmark

@kcalvinalvin
Copy link
Collaborator

Is the bug in dbSeekUtxoEntry() already known? In the tests, dbSeekUtxoEntry() will sometimes return a tx that's new. This causes checkBIP0030() to fail as seekAndCacheEntries() will return a tx that hasn't been committed with Commit() method on utxoCache. Using fetchAndCacheEntries() is a workaround to this.

@rstaudt2
Copy link

In case anyone is interested when picking up this work, we implemented a UTXO Cache in dcrd earlier this year, and portions of it could likely be pretty easily reused here. It accounts for the design flaws mentioned in this PR description as well. The PR that introduces it is decred/dcrd#2591.

@Roasbeef
Copy link
Member Author

Roasbeef commented Mar 1, 2023

Closing in favor of #1955

@Roasbeef Roasbeef closed this Mar 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.