-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Implement a UTXO cache #1373
Conversation
Ideally we would only rollback the state when we had a hard shutdown in the middle of a flush but since the underlying db also has a cache that might not write our flushing state if there's a hard shutdown, we might not be able to detect that we shutdown in the middle of a flush. So to make sure we always restore properly we will always rollback the blockchain when the state is inconsistent.
We hit a bug with the following setup. Block 559307(a) was mined creating a new utxo (u). An alternate block 559307(b) was mined which did not create u. Subsequently 559308 was mined on top of b which did create u again. So during the reorg u needed to go from unspent to spent(removed) when block a was disconnected, then back to unspent again when block 559308 was connected. However, the utxocache commit() function did not have the ability to override a utxo previously marked as spent back into an unspent state and thus u was left marked as spent. When 559309 was mined spending u, it failed to validate because u was erroneously marked as spent already. This commit patches the commit() function to allow overriding from spent to unspent.
This fixes a couple bugs in the UTXO set reconstruction code. Specifically the roll forward starts one block too early resulting in a UTXO missing error. Also it was missing a break from the final rollforward loop which caused it to remain stuck forever.
Tests fail as is, they also failed before of the cherry-picked "fix" commits. The set of extra commits actually seem to introduce even more test failures than without them. |
Tests should be passing now after that lastest commit. I don't like the fix in ee0b302 though, it should be much more precise. |
In this commit, we pre-allocate our cache to hold the maximum number of entries possible. We do this for two reasons: 1. Go maps do not shrink when items are deleted from there, so we'll end up with a map of this size anyway. 2. By doing our large allocation, we avoid doing many smaller allocations causing more GC pressure as we ramp up to our max cache size. To estimate the max size of the cache, we use the length of a p2pkh script since they're larger than p2wkh scripts, and p2wkh isn't as widespread yet.
|
||
const ( | ||
// FlushRequired is the flush mode that means a flush must be performed | ||
// regardless of the cache state. For example right before shutting down. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// regardless of the cache state. For example right before shutting down. | |
// regardless of the cache state. For example right before shutting down. |
FlushRequired FlushMode = iota | ||
|
||
// FlushPeriodic is the flush mode that means a flush can be performed | ||
// when it would be almost needed. This is used to periodically signal when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// when it would be almost needed. This is used to periodically signal when | |
// when it would be almost needed. This is used to periodically signal when |
|
||
// FlushIfNeeded is the flush mode that means a flush must be performed only | ||
// if the cache is exceeding a safety threshold very close to its maximum | ||
// size. This is used mostly internally in between operations that can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// size. This is used mostly internally in between operations that can | |
// size. This is used mostly internally in between operations that can |
@@ -848,7 +848,7 @@ func getWitnessSigOps(pkScript []byte, witness wire.TxWitness) int { | |||
} | |||
|
|||
// IsUnspendable returns whether the passed public key script is unspendable, or | |||
// guaranteed to fail at execution. This allows inputs to be pruned instantly | |||
// guaranteed to fail at execution. This allows outputs to be pruned instantly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// guaranteed to fail at execution. This allows outputs to be pruned instantly | |
// guaranteed to fail at execution. This allows outputs to be pruned instantly |
Any reason why the cache is limited to 250MB, just curious :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments
} | ||
|
||
// The search strategy used is to exponentially look back until the ancestor | ||
// matches (or they are nil, in which case we reached the genesis block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// matches (or they are nil, in which case we reached the genesis block. | |
// matches (or they are nil, in which case we reached the genesis block). |
// ancestors of the two given nodes. | ||
func findHighestCommonAncestor(node1, node2 *blockNode) *blockNode { | ||
// Since the common ancestor cannot be higher than the lowest node, put | ||
// the higher one to it's ancestor at the lower one's height. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// the higher one to it's ancestor at the lower one's height. | |
// the higher one to its ancestor at the lower one's height. |
// Commit all modifications made to the view into the utxo state. This also | ||
// prunes these changes from the view. | ||
b.stateLock.Lock() | ||
b.utxoCache.Commit(view) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not catch the error here as well? 😃
b.utxoCache.Commit(view) | |
if err := b.utxoCache.Commit(view); err != nil { | |
log.Errorf("error committing block %s(%d) to utxo cache: %s", block.Hash(), block.Height(), err.Error()) | |
} | |
@@ -1794,10 +1832,26 @@ func New(config *Config) (*BlockChain, error) { | |||
return nil, err | |||
} | |||
|
|||
bestNode := b.bestChain.Tip() | |||
bestNode = b.bestChain.Tip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this redundant after addition of line 1816?
// cached in the process. | ||
// | ||
// This method should be called with the state lock held. | ||
func (s *utxoCache) seekAndCacheEntries(ops ...wire.OutPoint) ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reduce code duplication between seekAndCacheEntries
and fetchAndCacheEntries
? Actually, I'm not exactly sure where fetchAndCacheEntries
is used.
@jcvernaleo (as per #1530)
|
@Roasbeef can we make this a draft? |
Wouldn't say the PR is outdated since it runs as is, and delivers a massive speed up in IBD. As mentioned in the recently closed (older) version of this PR, in combination with some of the other optimizations, |
@Roasbeef I think we may have been using This PR is definitely super important so I'm really looking forward to when it can go in. |
Yep, that was the case. Will go ahead and run this on a full sync today to benchmark |
Is the bug in |
In case anyone is interested when picking up this work, we implemented a UTXO Cache in |
Closing in favor of #1955 |
This PR picks off where #1168 left off. I've back ported fixes from the
gcash
repo, ensuring all commits have their authors properly attributed. I've started to test this PR as is on both mainnet and testnet. We also need to examine the set of fixes and ensure they're absolutely necessary, as many of the back-portedgcash
fixes didn't have accompanying unit tests that failed before the fix was applied.Additionally, I plan to fix what I view as two design flaws in the current UTXO set cache:
Thanks to everyone that had a part in this including @stevenroose, @cfromknecht and @cpacia.