sequencer: Cache compact.Tree between sequencing runs #1598
Conversation
@Martin2112 @AlCutter What do you think about this idea?
Codecov Report
@@ Coverage Diff @@
## master #1598 +/- ##
=========================================
Coverage ? 67.14%
=========================================
Files ? 111
Lines ? 8988
Branches ? 0
=========================================
Hits ? 6035
Misses ? 2347
Partials ? 606
Continue to review full report at Codecov.
Caching is a consequence of
It may be time for some history from the dim and distant past of those long-forgotten days in 2016... Not retaining state was done for several reasons:
1. Not holding something that could get corrupted in RAM.
2. Reduced need to think about various complicated distributed failure modes.
3. Limited time and resources to implement made us pretty cautious.

So fast forward to the glorious days of 2019. The kind of question that should be asked is whether this opens up any possible ways to go wrong that don't currently exist. I think the versioning protects against most stale database / accidental multiple-master cases, but it's worth thinking it through with worst-case assumptions. If we're going to do this, I think it should check that what comes out of the cache appears to be the same as what went in. Also, what happens if time on the machine appears to have gone backwards? I think there is a check somewhere for writing an older-than-current root, but I'm not sure it covers this. It might be possible to make it either write an incorrect timestamp or get stuck failing to update the root until it resigns and a new signer takes over.
This is a good point. In this PR I check the tree size and root hash of the compact.Tree.
This is an orthogonal question, as the cache only stores hashes. This PR doesn't change behavior w.r.t. timestamps.
Yes, it currently checks what it uses, so it could be OK, but otherwise it would be safer to hang on to the serialized format and verify it each time it's used. That may be overkill. The timestamp thing is an example, as there's a timestamp in the signed log root. It's not a bad idea. We just need to make sure we've thought through how it might break. If it can't break, that's great, but we have to at least do the thinking.
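As an illustration of the check being discussed, here is a minimal Go sketch. It assumes the caller has the size and root hash recorded in the cache, plus the size and root hash read from the latest stored log root inside the transaction; all names here are placeholders, not the PR's actual code.

```go
package sequencer

import "bytes"

// cachedTreeUsable reports whether a cached compact tree still matches the
// latest root recorded in storage. On any mismatch the caller should drop
// the cache entry and rebuild the tree from storage, treating the cache as
// an optimisation rather than a source of truth.
func cachedTreeUsable(cachedSize, storedSize int64, cachedRootHash, storedRootHash []byte) bool {
	return cachedSize == storedSize && bytes.Equal(cachedRootHash, storedRootHash)
}
```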
log/cache.go
Outdated
//
// TODO(pavelkalinnikov): Support some cache clearing mechanism.
type TreeCache struct {
	ct map[int64]*compact.Tree
You may want to use sync.Map here, or otherwise protect with sync.Mutex?
Fantastic point.
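For illustration, a minimal sketch of the sync.Mutex option, assuming the TreeCache shape shown in the diff above; the get/put helpers and the import path are assumptions for the sketch, not the PR's actual API.

```go
package sequencer

import (
	"sync"

	"github.com/google/trillian/merkle/compact" // Import path assumed for illustration.
)

// TreeCache keeps one compact.Tree per tree ID. The mutex makes lookups and
// insertions safe if multiple sequencing runs touch the cache concurrently.
type TreeCache struct {
	mu sync.Mutex
	ct map[int64]*compact.Tree
}

// get returns the cached tree for treeID, or nil if there is none.
func (c *TreeCache) get(treeID int64) *compact.Tree {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.ct[treeID]
}

// put stores (or replaces) the cached tree for treeID.
func (c *TreeCache) put(treeID int64, t *compact.Tree) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.ct == nil {
		c.ct = make(map[int64]*compact.Tree)
	}
	c.ct[treeID] = t
}
```

sync.Map would also work, but a plain map behind a mutex keeps the int64 keys and *compact.Tree values typed without casts.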
log/sequencer.go
Outdated
@@ -401,7 +405,7 @@ func (s Sequencer) IntegrateBatch(ctx context.Context, tree *trillian.Tree, limi
}

// Collate node updates.
-nodeMap, err := s.updateCompactTree(merkleTree, sequencedLeaves, label)
+nodeMap, err := s.updateCompactTree(merkleTree, tree.TreeId, sequencedLeaves, label)
Ideally you probably only want this to happen once the tx has successfully committed?
It's probably fine because on a rollback the replayed root hash won't match the cached one, so you'll end up having to init the cache from storage, but ...
That was precisely my thinking. But it's probably still not a bad idea to make it safer. OTOH, it's not trivial: note that the returned compact.Tree is mutable, and we can't roll back these updates without some sort of "clone" function.
This triggers another interesting observation: this code can be retried, as it's within a RunInTransaction function. If it is retried, the 1st retry will fail (and clear the cache) because of compact.Tree mutability and root hash mismatch, but the 2nd retry has a chance of succeeding again. This is probably a good reason to have a "clone" function instead.
Oh wait, ignore the last comment. The 1st retry will not fail, it will only be forced to reload the compact.Tree from storage.
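To make the commit-before-cache ordering discussed here concrete, below is a hedged sketch. It reuses the TreeCache sketch above; txRunner, clone, buildFromStorage, and integrate are placeholders standing in for the real transaction runner, a hypothetical compact.Tree clone helper, and the existing sequencing logic, not the PR's actual code.

```go
package sequencer

import (
	"context"

	"github.com/google/trillian/merkle/compact" // Import path assumed for illustration.
)

// txRunner mirrors the shape of a RunInTransaction-style helper: it may call
// the body several times until one attempt commits successfully.
type txRunner func(ctx context.Context, body func(ctx context.Context) error) error

// integrateWithCache shows the ordering discussed above: every attempt of the
// transaction body works on its own copy of the tree, and the shared cache is
// only updated after the transaction has committed, so a failed or retried
// attempt never leaves a half-mutated tree in the cache.
func integrateWithCache(
	ctx context.Context,
	run txRunner,
	cache *TreeCache,
	treeID int64,
	clone func(*compact.Tree) *compact.Tree,
	buildFromStorage func(context.Context) (*compact.Tree, error),
	integrate func(context.Context, *compact.Tree) error,
) error {
	var result *compact.Tree
	err := run(ctx, func(ctx context.Context) error {
		t := cache.get(treeID)
		if t != nil {
			t = clone(t) // Never mutate the cached copy inside the tx.
		} else {
			var err error
			if t, err = buildFromStorage(ctx); err != nil {
				return err
			}
		}
		if err := integrate(ctx, t); err != nil {
			return err
		}
		result = t
		return nil
	})
	if err != nil {
		return err // Cache untouched; the next run revalidates or reloads.
	}
	cache.put(treeID, result) // Publish only after a successful commit.
	return nil
}
```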
Closing in favor of a larger refactoring: #1640
This change adds compact.Tree caching to the sequencer. This should significantly reduce the cost of sequencing, because most of the time the sequencer does many runs in a row.
Checklist