Fast storage optimization for queries and iterations #5
Conversation
mutable_tree.go (Outdated)
```go
// GetVersionedFast gets the value at the specified key and version. The returned value must not be
// modified, since it may point to data stored within IAVL. GetVersionedFast utilizes a more performant
// strategy for retrieving the value than GetVersioned, but falls back to the regular strategy if it fails.
func (tree *MutableTree) GetVersionedFast(key []byte, version int64) []byte {
```
Similar reasoning to this
```
@@ -0,0 +1,66 @@
root@ubuntu-s-1vcpu-1gb-nyc1-01:~/iavl# cat bench_fast.txt
```
Left here for reviewers, will remove before merge
Nice job on this! Also, thanks for committing all the benchmarks! The CacheHit query speedup is amazing! I haven't done a full review yet, but I was a bit surprised that CacheMiss was significantly slower. I looked through the code, and I think the existing benchmark is more accurately "query not guaranteed to be in tree" and is incorrectly named. One other thing we can do to make this faster: in the immutable tree, if the immutable tree version equals the node db's latest version, then a FastNode miss implies that the key isn't in that version, so there is no reason to do the full search. (That way, all queries against the latest state get the speedup.)
Thanks for the feedback. Yes, I agree with the naming and will change it to reflect that. Great suggestion on making it faster! I'll benchmark it again with the fix |
I implemented the suggested fix in the latest commit. It significantly improved all queries. However, updates and blocks are worse off now. The fix made me discover that I should have been updating the version of fast nodes on disk to be above the deleted version when calling a variation of the deletion logic. This is done to keep fast nodes on disk consistent with the live state. If we were to simply delete the fast node from disk on deletion, it would be impossible to implement your suggestion, as the live state would diverge from the fast-node state. The latest bench test got OOM-killed for the last run.
```go
		versionLastUpdatedAt: ver,
		value:                val,
	}

	return fastNode, nil
}

func (node *FastNode) encodedSize() int {
```
The same method on the regular node looks like the following:

```go
func (node *Node) encodedSize() int {
	n := 1 +
		encodeVarintSize(node.size) +
		encodeVarintSize(node.version) +
		encodeBytesSize(node.key)
	if node.isLeaf() {
		n += encodeBytesSize(node.value)
	} else {
		n += encodeBytesSize(node.leftHash) +
			encodeBytesSize(node.rightHash)
	}
	return n
}
```

I don't understand what that extra byte at the beginning is for. At first, I thought it might be used for a prefix, but it looks like `encodedSize` is only used for measuring the value. I don't think we need an extra byte for a null terminator, since we encode each field with its own terminator. Could someone explain what that extra byte is meant for, please?
TBH I have no idea either. If things are serializing and deserializing correctly without it, then it should be fine, I hope?
(It's very possible this was just always an error; this is a super under-documented code base with, until recently, a bus factor of one.)
```go
	return tree.Hash(), version, nil
}

func (tree *MutableTree) saveFastNodeVersion() error {
```
I'm planning to look into the following:
- whether removing before writing to the db is faster
- whether combining sorted removals with sorted additions is faster than doing them one after the other

If anyone knows what patterns are most efficient, or how I can optimize this better, please let me know.
```
@@ -11,7 +12,7 @@ type FastNode struct {
	key []byte
```
wdyt about removing `key`? Do you think we should keep it, or remove it to help reduce the committed data overhead?
I removed the key from being written as part of the fast node's value on disk. We still need to keep it as a member of the struct for retrieval from disk and for managing it in memory.
oh awesome, great trade-off taken!
```go
// MockDB is a mock of DB interface.
type MockDB struct {
```
What's MockDB / gomock, btw? I'm just not familiar.
Seems pretty cool, like it's a code generator for something that satisfies an interface, but you can easily edit it to tweak things?
It creates a mock implementation of an interface. Here, I created a mock of the database interface, `MockDB`. It allows controlling the behavior of the mock to enter any code branch for test coverage. I mostly used it to simulate DB errors in this PR, which were difficult to simulate with the mem DB that we normally use for testing.
testutils_test.go (Outdated)
```go
		increment *= -1
	}

	for startIdx < endIdx {
```
How does this work when !ascending? Doesn't startIdx = len(mirror) - 1 and endIdx = 0?
This was completely wrong. Thanks for catching that. Fixed now
…des and orphans, update unit tests
Co-authored-by: Dev Ojha <[email protected]>
…12)
- add leaf hash to fast node and unit test
- refactor get with index and get by index, fix migration in load version and lazy load version
- use Get in GetVersioned of mutable tree
- refactor non membership proof to use fast storage if available
- bench non-membership proof
- fix bench tests to work with the new changes
- add downgrade-reupgrade protection and unit test
- remove leaf hash from fast node
- resolve multithreading bug related to iterators not being closed
- clean up
- use correct tree in bench tests
- add cache to tree used to bench non membership proofs
- add bench tests for GetWithIndex and GetByIndex
- revert GetWithIndex and GetByIndex
- remove unused import
- unit test re-upgrade protection and fix small issues
- remove redundant setStorageVersion method
- fix bug with appending live state version to storage version and unit test
- add comment for setFastStorageVersionToBatch
- refactor and improve unit tests for reupgrade protection
- rename ndb's isFastStorageEnabled to hasUpgradedToFastStorage and add comments
- comment out new implementation for GetNonMembershipProof
- update comments in nodedb to reflect the difference between hasUpgradedToFastStorage and shouldForceFastStorageUpdate
- refactor nodedb tests
- downgrade tendermint to 0.34.14 - osmosis's latest cosmos sdk does not support 0.35.0
- fix bug where fast storage was not enabled when version 0 was attempted to be loaded
- implement unsaved fast iterator to be used in mutable tree (#16)
- expose isUpgradeable method on mutable tree and unit test
- go fmt
Force-pushed from c983c62 to f0f815e (compare)
Background
Link to the original spec
We are conceptualizing the fast cache as a direct key-value store for the latest state. For simplicity of the deployment/migration logic, our plan is to make this a secondary copy of the latest state in the database. (We already incur more egregious space overheads with the cosmos pruning strategy, so this is not that bad.) The improved data locality on disk should reduce the time needed to retrieve data for keys that are close to each other. As a result, iterating over the latest state in order should become extremely fast, since we do not need to read from random physical locations on disk.
Original Issues
#1 Setting & getting data for the Fast Cache
#3 Iteration over Fast Cache
#4 Migration code
Summary of Changes
IAVL is divided into two trees, `mutable_tree` and `immutable_tree`. Sets only happen on the mutable tree. Things that need to change and be investigated for getting and setting, and the fast node:

mutable_tree
- `GetVersioned`
- `Set`
- `Remove`
- `SaveVersion`
- `Iterate`
- `Iterator`
- `Get`
- `enableFastStorageAndCommit` and its variations
- `mstorage_version`, where `m` is a new prefix. If the version is lower than the `fastStorageVersionValue` threshold, migration is triggered.

immutable_tree
- `Get` and `GetWithIndex`: renamed `Get` to `GetWithIndex`. `GetWithIndex` always uses the default live-state traversal strategy.
- Added a new `Get` method. `Get` attempts to use the fast cache first and only falls back to the regular tree-traversal strategy if the fast cache is disabled or the tree is not of the latest version.
- `Iterator`

nodedb / fast_iterator
- Added a new fast iterator with prefix `f`, which stands for fast. Basically, all fast nodes are sorted on disk by key in ascending order, so we can simply traverse the disk, ensuring efficient hardware access.

testing
Old Benchmark
Date: 2022-01-22 12:33 AM PST
Branch: `dev/iavl_data_locality` with some modifications to the bench tests

Latest Benchmark
Date: 2022-01-22 10:15 AM PST
Branch: `roman/fast-node-get-set`
Benchmarks Interpretation
Highlighting the difference in performance from the latest benchmarks:
- Old branch: `dev/iavl_data_locality`
- New branch: `roman/fast-node-get-set`
- Initial size: 100,000 key-val pairs
- Block size: 100 keys
- Key length: 16 bytes
- Value length: 40 bytes
bytesQuery with no guarantee of the key being in the tree:
22354 ns/op
18046 ns/op
4938 ns/op
Query with the key guaranteed to be in the latest tree:
27137 ns/op
23126 ns/op
1684 ns/op
Iteration:
2285116100 ns/op
1716585400 ns/op
94702442 ns/op
Update:
run Set, if this is a try that is divisible by blockSize, attempt to SaveVersion and if the latest saved version number history exceeds 20, delete the oldest version
307266 ns/op
257683 ns/op
Block:
for block size, run Get and Set. At the end of the block, SaveVersion and if the latest saved version number history exceeds 20, delete the oldest version
40663600 ns/op
44907345 ns/op