-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use 8 bytes to store int64 components of database keys #107
Conversation
This is an incredible performance win! I'm especially glad to see that the source of the large memory allocations is solved! |
@@ -17,7 +17,7 @@ func TestTreeGetWithProof(t *testing.T) { | |||
require := require.New(t) | |||
for _, ikey := range []byte{0x11, 0x32, 0x50, 0x72, 0x99} { | |||
key := []byte{ikey} | |||
tree.Set(key, []byte(rand.Str(8))) | |||
tree.Set(key, []byte(random.Str(8))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this rename because rand
clashes with standard math/rand
which I wanted to use. I could of aliased by I thought it was better to avoid clashing with stdlib (maybe better still to not use a global...? (even better to perhaps do without tendermint/common :o - but I should tread lightly for now :) )).
for i := 0; i < 1000000; i++ { | ||
t.Set(i2b(int(cmn.RandInt32())), nil) | ||
t.Set(i2b(int(cmn.RandInt32())), value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
go test -bench=.
was panicking on develop without this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linking #76
ca4bc10
to
dc76584
Compare
@ValarDragon negative gives should be fine - I've added a test. |
28d5706
to
d125383
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks great to me! Using this KeyFormat makes the code much cleaner / more readable.
However I'm unfamiliar with alot of the internals of iavl, so I'll leave it to @jlandrews, @liamsi to approve.
a2a8ab4
to
d265b7d
Compare
Just want to note, this performance increase is actually super helpful for our state machine as well. Some benchmarks I was doing had the sprintf calls from GetNode and SaveAs as taking up 1.5% of the entire state machines time. (Includes goleveldb compaction) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome pull request. Thanks a lot @silasdavis 👍
Left a few nits/questions in the review. Will merge tonight.
key_format.go
Outdated
} | ||
} | ||
|
||
// Format the byte segments into the key format - will panic if the segment lengths to do match the layout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will panic if the segment lengths to not match the layout.
return string([]byte{kf.prefix}) | ||
} | ||
|
||
func scan(a interface{}, value []byte) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename this to scanValue. Probably just a matter of preferences.
key_format.go
Outdated
return key[:n] | ||
} | ||
|
||
// Format the args passed into the key format - will panic if the arguments passed to not match the length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will panic if the arguments passed do not match the length
key_format.go
Outdated
binary.BigEndian.PutUint64(bs, uint64(v)) | ||
return bs | ||
case int: | ||
return format(int64(v)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't all these non-byte cases be simplified to:
case uint64, uint, int64, int:
bs := make([]byte, 8)
binary.BigEndian.PutUint64(bs, uint64(v))
return bs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I read that I thought, oh really, how did I not know you could do that, but sadly you cannot. The compile-time type of v
becomes interface{}
as soon as you combine types which cannot be type converted.
You must have been confusing this with a language that has type system :)
I can make it a bit more consistent though:
func format(a interface{}) []byte {
switch v := a.(type) {
case uint64:
bs := make([]byte, 8)
binary.BigEndian.PutUint64(bs, v)
return bs
case uint:
return format(uint64(v))
case int64:
return format(uint64(v))
case int:
return format(uint64(v))
case []byte:
return v
default:
panic(fmt.Errorf("KeyFormat format() does not support formatting value of type %T: %v", a, a))
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer not to do recursive calls, but instead have a second "formatuint64" function which all of these sub-cases call. A second interface conversions seems like an unnecessary performance hit, given that this may be called a ton.
key_format.go
Outdated
} | ||
|
||
// Create a []byte key format based on a single byte prefix and fixed width key segments each of whose length is | ||
// specified by by the corresponding element of layout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, a little example would be great here?
for i := 0; i < 1000000; i++ { | ||
t.Set(i2b(int(cmn.RandInt32())), nil) | ||
t.Set(i2b(int(cmn.RandInt32())), value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linking #76
key_format.go
Outdated
|
||
// Format the args passed into the key format - will panic if the arguments passed to not match the length | ||
// of the segment to which they correspond. When called with no arguments returns the raw prefix (useful as a start | ||
// element of the entire keys space when sorted lexicographically) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: comments should end with a full stop (here and everywhere else). Don't bother, I'll add these.
key_format.go
Outdated
case *int64: | ||
*v = int64(binary.BigEndian.Uint64(value)) | ||
case *uint64: | ||
*v = binary.BigEndian.Uint64(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Maybe I'm overlooking sth obvious here but why are there only two cases (int64
and uint64
) when reading in the value? When formatting the we allow for uint64
, uint
, int64
, int
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's convenient to be able to call kf.Key(0, 1, 2)
and not have to cast each argument, kf.Key(0)
is the example that exists, and in tests.
key_format.go
Outdated
func scan(a interface{}, value []byte) { | ||
switch v := a.(type) { | ||
case *int64: | ||
*v = int64(binary.BigEndian.Uint64(value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this still behave as intended when this overflows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's fine the MSB just becomes the sign bit - I've added a test to demonstrate. Provided the user is consistent with using int64 for signed numbers.
@liamsi I'll make these changes shortly - thanks. |
Signed-off-by: Silas Davis <[email protected]>
…ids string manipulatio/scanning
100ed1e
to
8290d65
Compare
@liamsi changes done |
Awesome! Merging and will prep a release this week. |
See full changelog here: https://github.com/tendermint/iavl/blob/develop/CHANGELOG.md#0110-september-7-2018 * Update to CircleCI 2.0 (#108) * Use 8 bytes to store int64 components of database keys (#107) * Introduce KeyFormat that uses a full 8 bytes for int64 values and avoids string manipulatio/scanning * Add node key and orphan key benchmark * Fix various issue from PR: punctuation, add overflow test, and improve scan function * Remove architecture dependent getter functions (#96) * Remove architecture dependent getter functions * update changelog * Prep Release 0.11.0 (#111) * Bump version and update change-log
* 'develop' of github.com:danil-lashin/iavl: Prep Release 0.11.0 (cosmos#111) Remove architecture dependent getter functions (cosmos#96) Use 8 bytes to store int64 components of database keys (cosmos#107) Update to CircleCI 2.0 (cosmos#108) Release 0.10.0: Update Changelog and bump version (cosmos#99) delete empty file (relict from merging master into develop) Mutable/Immutable refactor and GetImmutable snapshots (cosmos#92) Remove unreachable code Remove unused variable dep: Change tendermint dep to be ^v0.22.0 (cosmos#91) Fix benchmark scripts for current code (cosmos#89) release v0.9.2 (cosmos#82) Various lints (cosmos#80) Jae/rangeprooffix (cosmos#75)
* Introduce KeyFormat that uses a full 8 bytes for int64 values and avoids string manipulatio/scanning * Add node key and orphan key benchmark * Fix various issue from PR: punctuation, add overflow test, and improve scan function
See full changelog here: https://github.com/tendermint/iavl/blob/develop/CHANGELOG.md#0110-september-7-2018 * Update to CircleCI 2.0 (cosmos#108) * Use 8 bytes to store int64 components of database keys (cosmos#107) * Introduce KeyFormat that uses a full 8 bytes for int64 values and avoids string manipulatio/scanning * Add node key and orphan key benchmark * Fix various issue from PR: punctuation, add overflow test, and improve scan function * Remove architecture dependent getter functions (cosmos#96) * Remove architecture dependent getter functions * update changelog * Prep Release 0.11.0 (cosmos#111) * Bump version and update change-log
This fixes #104 by introducing
KeyFormat
that uses a fixed-width key format over the full 8 bytes provided by int64 versions in orphan, and root keys. It also relies on the fixed width format to provide more efficient code for formatting into and scanning values from keys.As a result of this it seems to lead to a ~10% speedup in
BenchmarkTreeLoadAndDelete
by avoiding the relatively expensivefmt.Sprintf
andfmt.Sscanf
functions, see below.Before
KeyFormat
introductiongo test -bench=. goos: linux goarch: amd64 pkg: github.com/tendermint/iavl BenchmarkNodeKey-8 5000000 316 ns/op BenchmarkOrphanKey-8 2000000 613 ns/op ok, starting BenchmarkImmutableAvlTreeMemDB-8 200 5783362 ns/op 1104190 B/op 22488 allocs/op BenchmarkTreeLoadAndDelete/LoadAndDelete-8 2 707800784 ns/op PASS ok github.com/tendermint/iavl 129.478s
After
KeyFormat
introductiongo test -bench=. goos: linux goarch: amd64 pkg: github.com/tendermint/iavl BenchmarkNodeKey-8 30000000 50.9 ns/op BenchmarkOrphanKey-8 5000000 256 ns/op BenchmarkImmutableAvlTreeMemDB-8 500 4048465 ns/op 82591 B/op 1546 allocs/op BenchmarkTreeLoadAndDelete/LoadAndDelete-8 5 641415439 ns/op PASS ok github.com/tendermint/iavl 121.982s
For
BenchmarkTreeLoadAndDelete/LoadAndDelete-8
we have 1 - 641415439/707800784 ~= 9.3%