Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use 8 bytes to store int64 components of database keys #107

Merged
merged 3 commits into from
Sep 5, 2018

Conversation

silasdavis
Copy link
Contributor

@silasdavis silasdavis commented Aug 30, 2018

This fixes #104 by introducing KeyFormat that uses a fixed-width key format over the full 8 bytes provided by int64 versions in orphan, and root keys. It also relies on the fixed width format to provide more efficient code for formatting into and scanning values from keys.

As a result of this it seems to lead to a ~10% speedup in BenchmarkTreeLoadAndDelete by avoiding the relatively expensive fmt.Sprintf and fmt.Sscanf functions, see below.

Before KeyFormat introduction

go test -bench=.
goos: linux
goarch: amd64
pkg: github.com/tendermint/iavl
BenchmarkNodeKey-8                 	 5000000	       316 ns/op
BenchmarkOrphanKey-8               	 2000000	       613 ns/op
ok, starting
BenchmarkImmutableAvlTreeMemDB-8   	200	   5783362 ns/op	 1104190 B/op	   22488 allocs/op
BenchmarkTreeLoadAndDelete/LoadAndDelete-8         	       2	 707800784 ns/op
PASS
ok  	github.com/tendermint/iavl	129.478s

After KeyFormat introduction

go test -bench=.
goos: linux
goarch: amd64
pkg: github.com/tendermint/iavl
BenchmarkNodeKey-8                 	30000000	        50.9 ns/op
BenchmarkOrphanKey-8               	 5000000	       256 ns/op
BenchmarkImmutableAvlTreeMemDB-8   	     500	   4048465 ns/op	   82591 B/op	    1546 allocs/op
BenchmarkTreeLoadAndDelete/LoadAndDelete-8         	       5	 641415439 ns/op
PASS
ok  	github.com/tendermint/iavl	121.982s

For BenchmarkTreeLoadAndDelete/LoadAndDelete-8 we have 1 - 641415439/707800784 ~= 9.3%

@ValarDragon
Copy link
Contributor

ValarDragon commented Aug 30, 2018

This is an incredible performance win! I'm especially glad to see that the source of the large memory allocations is solved!
Does the key being negative matter?

@@ -17,7 +17,7 @@ func TestTreeGetWithProof(t *testing.T) {
require := require.New(t)
for _, ikey := range []byte{0x11, 0x32, 0x50, 0x72, 0x99} {
key := []byte{ikey}
tree.Set(key, []byte(rand.Str(8)))
tree.Set(key, []byte(random.Str(8)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this rename because rand clashes with standard math/rand which I wanted to use. I could of aliased by I thought it was better to avoid clashing with stdlib (maybe better still to not use a global...? (even better to perhaps do without tendermint/common :o - but I should tread lightly for now :) )).

for i := 0; i < 1000000; i++ {
t.Set(i2b(int(cmn.RandInt32())), nil)
t.Set(i2b(int(cmn.RandInt32())), value)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go test -bench=. was panicking on develop without this change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linking #76

@silasdavis
Copy link
Contributor Author

@ValarDragon negative gives should be fine - I've added a test.

@silasdavis silasdavis force-pushed the int64-versions branch 3 times, most recently from 28d5706 to d125383 Compare August 30, 2018 15:53
Copy link
Contributor

@ValarDragon ValarDragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks great to me! Using this KeyFormat makes the code much cleaner / more readable.

However I'm unfamiliar with alot of the internals of iavl, so I'll leave it to @jlandrews, @liamsi to approve.

@silasdavis silasdavis force-pushed the int64-versions branch 3 times, most recently from a2a8ab4 to d265b7d Compare August 31, 2018 08:05
@silasdavis silasdavis changed the title Use 8 bytes for keys with int64 components Use 8 bytes to store int64 components of database keys Aug 31, 2018
@ValarDragon
Copy link
Contributor

Just want to note, this performance increase is actually super helpful for our state machine as well. Some benchmarks I was doing had the sprintf calls from GetNode and SaveAs as taking up 1.5% of the entire state machines time. (Includes goleveldb compaction)

Copy link
Contributor

@liamsi liamsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome pull request. Thanks a lot @silasdavis 👍

Left a few nits/questions in the review. Will merge tonight.

key_format.go Outdated
}
}

// Format the byte segments into the key format - will panic if the segment lengths to do match the layout.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will panic if the segment lengths to not match the layout.

return string([]byte{kf.prefix})
}

func scan(a interface{}, value []byte) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this to scanValue. Probably just a matter of preferences.

key_format.go Outdated
return key[:n]
}

// Format the args passed into the key format - will panic if the arguments passed to not match the length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will panic if the arguments passed do not match the length

key_format.go Outdated
binary.BigEndian.PutUint64(bs, uint64(v))
return bs
case int:
return format(int64(v))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't all these non-byte cases be simplified to:

case uint64, uint, int64, int:
                bs := make([]byte, 8)
		binary.BigEndian.PutUint64(bs, uint64(v))
		return bs

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I read that I thought, oh really, how did I not know you could do that, but sadly you cannot. The compile-time type of v becomes interface{} as soon as you combine types which cannot be type converted.

You must have been confusing this with a language that has type system :)

I can make it a bit more consistent though:

func format(a interface{}) []byte {
	switch v := a.(type) {
	case uint64:
		bs := make([]byte, 8)
		binary.BigEndian.PutUint64(bs, v)
		return bs
	case uint:
		return format(uint64(v))
	case int64:
		return format(uint64(v))
	case int:
		return format(uint64(v))
	case []byte:
		return v
	default:
		panic(fmt.Errorf("KeyFormat format() does not support formatting value of type %T: %v", a, a))
	}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to do recursive calls, but instead have a second "formatuint64" function which all of these sub-cases call. A second interface conversions seems like an unnecessary performance hit, given that this may be called a ton.

key_format.go Outdated
}

// Create a []byte key format based on a single byte prefix and fixed width key segments each of whose length is
// specified by by the corresponding element of layout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, a little example would be great here?

for i := 0; i < 1000000; i++ {
t.Set(i2b(int(cmn.RandInt32())), nil)
t.Set(i2b(int(cmn.RandInt32())), value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linking #76

key_format.go Outdated

// Format the args passed into the key format - will panic if the arguments passed to not match the length
// of the segment to which they correspond. When called with no arguments returns the raw prefix (useful as a start
// element of the entire keys space when sorted lexicographically)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: comments should end with a full stop (here and everywhere else). Don't bother, I'll add these.

key_format.go Outdated
case *int64:
*v = int64(binary.BigEndian.Uint64(value))
case *uint64:
*v = binary.BigEndian.Uint64(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Maybe I'm overlooking sth obvious here but why are there only two cases (int64 and uint64) when reading in the value? When formatting the we allow for uint64, uint, int64, int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's convenient to be able to call kf.Key(0, 1, 2) and not have to cast each argument, kf.Key(0) is the example that exists, and in tests.

key_format.go Outdated
func scan(a interface{}, value []byte) {
switch v := a.(type) {
case *int64:
*v = int64(binary.BigEndian.Uint64(value))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this still behave as intended when this overflows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's fine the MSB just becomes the sign bit - I've added a test to demonstrate. Provided the user is consistent with using int64 for signed numbers.

@silasdavis
Copy link
Contributor Author

@liamsi I'll make these changes shortly - thanks.

@silasdavis
Copy link
Contributor Author

@liamsi changes done

@liamsi
Copy link
Contributor

liamsi commented Sep 5, 2018

Awesome! Merging and will prep a release this week.

@liamsi liamsi merged commit 9a73b02 into cosmos:develop Sep 5, 2018
@silasdavis silasdavis deleted the int64-versions branch September 5, 2018 15:47
liamsi added a commit that referenced this pull request Sep 8, 2018
See full changelog here:
https://github.com/tendermint/iavl/blob/develop/CHANGELOG.md#0110-september-7-2018

* Update to CircleCI 2.0 (#108)

* Use 8 bytes to store int64 components of database keys  (#107)

* Introduce KeyFormat that uses a full 8 bytes for int64 values and avoids string manipulatio/scanning

* Add node key and orphan key benchmark

* Fix various issue from PR: punctuation, add overflow test, and improve
scan function

* Remove architecture dependent getter functions (#96)

* Remove architecture dependent getter functions

* update changelog

* Prep Release 0.11.0 (#111)

* Bump version and update change-log
danil-lashin pushed a commit to danil-lashin/iavl that referenced this pull request Oct 24, 2018
* 'develop' of github.com:danil-lashin/iavl:
  Prep Release 0.11.0 (cosmos#111)
  Remove architecture dependent getter functions (cosmos#96)
  Use 8 bytes to store int64 components of database keys  (cosmos#107)
  Update to CircleCI 2.0 (cosmos#108)
  Release 0.10.0: Update Changelog and bump version (cosmos#99)
  delete empty file (relict from merging master into develop)
  Mutable/Immutable refactor and GetImmutable snapshots (cosmos#92)
  Remove unreachable code
  Remove unused variable
  dep: Change tendermint dep to be ^v0.22.0 (cosmos#91)
  Fix benchmark scripts for current code (cosmos#89)
  release v0.9.2 (cosmos#82)
  Various lints (cosmos#80)
  Jae/rangeprooffix (cosmos#75)
ridenaio pushed a commit to idena-network/iavl that referenced this pull request Jul 1, 2019
* Introduce KeyFormat that uses a full 8 bytes for int64 values and avoids string manipulatio/scanning

* Add node key and orphan key benchmark

* Fix various issue from PR: punctuation, add overflow test, and improve
scan function
ridenaio pushed a commit to idena-network/iavl that referenced this pull request Jul 1, 2019
See full changelog here:
https://github.com/tendermint/iavl/blob/develop/CHANGELOG.md#0110-september-7-2018

* Update to CircleCI 2.0 (cosmos#108)

* Use 8 bytes to store int64 components of database keys  (cosmos#107)

* Introduce KeyFormat that uses a full 8 bytes for int64 values and avoids string manipulatio/scanning

* Add node key and orphan key benchmark

* Fix various issue from PR: punctuation, add overflow test, and improve
scan function

* Remove architecture dependent getter functions (cosmos#96)

* Remove architecture dependent getter functions

* update changelog

* Prep Release 0.11.0 (cosmos#111)

* Bump version and update change-log
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants