This repository has been archived by the owner on Feb 27, 2023. It is now read-only.

Improve treeHasher performance #42

Merged
merged 1 commit into celestiaorg:master from cuonglm/improve-treehasher
Jul 14, 2021

Conversation

cuonglm
Contributor

@cuonglm cuonglm commented Jul 13, 2021

By doing two things:

  • Pre-allocate enough capacity for the slice up front, instead of
    initializing a small one and continuously appending to it.
  • Make the placeholder slice reusable, extending it only when the
    current hasher size is greater than the cached one.

That improves speed and reduces allocations for Update/Delete
operations:

name                       old time/op    new time/op    delta
SparseMerkleTree_Update-8    11.9µs ± 3%    11.2µs ± 3%   -5.89%  (p=0.008 n=5+5)
SparseMerkleTree_Delete-8    9.66µs ± 2%    5.40µs ± 2%  -44.12%  (p=0.008 n=5+5)

name                       old alloc/op   new alloc/op   delta
SparseMerkleTree_Update-8    17.7kB ± 0%    16.1kB ± 1%   -9.42%  (p=0.016 n=4+5)
SparseMerkleTree_Delete-8    16.3kB ± 0%    13.9kB ± 0%  -14.82%  (p=0.008 n=5+5)

name                       old allocs/op  new allocs/op  delta
SparseMerkleTree_Update-8       117 ± 0%        63 ± 0%  -46.15%  (p=0.029 n=4+4)
SparseMerkleTree_Delete-8      94.4 ± 1%      28.6 ± 2%  -69.70%  (p=0.008 n=5+5)

@cuonglm
Contributor Author

cuonglm commented Jul 13, 2021

cc @odeke-em

@codecov-commenter

codecov-commenter commented Jul 13, 2021

Codecov Report

Merging #42 (2066960) into master (072abf0) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #42   +/-   ##
=======================================
  Coverage   86.05%   86.05%           
=======================================
  Files           6        6           
  Lines         466      466           
=======================================
  Hits          401      401           
  Misses         37       37           
  Partials       28       28           
Impacted Files Coverage Δ
treehasher.go 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 072abf0...2066960. Read the comment docs.

Contributor

@odeke-em odeke-em left a comment


LGTM, thank you @cuonglm! Let's also see if we can shave off by reusing the hashers

@cuonglm
Contributor Author

cuonglm commented Jul 13, 2021

> LGTM, thank you @cuonglm! Let's also see if we can shave off by reusing the hashers

Yeah, I think it can, but want to keep the PR as small as possible.

@odeke-em
Contributor

> LGTM, thank you @cuonglm! Let's also see if we can shave off by reusing the hashers

> Yeah, I think it can, but want to keep the PR as small as possible.

Absolutely, I meant to write/add "later on"

Member

@liamsi liamsi left a comment


Besides a minor nit, looks good to me 👍 thanks for this optimisation!

@cuonglm cuonglm requested a review from adlerjohn July 14, 2021 03:12
@cuonglm cuonglm force-pushed the cuonglm/improve-treehasher branch from da97295 to 6098408 on July 14, 2021 03:13
By doing two things:

 - Pre-allocate enough capacity for the slice up front, instead of
   initializing a small one and continuously appending to it.
 - Initialize the placeholder slice once instead of creating a new one
   every time treeHasher.placeholder is called.

That improves speed and reduces allocations for Update/Delete
operations:

name                       old time/op    new time/op    delta
SparseMerkleTree_Update-8    11.9µs ± 3%    11.2µs ± 3%   -5.89%  (p=0.008 n=5+5)
SparseMerkleTree_Delete-8    9.66µs ± 2%    5.40µs ± 2%  -44.12%  (p=0.008 n=5+5)

name                       old alloc/op   new alloc/op   delta
SparseMerkleTree_Update-8    17.7kB ± 0%    16.1kB ± 1%   -9.42%  (p=0.016 n=4+5)
SparseMerkleTree_Delete-8    16.3kB ± 0%    13.9kB ± 0%  -14.82%  (p=0.008 n=5+5)

name                       old allocs/op  new allocs/op  delta
SparseMerkleTree_Update-8       117 ± 0%        63 ± 0%  -46.15%  (p=0.029 n=4+4)
SparseMerkleTree_Delete-8      94.4 ± 1%      28.6 ± 2%  -69.70%  (p=0.008 n=5+5)
@cuonglm cuonglm force-pushed the cuonglm/improve-treehasher branch from 6098408 to 2066960 on July 14, 2021 06:49
Contributor

@tac0turtle tac0turtle left a comment


nice!!

@liamsi liamsi merged commit a431f73 into celestiaorg:master Jul 14, 2021
@odeke-em odeke-em deleted the cuonglm/improve-treehasher branch July 14, 2021 08:42
treehasher.go (diff hunk under review):

-	copy(value, leafPrefix)
+	value := make([]byte, 0, len(leafPrefix)+len(path)+len(leafData))
+	value = append(value, leafPrefix...)


copy should be faster than append

Contributor Author


@robert-zaremba It should be the same if we pre-allocate the slice, which we did here.


As far as I remember, copy is faster. Maybe something changed in the recent Go releases.

Member


As far as I remember it will literally call the same code under the hood if the slice is pre-allocated. Append is just slightly more idiomatic. Especially, if it is chained / composed of multiple appends. I'm happy to verify this with a benchmark.

Contributor Author

@cuonglm cuonglm Aug 18, 2021


@robert-zaremba Here's a quick proof:

package main

import "testing"

var data = [512]int{}

func BenchmarkAppend(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s := make([]int, 0, 512)
		s = append(s, data[:]...)
	}
}

func BenchmarkCopy(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s := make([]int, 512)
		copy(s, data[:])
	}
}

The result:

goos: darwin
goarch: arm64
BenchmarkAppend-8   	 8270780	       136.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkCopy-8     	 8787560	       136.6 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	command-line-arguments	2.793s

The results vary between runs, but only by a few nanoseconds.


very good. Thanks @cuonglm
