Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

a few performance notes #1

Closed
josharian opened this issue Aug 1, 2019 · 3 comments
Closed

a few performance notes #1

josharian opened this issue Aug 1, 2019 · 3 comments

Comments

@josharian
Copy link

Hi! I came here via a twitter post about performance. I was curious, so I poked around a bit for fun. A few minor comments, in the hopes that they are useful.

First, BenchmarkRandomInsert can be a bit misleading. If you make it run faster, it gets to a higher b.N, but at higher b.N, each iteration has to do more work. Thus code improvements don't necessarily show up in improved benchmark results. Picking a fixed number of iterations to do each time would be better. Perhaps like:

func BenchmarkRandomInsert(b *testing.B) {
	const iters = 10000
	boxes := randBoxes(iters)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var tr RTree
		for iter := 0; iter < iters; iter++ {
			tr.Insert(boxes[iter].min[:], boxes[iter].max[:], iter)
		}
	}
}

Second, I can get a ~15% speedup on BenchmarkRandomInsert on my machine by unrolling the loops in chooseLeastEnlargement. Here's the 3d version:

// min is a sloppy float min function.
// It is faster than math.Min, at the cost
// of sloppy handling of the usual
// floating point delights: NaN and Inf.
func min(x, y float64) float64 {
	if x < y {
		return x
	}
	return y
}

// max is like min, only max.
func max(x, y float64) float64 {
	if x < y {
		return y
	}
	return x
}

func (r *box) chooseLeastEnlargement(b *box) int {
	j, jenlargement, jarea := -1, 0.0, 0.0
	n := r.data.(*node)
	for i := 0; i < n.count; i++ {
		area := (n.boxes[i].max[0] - n.boxes[i].min[0]) *
			(n.boxes[i].max[1] - n.boxes[i].min[1]) *
			(n.boxes[i].max[2] - n.boxes[i].min[2])

		enlargedArea := (max(b.max[0], n.boxes[i].max[0]) - min(b.min[0], n.boxes[i].min[0])) *
			(max(b.max[1], n.boxes[i].max[1]) - min(b.min[1], n.boxes[i].min[1])) *
			(max(b.max[2], n.boxes[i].max[2]) - min(b.min[2], n.boxes[i].min[2]))

		enlargement := enlargedArea - area

		if j == -1 || enlargement < jenlargement || (enlargement == jenlargement && area < jarea) {
			j, jenlargement, jarea = i, enlargement, area
		}
	}
	return j
}

IMHO, it also improves readability a bit. Of course, it makes your code generation more complicated.

@tidwall
Copy link
Owner

tidwall commented Aug 1, 2019

For anyone stumbling upon this post, the tweet mentioned above presents some preliminary benchmark results comparing my Rust and Go implementations of this data structure.
https://twitter.com/tidwall/status/1156339153698562048


Hi!

You bring up excellent points. Loop unrolling is undoubtedly a good improvement.

Another great side-effect from unrolling loops is that the smaller functions will then become candidates for inlining by the Go compiler. It is a bit unfortunate that the Go compiler does not inline functions with FOR loops.

Thus functions such as

func (r *box) intersects(b *box) bool {
	for i := 0; i < dims; i++ {
		if b.min[i] > r.max[i] || b.max[i] < r.min[i] {
			return false
		}
	}
	return true
}

could be replaced with

func (r *box) intersects(b *box) bool {
        return !(b.min[0] > r.max[0] || b.max[0] < r.min[0] ||
                b.min[1] > r.max[1] || b.max[1] < r.min[1] 
                b.min[2] > r.max[2] || b.max[2] < r.min[2])
}

even this (cringe) is better for the compiler

func (r *box) intersects(b *box) bool {
	i := 0
loop:
	if i < dims {
		if b.min[i] > r.max[i] || b.max[i] < r.min[i] {
			return false
		}
		i++
		goto loop
	}
	return true
}

There are a few other improvements, such as moving to proactive insertions which would allow for a non-recursive insert function. But these are implementation details and not really related to the language or compiler. I hope to include these in the Go version at some point.

I'm confident that once these improvements are in-place the Go version will perform as well as the Rust version.


As for the benchmarking. Good tip on b.N. While the BenchmarkRandomInsert is only intended as a sanity check for perf regressions, it's probably worthwhile to make the change to recommend.

Here's the benchmark code that I used in the tweet:

package main

import (
	"math/rand"
	"os"
	"time"

	"github.com/tidwall/lotsa"
	"github.com/tidwall/rbang-go"
)

func main() {
	N := 1000000
	ENVELOPE := []float64{-112.0, 33.0, -111.0, 34.0}
	lotsa.Output = os.Stdout
	rand.Seed(time.Now().UnixNano())

	points := make([][2]float64, N)
	for i := 0; i < N; i++ {
		points[i][0] = rand.Float64()*360 - 180
		points[i][1] = rand.Float64()*180 - 90
	}

	tr := rbang.New(2)
	print("rbang-insert: ")
	lotsa.Ops(N, 1, func(i, _ int) {
		tr.Insert(points[i][:], points[i][:], i)
	})

	print("rbang-search: ")
	lotsa.Ops(N, 1, func(i, _ int) {
		var count int
		tr.Search(ENVELOPE[:2], ENVELOPE[2:],
			func(_, _ []float64, _ interface{}) bool {
				count++
				return true
			},
		)
	})

	print("rbang-delete: ")
	lotsa.Ops(N, 1, func(i, _ int) {
		tr.Delete(points[i][:], points[i][:], i)
	})
}

Thanks a ton for your input!

@josharian
Copy link
Author

It is a bit unfortunate that the Go compiler does not inline functions with FOR loops.

There are many unfortunate things about the Go compiler's inliner: golang/go#17566. For a discussion specifically about inlining for loops: golang/go#14768. Feel free to ping the latter issue in the hopes of inspiring David to finish it up. :)

I'm confident that once these improvements are in-place the Go version will perform as well as the Rust version.

If it doesn't, and you've run out of optimization ideas, I'm happy to take another look. Optimizing Go code is one of my hobbies. :) (Competing with Rust is not. I'm in full agreement with this thread: https://twitter.com/rustlang/status/1156926772212092928.)

Oh, and thanks for providing the benchmark code you used. I looked around a bit in the repo for it and then gave up and used the existing benchmark there. If/when you circle back around to the Go implementation, it'd be lovely to turn it into Go benchmarks, since you get a bunch of other stuff for free with that (profiling, standardized output to feed into other tools, etc.). And when a rando like me wanders into your code base looking for perf improvements it's where they'll start. :P

Have fun playing with Rust. :)

@tidwall
Copy link
Owner

tidwall commented Sep 12, 2019

I just pushed the loop unrolling changes and queries are seeing a happy little boost. 🚀

Performance is now closer to Rust. And looking back at my Rust code, I was actually unrolling the loops. 🤦‍♂️

Thanks for the second set of eyes!

@tidwall tidwall closed this as completed Sep 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants