a few performance notes #1

josharian · 2019-08-01T06:35:19Z

Hi! I came here via a twitter post about performance. I was curious, so I poked around a bit for fun. A few minor comments, in the hopes that they are useful.

First, BenchmarkRandomInsert can be a bit misleading. If you make it run faster, it gets to a higher b.N, but at higher b.N, each iteration has to do more work. Thus code improvements don't necessarily show up in improved benchmark results. Picking a fixed number of iterations to do each time would be better. Perhaps like:

func BenchmarkRandomInsert(b *testing.B) {
	const iters = 10000
	boxes := randBoxes(iters)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var tr RTree
		for iter := 0; iter < iters; iter++ {
			tr.Insert(boxes[iter].min[:], boxes[iter].max[:], iter)
		}
	}
}

Second, I can get a ~15% speedup on BenchmarkRandomInsert on my machine by unrolling the loops in chooseLeastEnlargement. Here's the 3d version:

// min is a sloppy float min function.
// It is faster than math.Min, at the cost
// of sloppy handling of the usual
// floating point delights: NaN and Inf.
func min(x, y float64) float64 {
	if x < y {
		return x
	}
	return y
}

// max is like min, only max.
func max(x, y float64) float64 {
	if x < y {
		return y
	}
	return x
}

func (r *box) chooseLeastEnlargement(b *box) int {
	j, jenlargement, jarea := -1, 0.0, 0.0
	n := r.data.(*node)
	for i := 0; i < n.count; i++ {
		area := (n.boxes[i].max[0] - n.boxes[i].min[0]) *
			(n.boxes[i].max[1] - n.boxes[i].min[1]) *
			(n.boxes[i].max[2] - n.boxes[i].min[2])

		enlargedArea := (max(b.max[0], n.boxes[i].max[0]) - min(b.min[0], n.boxes[i].min[0])) *
			(max(b.max[1], n.boxes[i].max[1]) - min(b.min[1], n.boxes[i].min[1])) *
			(max(b.max[2], n.boxes[i].max[2]) - min(b.min[2], n.boxes[i].min[2]))

		enlargement := enlargedArea - area

		if j == -1 || enlargement < jenlargement || (enlargement == jenlargement && area < jarea) {
			j, jenlargement, jarea = i, enlargement, area
		}
	}
	return j
}

IMHO, it also improves readability a bit. Of course, it makes your code generation more complicated.

The text was updated successfully, but these errors were encountered:

tidwall · 2019-08-01T12:36:06Z

For anyone stumbling upon this post, the tweet mentioned above presents some preliminary benchmark results comparing my Rust and Go implementations of this data structure.
https://twitter.com/tidwall/status/1156339153698562048

Hi!

You bring up excellent points. Loop unrolling is undoubtedly a good improvement.

Another great side-effect from unrolling loops is that the smaller functions will then become candidates for inlining by the Go compiler. It is a bit unfortunate that the Go compiler does not inline functions with FOR loops.

Thus functions such as

func (r *box) intersects(b *box) bool {
	for i := 0; i < dims; i++ {
		if b.min[i] > r.max[i] || b.max[i] < r.min[i] {
			return false
		}
	}
	return true
}

could be replaced with

func (r *box) intersects(b *box) bool {
        return !(b.min[0] > r.max[0] || b.max[0] < r.min[0] ||
                b.min[1] > r.max[1] || b.max[1] < r.min[1] 
                b.min[2] > r.max[2] || b.max[2] < r.min[2])
}

even this (cringe) is better for the compiler

func (r *box) intersects(b *box) bool {
	i := 0
loop:
	if i < dims {
		if b.min[i] > r.max[i] || b.max[i] < r.min[i] {
			return false
		}
		i++
		goto loop
	}
	return true
}

There are a few other improvements, such as moving to proactive insertions which would allow for a non-recursive insert function. But these are implementation details and not really related to the language or compiler. I hope to include these in the Go version at some point.

I'm confident that once these improvements are in-place the Go version will perform as well as the Rust version.

As for the benchmarking. Good tip on b.N. While the BenchmarkRandomInsert is only intended as a sanity check for perf regressions, it's probably worthwhile to make the change to recommend.

Here's the benchmark code that I used in the tweet:

package main

import (
	"math/rand"
	"os"
	"time"

	"github.com/tidwall/lotsa"
	"github.com/tidwall/rbang-go"
)

func main() {
	N := 1000000
	ENVELOPE := []float64{-112.0, 33.0, -111.0, 34.0}
	lotsa.Output = os.Stdout
	rand.Seed(time.Now().UnixNano())

	points := make([][2]float64, N)
	for i := 0; i < N; i++ {
		points[i][0] = rand.Float64()*360 - 180
		points[i][1] = rand.Float64()*180 - 90
	}

	tr := rbang.New(2)
	print("rbang-insert: ")
	lotsa.Ops(N, 1, func(i, _ int) {
		tr.Insert(points[i][:], points[i][:], i)
	})

	print("rbang-search: ")
	lotsa.Ops(N, 1, func(i, _ int) {
		var count int
		tr.Search(ENVELOPE[:2], ENVELOPE[2:],
			func(_, _ []float64, _ interface{}) bool {
				count++
				return true
			},
		)
	})

	print("rbang-delete: ")
	lotsa.Ops(N, 1, func(i, _ int) {
		tr.Delete(points[i][:], points[i][:], i)
	})
}

Thanks a ton for your input!

josharian · 2019-08-01T18:34:12Z

It is a bit unfortunate that the Go compiler does not inline functions with FOR loops.

There are many unfortunate things about the Go compiler's inliner: golang/go#17566. For a discussion specifically about inlining for loops: golang/go#14768. Feel free to ping the latter issue in the hopes of inspiring David to finish it up. :)

I'm confident that once these improvements are in-place the Go version will perform as well as the Rust version.

If it doesn't, and you've run out of optimization ideas, I'm happy to take another look. Optimizing Go code is one of my hobbies. :) (Competing with Rust is not. I'm in full agreement with this thread: https://twitter.com/rustlang/status/1156926772212092928.)

Oh, and thanks for providing the benchmark code you used. I looked around a bit in the repo for it and then gave up and used the existing benchmark there. If/when you circle back around to the Go implementation, it'd be lovely to turn it into Go benchmarks, since you get a bunch of other stuff for free with that (profiling, standardized output to feed into other tools, etc.). And when a rando like me wanders into your code base looking for perf improvements it's where they'll start. :P

Have fun playing with Rust. :)

tidwall · 2019-09-12T21:11:42Z

I just pushed the loop unrolling changes and queries are seeing a happy little boost. 🚀

Performance is now closer to Rust. And looking back at my Rust code, I was actually unrolling the loops. 🤦‍♂️

Thanks for the second set of eyes!

tidwall closed this as completed Sep 12, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

a few performance notes #1

a few performance notes #1

josharian commented Aug 1, 2019

tidwall commented Aug 1, 2019 •

edited

Loading

josharian commented Aug 1, 2019

tidwall commented Sep 12, 2019

a few performance notes #1

a few performance notes #1

Comments

josharian commented Aug 1, 2019

tidwall commented Aug 1, 2019 • edited Loading

josharian commented Aug 1, 2019

tidwall commented Sep 12, 2019

tidwall commented Aug 1, 2019 •

edited

Loading