-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
a few performance notes #1
Comments
For anyone stumbling upon this post, the tweet mentioned above presents some preliminary benchmark results comparing my Rust and Go implementations of this data structure. Hi! You bring up excellent points. Loop unrolling is undoubtedly a good improvement. Another great side-effect from unrolling loops is that the smaller functions will then become candidates for inlining by the Go compiler. It is a bit unfortunate that the Go compiler does not inline functions with FOR loops. Thus functions such as func (r *box) intersects(b *box) bool {
for i := 0; i < dims; i++ {
if b.min[i] > r.max[i] || b.max[i] < r.min[i] {
return false
}
}
return true
} could be replaced with func (r *box) intersects(b *box) bool {
return !(b.min[0] > r.max[0] || b.max[0] < r.min[0] ||
b.min[1] > r.max[1] || b.max[1] < r.min[1]
b.min[2] > r.max[2] || b.max[2] < r.min[2])
} even this (cringe) is better for the compiler func (r *box) intersects(b *box) bool {
i := 0
loop:
if i < dims {
if b.min[i] > r.max[i] || b.max[i] < r.min[i] {
return false
}
i++
goto loop
}
return true
} There are a few other improvements, such as moving to proactive insertions which would allow for a non-recursive insert function. But these are implementation details and not really related to the language or compiler. I hope to include these in the Go version at some point. I'm confident that once these improvements are in-place the Go version will perform as well as the Rust version. As for the benchmarking. Good tip on b.N. While the BenchmarkRandomInsert is only intended as a sanity check for perf regressions, it's probably worthwhile to make the change to recommend. Here's the benchmark code that I used in the tweet: package main
import (
"math/rand"
"os"
"time"
"github.com/tidwall/lotsa"
"github.com/tidwall/rbang-go"
)
func main() {
N := 1000000
ENVELOPE := []float64{-112.0, 33.0, -111.0, 34.0}
lotsa.Output = os.Stdout
rand.Seed(time.Now().UnixNano())
points := make([][2]float64, N)
for i := 0; i < N; i++ {
points[i][0] = rand.Float64()*360 - 180
points[i][1] = rand.Float64()*180 - 90
}
tr := rbang.New(2)
print("rbang-insert: ")
lotsa.Ops(N, 1, func(i, _ int) {
tr.Insert(points[i][:], points[i][:], i)
})
print("rbang-search: ")
lotsa.Ops(N, 1, func(i, _ int) {
var count int
tr.Search(ENVELOPE[:2], ENVELOPE[2:],
func(_, _ []float64, _ interface{}) bool {
count++
return true
},
)
})
print("rbang-delete: ")
lotsa.Ops(N, 1, func(i, _ int) {
tr.Delete(points[i][:], points[i][:], i)
})
} Thanks a ton for your input! |
There are many unfortunate things about the Go compiler's inliner: golang/go#17566. For a discussion specifically about inlining for loops: golang/go#14768. Feel free to ping the latter issue in the hopes of inspiring David to finish it up. :)
If it doesn't, and you've run out of optimization ideas, I'm happy to take another look. Optimizing Go code is one of my hobbies. :) (Competing with Rust is not. I'm in full agreement with this thread: https://twitter.com/rustlang/status/1156926772212092928.) Oh, and thanks for providing the benchmark code you used. I looked around a bit in the repo for it and then gave up and used the existing benchmark there. If/when you circle back around to the Go implementation, it'd be lovely to turn it into Go benchmarks, since you get a bunch of other stuff for free with that (profiling, standardized output to feed into other tools, etc.). And when a rando like me wanders into your code base looking for perf improvements it's where they'll start. :P Have fun playing with Rust. :) |
I just pushed the loop unrolling changes and queries are seeing a happy little boost. 🚀 Performance is now closer to Rust. And looking back at my Rust code, I was actually unrolling the loops. 🤦♂️ Thanks for the second set of eyes! |
Hi! I came here via a twitter post about performance. I was curious, so I poked around a bit for fun. A few minor comments, in the hopes that they are useful.
First, BenchmarkRandomInsert can be a bit misleading. If you make it run faster, it gets to a higher b.N, but at higher b.N, each iteration has to do more work. Thus code improvements don't necessarily show up in improved benchmark results. Picking a fixed number of iterations to do each time would be better. Perhaps like:
Second, I can get a ~15% speedup on BenchmarkRandomInsert on my machine by unrolling the loops in
chooseLeastEnlargement
. Here's the 3d version:IMHO, it also improves readability a bit. Of course, it makes your code generation more complicated.
The text was updated successfully, but these errors were encountered: