-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve BinaryHeap performance #78857
Conversation
r? @KodrAus (rust_highfive has picked a reviewer for you, use r? to override) |
I don't know if this makes a difference performance-wise (or even readability-wise), but you could avoid the comparisons with |
@tavianator I didn't think of that, but I wouldn't expect it to make much of a difference (the compiler stores
I also tested a reduced version on godbolt, and looks like LLVM is doing a worse job at optimizing |
Huh, thanks for checking. That codegen is kinda odd. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I just left one comment to include your rationale for the new unsafe
block inline. r=me after that.
I've added the safety comment, I hope it's clear enough. I'm tempted to document other r=KodrAus |
@bors r=KodrAus (You need to mention the bot and be either in the reviewer set or be delegated to, see cheatsheet at https://bors.rust-lang.org/) |
📌 Commit 387568c has been approved by |
⌛ Testing commit 387568c with merge d75378aab28aecf8b1521a40409823dd9c0f928b... |
💔 Test failed - checks-actions |
@bors retry |
Improve BinaryHeap performance By changing the condition in the loops from `child < end` to `child < end - 1` we're guaranteed that `right = child + 1 < end` and since finding the index of the biggest sibling can be done with an arithmetic operation we can remove a branch from the loop body. The case where there's no right child, i.e. `child == end - 1` is instead handled outside the loop, after it ends; note that if the loops ends early we can use `return` instead of `break` since the check `child == end - 1` will surely fail. I've also removed a call to `<[T]>::swap` that was hiding a bound check that [wasn't being optimized by LLVM](https://godbolt.org/z/zrhdGM). A quick benchmarks on my pc shows that the gains are pretty significant: |name |before ns/iter |after ns/iter |diff ns/iter |diff % |speedup | |---------------------|----------------|---------------|--------------|----------|--------| |find_smallest_1000 | 352,565 | 260,098 | -92,467 | -26.23% | x 1.36 | |from_vec | 676,795 | 473,934 | -202,861 | -29.97% | x 1.43 | |into_sorted_vec | 469,511 | 304,275 | -165,236 | -35.19% | x 1.54 | |pop | 483,198 | 373,778 | -109,420 | -22.64% | x 1.29 | The other 2 benchmarks for `BinaryHeap` (`peek_mut_deref_mut` and `push`) weren't impacted and as such didn't show any significant change.
Rollup of 7 pull requests Successful merges: - rust-lang#76730 (Fix rustdoc rendering of by-value mutable arguments in async fn) - rust-lang#78836 (Implement destructuring assignment for structs and slices) - rust-lang#78857 (Improve BinaryHeap performance) - rust-lang#78950 (Add asm register information for SPIR-V) - rust-lang#78970 (update rustfmt to v1.4.25) - rust-lang#78972 (Update cargo) - rust-lang#78987 (extend min_const_generics param ty tests) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
* Remove useless branches from sift_down_range loop * Remove branches from sift_down_to_bottom loop * Remove useless bound checks from into_sorted_vec
518: Remove unpredicted branch from kmerge::sift_down r=jswrenn a=SkiFire13 This is pretty much a port from rust-lang/rust#78857 Compared with the previous implementation, this adds more branches for bound checks which aren't present on the stdlib version, however they should be predicted almost always. The speedup should come from the removal of an unpredictable branch from the loop body, in favor of boolean arithmetic. The benchmarks seem to agree: ``` before: test kmerge default ... bench: 6812 ns/iter (+/- 18) test kmerge tenway ... bench: 223673 ns/iter (+/- 769) after: test kmerge default ... bench: 6212 ns/iter (+/- 43) test kmerge tenway ... bench: 190700 ns/iter (+/- 419) ``` Co-authored-by: Giacomo Stevanato <[email protected]>
By changing the condition in the loops from
child < end
tochild < end - 1
we're guaranteed thatright = child + 1 < end
and since finding the index of the biggest sibling can be done with an arithmetic operation we can remove a branch from the loop body. The case where there's no right child, i.e.child == end - 1
is instead handled outside the loop, after it ends; note that if the loops ends early we can usereturn
instead ofbreak
since the checkchild == end - 1
will surely fail.I've also removed a call to
<[T]>::swap
that was hiding a bound check that wasn't being optimized by LLVM.A quick benchmarks on my pc shows that the gains are pretty significant:
The other 2 benchmarks for
BinaryHeap
(peek_mut_deref_mut
andpush
) weren't impacted and as such didn't show any significant change.