-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Display of integers without raw pointers #135265
base: master
Are you sure you want to change the base?
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Noratrieb (or someone else) some time within the next two weeks. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
|
r? tgross35 since I'm doing some work in this area anyway You should |
No problem if this one gets rejected in favour of anything else @tgross35. I was just pleased to see the compiler doing this well. |
Actually ignore this, I was thinking this was in
No conflict :) Replacing unsafe code is always nice if there are no drawbacks. |
It looks like there may be some small regressions here, the new code doesn't elide a couple calls to Soon it will be possible to |
What a gem that Compiler Explorer is @tgross35. I think the assembly looks good now. B.t.w., I tried bit shifting entries from the lookup table into a 64-bit register and write per 8 digits at once. The compiler "optimizes" such intermediate register away, and it goes for 16-bit copies instead. Maybe those small writes are actually faster. I don't know. 😂 |
I found a way without compiler hinting which is even faster than the unsafe code as is. It requires buffers of an even byte-size. Needs a little more work for all integer sizes. |
I am by no means a perf expert but I still can't imagine chasing down optimizations without it :)
Are you planning to update the PR with something like this? At a surface level the patch seems pretty reasonable but I didn't look too deep yet, I'll hold off on reviewing if you have something faster coming. |
Yes, will give it a try @tgross35. Same surprise here as you're having. 🙂 |
@rustbot author (just comment |
The speed gain is because (1) the loop decision no longer depends on the digit calculation, (2) it produces fewer instructions, and (3) it no longer needs a check for special case zero, which has a "significant" leading zero. Unfortunately the results are less optimal for the two benchmark cases in the core library: either zero or maximum digits in a loop. Apparently the branch prediction benefit does not outweight the per-4-digit tweak in such scenario. So now we have the explicit assertions as hint assertions too. Can you have a look @tgross35? @rustbot review |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
c8893fa
to
f0a5e85
Compare
This comment has been minimized.
This comment has been minimized.
f0a5e85
to
651707f
Compare
The job Click to see the possible cause of the failure (guessed by this bot)
|
Don't have Windows to figure out what's failing there. Leaving as is. 😐 |
That doesn't look like a Windows error. It looks like it's detecting a bug in one of your uses of
|
☔ The latest upstream changes (presumably #135937) made this pull request unmergeable. Please resolve the merge conflicts. |
Understood. I did check all of the assertions and there's no way. Can't locally reproduce. Tests pass fine. |
Maybe change |
// Four digits need a 16-bit $unsigned or wider. | ||
#[allow(overflowing_literals)] | ||
#[allow(unused_comparisons)] | ||
while remain > 999 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks suspicious. You've allowed overflowing_literals
so 999_u8 == 231
.
The benchmarks as is measure formatting speed of literals. The first commit
black_box
-es input to simulate runtime speed instead.The second commit replaces
unsafe
pointer optimizations with plain array indices. The performance is equivalent on Apple M1. Needs peer review on Intel.Happy to do the 128-bit version too if such change is welcome.