The new step_by is too much slow #43064
Comments
Dear bors, before closing down this issue you should show me that the asm after that pull request is good enough. I'll verify it myself tomorrow with the next Nightly. |
Technically this is closed by GitHub not bors 😄 Feel free to reopen if it is still not fixed. |
Reopening so we track verification. |
The asm now is:
To me it still looks far too long. And apparently there's a call to a TryFrom in the middle of the loop... In some of my code I see a further 10% performance decrease after the recent performance decrease caused by allocators. |
I pretty much can’t read assembly. Is either the before or after code O(n) for The call to I have more changes to this code in #43127 |
I think it's now O(1), but the asm is a bit of a mess. step_by() is part of the basics of for loops in Rust, a systems language. So before pull requests about it get merged, someone who knows assembly should carefully review the code.
So having only usize as step_by argument causes problems. Perhaps it's not a good idea. step_by() is meant to be used everywhere, its inner loop will run trillions of times. Perhaps all changes to step_by of the last few weeks should be reverted, and merged again only if and when they are acceptable. |
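For readers following the usize point, here is a minimal sketch of what it means in practice: `Iterator::step_by` takes a `usize` step whatever the item type of the range is, so a step held in another integer type has to be converted first. The helper name `stepped` is hypothetical and is not from this thread.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: walks `start..end` with a step given as `i64`.
/// `step_by` only accepts `usize`, so the step is converted up front.
/// Panics if `step` is negative, does not fit in `usize`, or is zero
/// (`step_by` itself panics on a zero step).
fn stepped(start: i64, end: i64, step: i64) -> Vec<i64> {
    let step = usize::try_from(step).expect("step must be non-negative and fit in usize");
    (start..end).step_by(step).collect()
}

fn main() {
    // The range items are signed, but the step passed to `step_by` is a usize.
    println!("{:?}", stepped(-10, 10, 5));
}
```

The conversion happens once, outside the loop, so it says nothing about the `try_from` call seen inside the generated loop body above.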
Doing this kind of experiment and finding problems is the point of having a Nightly channel. |
This looks like it's an issue with TryFrom/TryInto rather than step_by itself. EDIT: The main difference seems to be the overflow checks. |
In addition to the overflow checks, there is a check in the Additionally, there is a call to try_from that isn't optimized out, but #43194 should help with that. The overflow checks are probably the main issue though, so having a specialisation of this trait for ranges would probably be helpful. |
I had a go at adding specializations for the primitive types; however, that triggered #36262, which made the compiler treat range syntax without type annotations (e.g. 0...50) as always being i32 rather than inferring the type. I also tried to specialize using the Step trait, but I didn't manage to get it to work, as adding default to |
To better show the effects of the new step_by here you can find a test program (that solves two different Euler problems) that uses step_by (run-time 4.40 seconds): And the same code with step_by replaced by while loops (run-time 3.81 seconds): |
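The two test programs are not reproduced here. As a rough illustration of the kind of rewrite being compared, the sketch below shows the same strided sum written once with `step_by` and once with a manual `while` loop. The function names and the workload are made up for illustration and are not the Euler solutions mentioned above.

```rust
/// Sums every `step`-th value in `0..limit` using `step_by`.
/// Panics if `step` is zero, as `step_by` does.
fn sum_step_by(limit: u64, step: usize) -> u64 {
    (0..limit).step_by(step).sum()
}

/// The same sum with a hand-written `while` loop.
/// `step` must be non-zero here too, otherwise the loop never ends.
fn sum_while(limit: u64, step: u64) -> u64 {
    let mut total = 0;
    let mut i = 0;
    while i < limit {
        total += i;
        i += step;
    }
    total
}

fn main() {
    let limit = 1_000;
    // Both forms visit 0, 3, 6, ..., 999 and must agree.
    assert_eq!(sum_step_by(limit, 3), sum_while(limit, 3));
    println!("{}", sum_step_by(limit, 3));
}
```

Compiling a pair like this with `rustc -O --emit asm` is one way to compare the generated loops side by side.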
@oyvindln Nice experiment (I'd love to see the code). I think the associated type defaults issue is expected, and it seems best to avoid using associated type specialization for now, even in std. |
@bluss Didn't find any code from trying to specialize using the step trait. |
Specialize StepBy<Range(Inclusive)>

Part of #51557, related to #43064, #31155

As discussed in the above issues, `step_by` optimizes very badly on ranges which is related to

1. the special casing of the first `StepBy::next()` call
2. the need to do 2 additions of `n - 1` and `1` inside the range's `next()`

This PR eliminates both by overriding `next()` to always produce the current element and also step ahead by `n` elements in one go. The generated code is much better, even identical in the case of a `Range` with constant `start` and `end` where `start+step` can't overflow. Without constant bounds it's a bit longer than the manual loop. `RangeInclusive` doesn't optimize as nicely but is still much better than the original asm. Unsigned integers optimize better than signed ones for some reason.

See the following two links for a comparison.

[godbolt: specialization for ..](https://godbolt.org/g/haHLJr)
[godbolt: specialization for ..=](https://godbolt.org/g/ewyMu6)

`RangeFrom`, the only other range with an `Iterator` implementation can't be specialized like this without changing behaviour due to overflow. There is no way to save "finished-ness".

The approach can not be used in general, because it would produce side effects of the underlying iterator too early.

May obsolete #51435, haven't checked.
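The idea described in that PR text can be sketched outside of std as a standalone iterator. This is only an illustration under stated assumptions: the type `SteppedRange` and its fields are hypothetical, it handles `u32` only, and it is not the code that was merged. It yields the current element and advances by the whole step with a single addition, and it records "finished-ness" by moving the cursor to `end` when the addition would overflow.

```rust
/// Hypothetical stand-in for a specialised stepped range over `u32`.
struct SteppedRange {
    next: u32, // the element the next call will yield
    end: u32,  // exclusive upper bound
    step: u32, // stride, must be non-zero
}

impl SteppedRange {
    fn new(start: u32, end: u32, step: u32) -> Self {
        assert!(step != 0, "step must be non-zero");
        SteppedRange { next: start, end, step }
    }
}

impl Iterator for SteppedRange {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.next >= self.end {
            return None;
        }
        let current = self.next;
        // Step ahead by the whole stride in one addition. If that would
        // overflow, park the cursor at `end` so the iterator is finished.
        self.next = match current.checked_add(self.step) {
            Some(n) => n,
            None => self.end,
        };
        Some(current)
    }
}

fn main() {
    let ours: Vec<u32> = SteppedRange::new(0, 10, 3).collect();
    let std_version: Vec<u32> = (0..10u32).step_by(3).collect();
    assert_eq!(ours, std_version);
    println!("{:?}", ours);
}
```

Parking the cursor at `end` is possible here because the range is bounded. An unbounded range has no such value to park at, which matches the remark about `RangeFrom` above.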
For a few days now step_by has been fast enough for my usages, so I'm closing this issue (I still don't like the design of the new step_by that only works with usize steps, but I guess I can't change that). I'm leaving Issue #45222 open because closed intervals are still too slow. |
As I explained in #42534 the new step_by can't be used:
Compiling with:
rustc -O --emit asm test.rs
It gives:
In a function like second_foo() the old step_by gave something like: