-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missed optimization comparing TypeId arrays #84253
Comments
A simpler starting point that already doesn't optimize: https://godbolt.org/z/rvMah5b5o |
@nikic Do you know if rustc provides some custom LLVM passes to pipeline? After skimming of source, it looks like it just uses default optimization levels from LLVM. |
@AngelicosPhosphoros Yes, rust uses only the default optimization pipeline. In this case, I believe the problem lies with unrolling. After some cleanup, we have this IR going into loop-unroll-full: https://gist.github.com/nikic/9516dede0acedd8d320e9d067559e667 If I understand correctly, it is not unrolled fully because we know small upper bound on the tripcount (3), but the target (X86) does not enable UpperBound unrolling. It does work with
Which makes me think that this option should probably enabled on X86, which is certainly not a system with "constrained branch predictor resources". Though there is another important factor here: What the constant upper bound doesn't capture, is the fact that in this case, there's two loop exits. One of the exits will be replicated by unrolling. However, the other one is fully determined by unrolling. These exits will be eliminated, together with the induction variable. This means that the loop unrolling heuristic is inconsistent: If the loop had an interior (non-exiting) branch, then it would get fully unrolled, even though that replicates the exact same number of branches. It's just a difference between whether those branches are exiting or not. |
Looking a bit further, there is this code: https://github.com/llvm/llvm-project/blob/ebc6608fb79057eaed27435d62d5dea0979bd9d3/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp#L1099-L1108 For exact trip counts, LoopUnroll will actually not use the loop trip count, but rather the trip count of a single exit or the latch exit. This is really the case we want to fall into. Unfortunately, in our case the latch exit is the comparison of the loads (which has an unknown exit count) rather than the comparison of the induction variant (which is known). There's two ways that could be addressed: The LoopUnroll logic could be more general about which exit count it uses. Or LoopRotate could be adjusted to perform a rotation in this case despite already having an exiting latch (there is already a heuristic for that, but it does not account for this case). |
This issue should be addressed by https://reviews.llvm.org/D102982 (plus about a dozen preparatory patches to make loop unrolling robust against multiple exits). |
This is fixed by #87570. New IR folds to true/false:
|
Edit: Look nikic example below.
I tried this code:
I expected to see this happen:
This code should compile to
return true
orreturn false
.Instead, this happened:
asm contains useless operations
Meta
rustc --version --verbose
:Proposed fix
If I add additional SROA flag, useless code is eliminated. I think, we need to add another SROA to pipeline somewhere before last instcombine and dead code elimination.
Here godbolt link.
The text was updated successfully, but these errors were encountered: