Serious binary size regression on ARMv7-M on nightly-2018-02-11 #49260
Comments
@nagisa pointed out to me on IRC that [...] It seems that LLVM 6 is deciding to unroll some for loops "for performance" (though I doubt it makes any difference in the case of the [...]). I don't know if LLVM provides any option to prevent unrolling loops, so some people may want to stick to LLVM 4 ...
This can be fixed by optimizing for size instead of performance:

    [profile.release]
    opt-level = "s"

Add that to the main Cargo.toml and we're back to 130 bytes.
Optimizing for size "fixes" the problem in this particular case, but it's not a general solution for preventing loop unrolling where it makes no sense; that would require fine-grained control over loop unrolling, like clang's loop unroll pragma.

My experience with opt-level={s,z}, at least when LLVM 4 was around, is that they produce bigger binaries than opt-level=3. Now that opt-level=3 binaries are bloated due to loop unrolling, that may no longer be the case. IIRC, opt-level={s,z} also reduces the inlining threshold, which prevents LLVM from optimizing away dead branches when using RTFM's claim mechanism.

I think there's nothing that can be done on the rustc side. We'll have to live with this and document all the options to improve binary size / performance, including switching back to LLVM 4.
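To make the size trade-off behind loop unrolling concrete, here is a minimal sketch of the transformation written out by hand in Rust. It is not the code from this issue, and the function names are made up for illustration. LLVM applies the equivalent rewrite to its own IR according to its heuristics. The point is only that the unrolled form computes the same result with a larger body.

```rust
/// Rolled form: a single loop body, executed once per element.
pub fn sum_rolled(data: &[u32]) -> u32 {
    let mut acc = 0u32;
    for &x in data {
        acc = acc.wrapping_add(x);
    }
    acc
}

/// Hand-unrolled by four (illustrative): the body is replicated four times
/// and a remainder loop handles the leftover elements, so more code is
/// emitted for the same result.
pub fn sum_unrolled_by_4(data: &[u32]) -> u32 {
    let mut acc = 0u32;
    let mut chunks = data.chunks_exact(4);
    for c in &mut chunks {
        acc = acc.wrapping_add(c[0]);
        acc = acc.wrapping_add(c[1]);
        acc = acc.wrapping_add(c[2]);
        acc = acc.wrapping_add(c[3]);
    }
    // Elements that did not fill a complete chunk of four.
    for &x in chunks.remainder() {
        acc = acc.wrapping_add(x);
    }
    acc
}
```

Both functions return the same value for any input. On a small microcontroller the extra copies of the body cost flash space, and the speed-up may not matter for short loops.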
stabilize opt-level={s,z}

closes #35784
closes #47651

### Rationale

Since the latest LLVM upgrade, rustc / LLVM does more aggressive loop unrolling. This results in increased binary size for embedded / no_std programs: an increase of hundreds of bytes, or about 7x, in the case of the smallest Cortex-M binary (cf. #49260).

As we are shooting for embedded Rust on stable, it would be great to also provide a way to optimize for size (which is pretty important for embedded applications that target resource-constrained devices) on stable.

Also, this has been baking in nightly for a long time.

r? @alexcrichton

Which team has to sign off on this?
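Size regressions in this thread are quoted both as absolute byte counts and as percentages or multiples of the baseline. As a rough sketch of that arithmetic, the helpers below convert between the two. The function names are hypothetical and none of this comes from the issue itself.

```rust
/// Percentage by which `new_size` exceeds `old_size` (both in bytes).
/// Returns `None` for a zero baseline, where the ratio is undefined.
pub fn percent_increase(old_size: u64, new_size: u64) -> Option<f64> {
    if old_size == 0 {
        return None;
    }
    let delta = new_size as f64 - old_size as f64;
    Some(delta * 100.0 / old_size as f64)
}

/// Baseline size implied by an absolute increase and the percentage it
/// represents, in whatever unit `increase` is given in.
pub fn implied_baseline(increase: f64, percent: f64) -> f64 {
    increase * 100.0 / percent
}
```

For example, the 2.4 KB (21%) increase reported in the issue description implies a baseline of about 2.4 / 0.21 ≈ 11.4 KB.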
STR
nightly-2018-02-11 or newer
nightly-2018-02-10
Basically, as of nightly-2018-02-11 the dev profile produces a smaller binary than the release profile, and the release profile in turn produces a smaller binary than compiling with LTO ...
Also, in this case, today's LTO produces a binary that is 1 KB (756%) bigger than the one produced with LTO on nightly-2018-02-10. On a more real-world example I see a 2.4 KB (21%) increase in binary size.
LLVM?
One notable difference between nightly-2018-02-10 and newer nightlies is that nightly-2018-02-10 is using LLVM 4 and everything newer than that is using LLVM 6. However, I don't think LLVM is fully to blame: today's rustc produces much more LLVM IR than it did on 2018-02-10.
IR files for reference.
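The report does not say how the amount of IR was compared. One crude way, shown here only as a sketch, is to count the non-blank, non-comment lines of the textual IR that rustc writes with `--emit=llvm-ir`. The helper name is made up for illustration, and line counts are only a rough proxy for final code size.

```rust
/// Counts lines of textual LLVM IR that are neither blank nor `;` comments.
/// This is a coarse size proxy, not a measure of the final binary.
pub fn count_ir_lines(ir: &str) -> usize {
    ir.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with(';'))
        .count()
}
```

Running it on the contents of the `.ll` files from two toolchains (for example, read with `std::fs::read_to_string`) gives two numbers that can be compared directly.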
thinLTO / parallel codegen?
The problem doesn't seem to be caused by thinLTO or parallel codegen either. I tried this with both nightly-2018-02-11 and nightly-2018-03-20: it got slightly better, but it's not on par with nightly-2018-02-10.
ARMv6-M
Also none of this seems to affect ARMv6-M.
This is with the .cargo/config and Cargo.toml stuff undone.

cc @alexcrichton @nagisa any clue about what could be going wrong here?