-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LTO: 10x regression #47866
Comments
Refer http://www.brendangregg.com/perf.html#CPUstatistics, Gregg says:
But I am confused that why LTO uses so many instructions. |
What results do you get with |
@siddontang , higher IPC is good if we execute the same (or similar) number if instructions. Here with LTO it executes 10x number of instructions but without 10x increase of IPC that leads to 3x increased run-time. @ollie27 , just tested, it solves it. Another correction, -C lto=thin also solves it, I didn't test it properly before. |
The change list for this regression is unfortunately pretty large (between The primary change in that change list though was switching the default back to "fat" LTO, it seems at some point in the past our fat lto regressed? I'm running out of time to help narrow this down but what I've found so far is that we had a surprising regression of this program between |
(Just to be clear, this is a regression in the perf of the compiled code, right?) |
@alexcrichton do you think you will have time to chase this down? Should we try to find someone to assign? |
triage: P-high Need to figure out what's happening here. |
@nikomatsakis unfortunately I'm pretty tight on time, so it's probably best to find another soul to help out with this. |
Er ok had some time to debug this. @eddyb it looks like this regression may be caused by #46623 or at least something is caused by that. The logs I have are:
That is, with #46623 the benchmark without LTO regressed by 10x |
@alexcrichton Maybe it's similar to #47062 (comment)? Is there any change around that nightly? |
Hm ok some more investigation. The numbers here all over the place and there's a whole bunch of changes that were in flight over the past few months. That being said I don't think that this is a regression! So first up "fat" LTO is much slower than "thin" LTO on this benchmark. That doesn't make a lot of sense because it's supposed to be the other way around! I tried to hunt backwards seeing what could have changed around "fat" LTO but I went back pretty far and it was basically always constant. On my machine going all the way back to The next suspicion was that fat LTO was slower than turning off LTO (odd!). Turns out #46382 somehow magically fixed this. More specifically f9b0897 (the merge before #46382 ) resulted in 1.5s runtime w/o LTO and 0.4s with it (both using 16 cgus). The next merge (#46382) had a runtime of 0.08s w/o LTO (yes, that's without) and 0.4s with it (fat LTO that is). So somehow #46382 improved non LTO builds on this benchmark but kept normal LTO builds the same. Now the final change here is the difference between thin and fat LTO. The switch to ThinLTO happened in #46382 as well, which means a So tl;dr;
All in all this is probably bug in LLVM why fat LTO isn't optimizing as well as a non-LTO build, but I'm overall going to remove the regression tag as I don't believe this is a regression any more as stable is a "fat" LTO by default as are all other channels right now. |
Reverting 8bde2ac "rustc: Add |
@dotdash indeed yeah but all that's doing is switching the default from fat LTO to ThinLTO, you can test that out by passing I'm not sure why, but it appears that this benchmark gets worse when compiled with full LTO. It happens to be faster with ThinLTO, which was the default on beta/nightly for a bit but is no longer the case. |
@alexcrichton great investigation! Seems like we can close the issue, and that @luben ought to use thin lto =)
I imagine this can happen, I mean it's all heuristics in the end. |
Ah yes I believe we can close, @luben if you've got any more questions though please just let me know! |
I noticed a significant regression since nightly 2018-01-27 when enabling LTO. Here is a simplified example that has 10x difference.
Without LTO enabled:
with LTO enabled:
Tested variants
Bad: -C lto=fat
Good: -C lto=thin
Good: -C lto=fat -C codegen-units=1
The text was updated successfully, but these errors were encountered: