stdenv: provide a deterministically built gcc #112928
How do other distros, e.g. Debian, handle this?
From what I can read, they run the profiledbootstrap, but the automated reproducibility tests time out. Build logs: https://buildd.debian.org/status/fetch.php?pkg=gcc-10&arch=amd64&ver=10.2.1-6&stamp=1610315062&raw=0
It looks like Arch doesn't use profiledbootstrap, in favor of reproducibility: https://bugs.archlinux.org/task/56856. Nix provides an incredibly robust means to make reproducibility the default while allowing opt-in optimization, so we have more options in this area.
To compare build performance before and after this change, I ran the following benchmark and got the following results:
So this change definitely has a performance impact: a 7-12% slowdown.
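The benchmark itself did not survive in this thread. As a rough, hypothetical sketch (not the benchmark that was actually run), one could time the same compile command several times with each gcc build and compare the medians; the function names time_command and slowdown_percent are illustrative assumptions:

```python
from __future__ import annotations

import statistics
import subprocess
import time


def time_command(cmd: list[str], runs: int = 3) -> list[float]:
    """Run `cmd` `runs` times and return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raise if the command fails
        durations.append(time.perf_counter() - start)
    return durations


def slowdown_percent(baseline: list[float], candidate: list[float]) -> float:
    """Relative slowdown of `candidate` versus `baseline`, based on medians.

    A positive value means the candidate is slower.
    """
    base = statistics.median(baseline)
    cand = statistics.median(candidate)
    return (cand - base) / base * 100.0
```

To use it, time_command would be called once with a compile command that invokes the profiled gcc and once with the deterministic one (the compiler paths and source file are placeholders you would fill in), then slowdown_percent is applied to the two lists. Medians are used so that a single run disturbed by other load on the machine does not dominate the result.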
Result of 26 packages built:
@tomberek Yes, I hit the same issues as well. I'm not sure what went wrong with the gfortran evaluation.
Running @baloo's benchmark, I get around a 2-5% slowdown. An interesting question is how much slowdown is acceptable. If reproducible builds meant a 100% slowdown, we'd have a much harder time justifying them. Robust reproducible builds allow a kind of parallelism in the package-building ecosystem that would provide an order-of-magnitude speedup in how fast channels update, allow greater coordination between mutually distrusting builders, make regression testing easier, etc. So where would we draw the line? 10%? Has the reproducible-builds community set any rules of thumb for this effort?
This pull request has been mentioned on NixOS Discourse. There might be relevant details there.
We actually have something like that in nixpkgs: the foot derivation supports PGO, but the build is made deterministic by generating the profiling data the same way each time (using a fixed seed for the random inputs used to collect the profiling information). From my testing, this makes the build deterministic, as PGO doesn't seem to take timing information into account.
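A minimal sketch of the fixed-seed idea, assuming the training workload is driven by pseudo-random inputs as described for foot. The names generate_training_inputs, inputs_digest and FIXED_SEED are hypothetical and not taken from the foot derivation:

```python
from __future__ import annotations

import hashlib
import random

FIXED_SEED = 12345  # hypothetical constant; any fixed value works


def generate_training_inputs(seed: int, count: int = 4, size: int = 64) -> list[bytes]:
    """Produce `count` pseudo-random byte strings of length `size`.

    A dedicated random.Random instance seeded with a fixed integer yields the
    same sequence on every run and every machine, so the training inputs fed
    to the instrumented binary do not depend on the build host.
    """
    rng = random.Random(seed)
    return [bytes(rng.getrandbits(8) for _ in range(size)) for _ in range(count)]


def inputs_digest(inputs: list[bytes]) -> str:
    """Hash the inputs (length-prefixed) so two runs can be compared cheaply."""
    h = hashlib.sha256()
    for chunk in inputs:
        h.update(len(chunk).to_bytes(8, "big"))
        h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    first = generate_training_inputs(FIXED_SEED)
    second = generate_training_inputs(FIXED_SEED)
    print(inputs_digest(first) == inputs_digest(second))  # True
```

This only removes the entropy on the input side; if the instrumented program's behavior depends on timing or other host-specific state, the collected profile can still differ between builders.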
@sternenseemann, just to check that I got things right: gcc does PGO by compiling gcc itself and gathering profiling information from that build. I don't even know where the entropy comes from here.
@baloo Yes, you got that right. That profile is then in turn used to compile the actual output binary. I also need to correct myself: I checked again, and the build is actually reproducible as well. I can reproduce the exact same foot that our Hydra produced by using … The upshot is: if we could track down the entropy in gcc's build, we could have a reproducible gcc build with PGO (unless I'm missing something). However, this is probably quite a task; I wonder whether upstream is interested in this as well.
I'll try a build with a patch like:
Compiling twice on the same machine gets me the same result, but I still get variation between builds on two different machines. So it's not the compilation ordering.
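One rough way to see which files differ between builds from two machines is to hash every path in each output tree and compare the results. This is a hypothetical helper, not part of Nix; tools such as diffoscope give far more detailed diffs once the differing files are known:

```python
from __future__ import annotations

import hashlib
import os
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each path under `root` (relative, POSIX-style) to a fingerprint.

    Regular files get a SHA-256 of their contents, directories are marked
    "dir", and symlinks are recorded by their target string instead of
    being followed, since Nix store outputs often contain symlinks.
    """
    digests: dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            if path.is_symlink():
                digests[rel] = "symlink:" + os.readlink(path)
            elif path.is_dir():
                digests[rel] = "dir"
            elif path.is_file():
                digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(a: Path, b: Path) -> list[str]:
    """Return the sorted relative paths that differ or exist in only one tree."""
    da, db = tree_digests(a), tree_digests(b)
    return sorted(p for p in da.keys() | db.keys() if da.get(p) != db.get(p))
```

Calling compare_trees on two copies of the same output path (for example, one fetched from each machine) would list only the files whose contents differ, which narrows down where the nondeterminism ends up.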
Making sure I'm understanding you correctly: you were testing with … So your experiment was intended to see whether adding … (I would like to see this PR merged, while work continues in parallel on adding PGO back in a deterministic way.)
I'm not sure what the best way to do the comparison is.
I'm arriving at
Yes, exactly
100% agree. I still think we can do better; it just needs a ton of work. I need to keep digging into PGO. I still don't understand how the entropy is injected. As far as I can tell, there is noise/entropy generated in the "stagetrain" stage, but I can't tell where it's coming from (yet).
Does the profiling actually make a meaningful difference today? For most users, aren't the profiling runs actually executed on a very powerful, datacenter-grade machine that is also (generally) heavily loaded? Are these benchmarks useful for end users, who are most likely running GCC in a very different environment?
Even though the profiling in principle has limited merit for the actual machines we are building gcc for, PGO improves performance significantly. @baloo tested this further up in the thread #112928 (comment). As far as I understand it, this is because profiling mostly reveals to the compiler which code paths are taken more often than others in actual invocations, which allows further optimization.
Looking through the last few months of discussion, the rough consensus seems to be that the performance hit is acceptable (especially with the availability of …). Further research on how to "eat our cake and have it, too" will still be valuable, of course.
I don't think
As for PGO generally, I believe the main benefit is better estimates of which code paths are hot and which are cold. That should be mostly independent of the machine and more sensitive to the "training inputs". The rule of thumb is to optimize hot paths for speed and cold paths for code size (which can also improve speed by saving CPU instruction cache space). EDIT: I forgot to add that PGO and LTO have good synergy, i.e. using both at once can yield more improvement than the sum of using each alone.
That's my expectation as well: if we can find a way to remove the current nondeterminism (adding
It's not just a cost for our build farm, it's a cost for every user of gcc. Slowing down every C/C++ build by 7-12% is a substantial price to pay... |
For a reproducible ISO, there is a simpler solution: don't include GCC in the ISO. However from the perspective of content-addressable Nix, a deterministic GCC is certainly a lot better. |
Motivation for this change
This is a proposal to fix one of the last issues on the road to NixOS reproducibility.
There is some background information here: #108475
There is also some discussion on: #445
When built, gcc runs multiple stages. It uses performance data and profiling from one of those compilation stages to inject optimizations into a later stage, in order to optimize performance. This renders the build nondeterministic and impure, since it injects local behavior of the builder. I believe this is contrary to the principles of Nix and @edolstra's thesis.
Furthermore, I'm not sure how optimizations made for a Hydra builder would affect performance on any other machine. To that end, I chose to make the gcc used by stdenv deterministic, but keep a profiled build of gcc in the default packages: derivations are built deterministically, albeit a bit more slowly, but you can still run an optimized gcc in your nix-shell if you're doing development.
Note: this could probably use some benchmarks.
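Things done
- Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
- Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
- Tested execution of all binary files (usually in ./result/bin/)
- Determined the impact on package closure size (by running nix path-info -S before and after)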