-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ci: Use multiple codegen units on non-dist bots #44675
Conversation
r? @aturon (rust_highfive has picked a reviewer for you, use r? to override) |
r? @aidanhs |
@bors r+ p=1 |
📌 Commit 1f9b02b has been approved by |
ci: Use multiple codegen units on non-dist bots This commit is yet another attempt to bring down our cycle times by parallelizing some of the long-and-serial parts of the build, for example optimizing the libsyntax, librustc, and librustc_driver crate. The hope is that any perf loss from codegen units is more than made up for with the perf gain from using multiple codegen units. The value of 16 codegen units here is pretty arbitrary, it's basically just a number which hopefully means that the cores are always nice and warm.
@bors r- There's only 2 cores usually though at least on Travis, so this would seem like too many? I'm not too sure how that works out in practice though... |
This isn't definitive data, but it's worth noticing that this build took about 10 minutes longer than every other recent PR. |
After a brief look, it looks like it might be making compilation of rustc itself a touch faster, but building+running the tests a bunch slower. At minimum it'd be good to see the effect on full builds. |
This PR is the likely cause of https://ci.appveyor.com/project/rust-lang/rust/build/1.0.4720, so I'll look into that when I get a chance. @Mark-Simulacrum note that the number of cores and the number of codegen units don't tend to have a correlation on one another. With the support nowadays we'll never run more than 2 jobs in parallel, and trans will even attempt to finish codegen on some translation units before it moves on to even start translating the next unit. I selected 16 here knowing that Travis only has 2 cores to ensure that we always keep the builders hot. Otherwise if we generate 2 codegen units one of them could finish codegen instantly while the other would spend the whole time translating, not actually getting us any benefit. The hope here is that with enough codegen units we can keep the cpus nice and busy and make sure that one's not idle for too much longer while the last cgu is finishing. @mattico thanks for looking I'll probably turn this down to like 4 or 6 to have a minimal impact for now on compile times. Most of the time this just means there's missing |
It may be worth compiling the core crates and tools (libstd, libtest, librustc) with codegen units set to ~16 and then compile tests, which generally compile very fast, without codegen units, since I imagine the added parallelism there may be hurting us. Feel free to re-r+ whenever (from my comment, at least). I didn't know that we never run more than two codegen units at the same time -- that seems like it would hurt people with more powerful computers (3 or more CPU cores). |
Another thing to note is that rustc recently gained the ability to start LLVM already after the first CGU has been translated. So having more CGUs than cpu-cores potentially does make sense, since the second thread can start working earlier. |
I don't think that's what @alexcrichton is saying. Instead, I believe the idea is that compilation is split into N bits of work, and then those bits of work are executed with as many units of parallelism as you have CPU cores. The problem is that you don't know if one of the bits of work is going to take 10x longer all the others, but if you do and you've just split into 2 units then you have one core idly for ages. By splitting into more you're reducing the probability of hitting this worst case (in theory). |
1f9b02b
to
d670a77
Compare
@bors: r=aidanhs p=0 Hm ok I can't reproduce this locally, so I'm going to try to see the error on AppVeyor again. Also reduced to 8 codegen units. @Mark-Simulacrum ah yes @aidanhs is right, the intention here is to reduce the chance of having one core running lots of work and one empty with work. |
📌 Commit d670a77 has been approved by |
Also I believe the codegen-units setting here only applie to rustc/std, not any tests. |
Could we get more CPU cores on the machines that run large test suites? Running tests scales pretty well with the number of cores. |
@michaelwoerister I wish! Unfortunately though it's not so easy :( |
Try adding |
⌛ Testing commit d670a7775309e9c0233907b426bd76e2c356004b with merge 6e7b5cd731ae1549bf0c52fa71af137046a5ec42... |
@bors: r=aidanhs |
💡 This pull request was already approved, no need to approve it again.
|
📌 Commit d670a77 has been approved by |
d670a77
to
e157893
Compare
@bors: r=aidanhs |
📌 Commit e157893 has been approved by |
⌛ Testing commit e157893859643e6fdecb21a9254f735bb2f4d926 with merge a47067e228c2ce4bcf5197be5f8fada509718c08... |
💔 Test failed - status-appveyor |
This commit is yet another attempt to bring down our cycle times by parallelizing some of the long-and-serial parts of the build, for example optimizing the libsyntax, librustc, and librustc_driver crate. The hope is that any perf loss from codegen units is more than made up for with the perf gain from using multiple codegen units. The value of 8 codegen units here is pretty arbitrary, it's basically just a number which hopefully means that the cores are always nice and warm. Also a previous version of this commit bounced on Windows CI due to libstd being compiled with multiple codegen units, so only the compiler is now compiled with multiple codegen units.
e157893
to
27c26da
Compare
@bors: r=aidanhs |
📌 Commit 27c26da has been approved by |
let cgus = if mode == Mode::Libstd { | ||
self.config.rust_codegen_units | ||
} else { | ||
self.config.rustc_codegen_units.unwrap_or(self.config.rust_codegen_units) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make rustc_codegen_units
do what it says on the tin and just apply to rustc (rather than it being an 'all-but-std' flag)? That seems to be where the majority of time goes anyway when I build the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure yeah. This I expect to be one of those "obscure options that no one ever touches", but I can change it after this PR lands or if it bounces.
⌛ Testing commit 27c26da with merge f7b98020a75f8e6701715473ffd3896bcc35a6fd... |
💔 Test failed - status-travis |
@bors retry The two macs 3-hour timed out. Not sure if related to the current Travis incident. It may also mean using multiple CGUs may slow down on the mac CIs though. |
ci: Use multiple codegen units on non-dist bots This commit is yet another attempt to bring down our cycle times by parallelizing some of the long-and-serial parts of the build, for example optimizing the libsyntax, librustc, and librustc_driver crate. The hope is that any perf loss from codegen units is more than made up for with the perf gain from using multiple codegen units. The value of 16 codegen units here is pretty arbitrary, it's basically just a number which hopefully means that the cores are always nice and warm.
@bors r- retry This caused rebuilds during testing I think that I want to investigate |
Ok looking at the logs this is not clearly a win, the bootstrap was faster and the tests took way slower. Now I don't really expect predictable performance out of the OSX builders, but at least for now it seems like this is not the win we're hoping for. |
It was worth a try. |
This commit is yet another attempt to bring down our cycle times by
parallelizing some of the long-and-serial parts of the build, for example
optimizing the libsyntax, librustc, and librustc_driver crate. The hope is that
any perf loss from codegen units is more than made up for with the perf gain
from using multiple codegen units.
The value of 16 codegen units here is pretty arbitrary, it's basically just a
number which hopefully means that the cores are always nice and warm.