-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change opt-level from 2 back to 3 #67878
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
I suspect things are still broken; that comment has been around since I can remember. However, we might as well let this inch its way through the queue. @bors rollup=never r+ It's also I believe true that we've not really seen appreciable performance gains when doing this, so it may just generally not be worth it. (I am unsure of the costs in compile time of rustc itself, but would not be surprised if they're non-marginal). I suspect this will fail anyway. |
📌 Commit c7f44b9352f04ec36109d074c5eacf7f6c42b22a has been approved by |
Previously identical chenge has passed CI but made subsequent PRs fail spuriously. That makes this PR quite high risk. |
I think enough time had passed that it may be worth trying again. However, one thing I did want to do but forgot - @bors r- try @rust-timer queue Let's gather perf stats before landing. |
Awaiting bors try build completion |
⌛ Trying commit c7f44b9352f04ec36109d074c5eacf7f6c42b22a with merge c8c22433c5b3e436a1837ce84de8eacc5586de6d... |
☀️ Try build successful - checks-azure |
Queued c8c22433c5b3e436a1837ce84de8eacc5586de6d with parent 7785834, future comparison URL. |
Finished benchmarking try commit c8c22433c5b3e436a1837ce84de8eacc5586de6d, comparison URL. |
|
@bjorn3 I haven't used |
First click on the benchmark group (ctfe-stress-4-check), then click on the percentage corresponding to a specific benchmark (10.7% for clean) You will then go to a page which shows info for every query. |
@bjorn3 ah, thanks! Digging into the results, it seems that this change improves incremental speed by a few percent on quite a few of the test cases. This seems like a good initial sign! I think this PR probably is worth landing, assuming the @Mark-Simulacrum How do you recommend I proceed here? |
I ran The slowdown appears to be coming from the body of Nightly:
This PR:
With this branch, LLVM appears to be generating more SIMD instructions. However, the total percentage for the the various I think this corresponds to checking the discriminant of the I'm not 100% sure that this analysis is correct. However, it appears that part of this regression may be due to LLVM codegenning EDIT: I didn't realize that |
Very interesting! I'm assuming that the expensive call chain is:
My reading of the assembly is that on nightly LLVM is loading some part of the return value at the start (from
(Note: 77.81 on movs, 5.15 on cmps) After this patch, LLVM doesn't bother loading that part of the ret value till the end:
(Note: 77.1 on movs, 12.67 on cmps) I wonder if this double load is hurting pipelining or instruction parallelism in some subtle way. This just seems like unlucky codegen from LLVM. Perhaps one idea is to mark |
I'm not sure that that's correct. I think LLVM is loading the return value first on both nightly and this branch. However, due to Marking |
I’m pretty confident there’s no real point in attempting to operate on a register here. The memory location in question ought to be in the L1 cache – it has been just written to the memory address in question after all. |
Here’s my hypothesis: Register is still faster than cache. The original code does the load from 0x20(%rsp) to %edx first. There isn’t a store to that address, so that can be totally pipelined with the vectorized movs. In the new code, the load from 0x20(%rsp) is only at the end — it cannot be pipelined properly. |
Okay, I added an @Mark-Simulacrum (or someone else with bors privileges), can we get another try + rust-timer run? |
Does the opt-level here only affect the compiler itself or also libcore/libstd builds? If the latter, then that could result in a program size increase. |
Thats a good point - I think it affects every crate. That's much less concerning - hopefully the size impact isn't too high. |
This is an increase of 2,299 bytes which feels pretty significant to me, but I don't have a lot of context here. Certainly @jamesmunns has indicated to me I believe that this is pretty significant for embedded. Could you try to investigate where the size increase is coming from? That is, is it spread out across all of libcore/std or a few functions getting significantly larger? |
(also, we might want to consider the possiblilty that the failures due to lack of disk and OOMs in the earlier crater run are somehow connected to the code size increase, no?) |
I suspect no -- historically we've seen sporadic oom and disk full errors sporadically on entirely clean runs too. |
I would like to see us track down the size diff in that particular test before moving forward. |
Ping from triage: @Others this PR has sat idle for a week, can you please post your status? |
Sorry, have had a couple deadlines outside of this work. Will get back to this ASAP |
After investigating, I believe the difference is almost entirely in the size of the Twiggy output pre this PR:
After applying this PR:
Twiggy diff output:
I kinda doubt this should be a blocker for the PR? Would love some more review feedback. |
Interesting! Looks like the dlmalloc functions are somewhat big. Given that it's just the default allocator, and not actually mandatory (i.e. most wasm executables that care about size will likely switch to some alternative), I imagine that we can land this. Looks like there's still a comment left to be added about the |
In Cargo.toml, the opt-level for `release` and `bench` was overridden to be 2. This was to work around a problem with LLVM 7. However, rust no longer uses LLVM 7, so this is no longer needed. This creates a small compile time regression in MIR constant eval, so I've added a #[inline(always)] on the `step` function used in const eval Also creates a binary size increase in wasm-stringify-ints-small, so I've bumped the limit there.
@Mark-Simulacrum I've added the comment :) |
@bors r+ rollup=never |
📌 Commit 0d52c56 has been approved by |
Change opt-level from 2 back to 3 In Cargo.toml, the opt-level for `release` and `bench` was overridden to be 2. This was to work around a problem with LLVM 7. However, rust no longer uses LLVM 7, so this is hopefully no longer needed? I tried a little bit to replicate the original problem, and could not. I think running this through CI is the best way to smoke test this :) Even if things break dramatically, the comment should be updated to reflect that things are still broken with LLVM 9. I'm just getting started playing with the compiler, so apologies if I've missed an obvious problem here. fixes #52378 (possibly relevant is the [current update to LLVM 10](#67759))
☀️ Test successful - checks-azure |
📣 Toolstate changed by #67878! Tested on commit 138c50f. 💔 rustc-guide on linux: test-pass → test-fail (cc @JohnTitor @amanjeev @spastorino @mark-i-m, @rust-lang/infra). |
Tested on commit rust-lang/rust@138c50f. Direct link to PR: <rust-lang/rust#67878> 💔 rustc-guide on linux: test-pass → test-fail (cc @JohnTitor @amanjeev @spastorino @mark-i-m, @rust-lang/infra).
Pkgsrc changes: * Bump rust bootstrap version to 1.42.0, except for Darwin/i686 where the bootstrap is not (yet?) available. Upstream changes: Version 1.43.0 (2020-04-23) ========================== Language -------- - [Fixed using binary operations with `&{number}` (e.g. `&1.0`) not having the type inferred correctly.][68129] - [Attributes such as `#[cfg()]` can now be used on `if` expressions.][69201] **Syntax only changes** - [Allow `type Foo: Ord` syntactically.][69361] - [Fuse associated and extern items up to defaultness.][69194] - [Syntactically allow `self` in all `fn` contexts.][68764] - [Merge `fn` syntax + cleanup item parsing.][68728] - [`item` macro fragments can be interpolated into `trait`s, `impl`s, and `extern` blocks.][69366] For example, you may now write: ```rust macro_rules! mac_trait { ($i:item) => { trait T { $i } } } mac_trait! { fn foo() {} } ``` These are still rejected *semantically*, so you will likely receive an error but these changes can be seen and parsed by macros and conditional compilation. Compiler -------- - [You can now pass multiple lint flags to rustc to override the previous flags.][67885] For example; `rustc -D unused -A unused-variables` denies everything in the `unused` lint group except `unused-variables` which is explicitly allowed. However, passing `rustc -A unused-variables -D unused` denies everything in the `unused` lint group **including** `unused-variables` since the allow flag is specified before the deny flag (and therefore overridden). - [rustc will now prefer your system MinGW libraries over its bundled libraries if they are available on `windows-gnu`.][67429] - [rustc now buffers errors/warnings printed in JSON.][69227] Libraries --------- - [`Arc<[T; N]>`, `Box<[T; N]>`, and `Rc<[T; N]>`, now implement `TryFrom<Arc<[T]>>`,`TryFrom<Box<[T]>>`, and `TryFrom<Rc<[T]>>` respectively.][69538] **Note** These conversions are only available when `N` is `0..=32`. - [You can now use associated constants on floats and integers directly, rather than having to import the module.][68952] e.g. You can now write `u32::MAX` or `f32::NAN` with no imports. - [`u8::is_ascii` is now `const`.][68984] - [`String` now implements `AsMut<str>`.][68742] - [Added the `primitive` module to `std` and `core`.][67637] This module reexports Rust's primitive types. This is mainly useful in macros where you want avoid these types being shadowed. - [Relaxed some of the trait bounds on `HashMap` and `HashSet`.][67642] - [`string::FromUtf8Error` now implements `Clone + Eq`.][68738] Stabilized APIs --------------- - [`Once::is_completed`] - [`f32::LOG10_2`] - [`f32::LOG2_10`] - [`f64::LOG10_2`] - [`f64::LOG2_10`] - [`iter::once_with`] Cargo ----- - [You can now set config `[profile]`s in your `.cargo/config`, or through your environment.][cargo/7823] - [Cargo will now set `CARGO_BIN_EXE_<name>` pointing to a binary's executable path when running integration tests or benchmarks.][cargo/7697] `<name>` is the name of your binary as-is e.g. If you wanted the executable path for a binary named `my-program`you would use `env!("CARGO_BIN_EXE_my-program")`. Misc ---- - [Certain checks in the `const_err` lint were deemed unrelated to const evaluation][69185], and have been moved to the `unconditional_panic` and `arithmetic_overflow` lints. Compatibility Notes ------------------- - [Having trailing syntax in the `assert!` macro is now a hard error.][69548] This has been a warning since 1.36.0. - [Fixed `Self` not having the correctly inferred type.][69340] This incorrectly led to some instances being accepted, and now correctly emits a hard error. [69340]: rust-lang/rust#69340 Internal Only ------------- These changes provide no direct user facing benefits, but represent significant improvements to the internals and overall performance of `rustc` and related tools. - [All components are now built with `opt-level=3` instead of `2`.][67878] - [Improved how rustc generates drop code.][67332] - [Improved performance from `#[inline]`-ing certain hot functions.][69256] - [traits: preallocate 2 Vecs of known initial size][69022] - [Avoid exponential behaviour when relating types][68772] - [Skip `Drop` terminators for enum variants without drop glue][68943] - [Improve performance of coherence checks][68966] - [Deduplicate types in the generator witness][68672] - [Invert control in struct_lint_level.][68725] [67332]: rust-lang/rust#67332 [67429]: rust-lang/rust#67429 [67637]: rust-lang/rust#67637 [67642]: rust-lang/rust#67642 [67878]: rust-lang/rust#67878 [67885]: rust-lang/rust#67885 [68129]: rust-lang/rust#68129 [68672]: rust-lang/rust#68672 [68725]: rust-lang/rust#68725 [68728]: rust-lang/rust#68728 [68738]: rust-lang/rust#68738 [68742]: rust-lang/rust#68742 [68764]: rust-lang/rust#68764 [68772]: rust-lang/rust#68772 [68943]: rust-lang/rust#68943 [68952]: rust-lang/rust#68952 [68966]: rust-lang/rust#68966 [68984]: rust-lang/rust#68984 [69022]: rust-lang/rust#69022 [69185]: rust-lang/rust#69185 [69194]: rust-lang/rust#69194 [69201]: rust-lang/rust#69201 [69227]: rust-lang/rust#69227 [69548]: rust-lang/rust#69548 [69256]: rust-lang/rust#69256 [69361]: rust-lang/rust#69361 [69366]: rust-lang/rust#69366 [69538]: rust-lang/rust#69538 [cargo/7823]: rust-lang/cargo#7823 [cargo/7697]: rust-lang/cargo#7697 [`Once::is_completed`]: https://doc.rust-lang.org/std/sync/struct.Once.html#method.is_completed [`f32::LOG10_2`]: https://doc.rust-lang.org/std/f32/consts/constant.LOG10_2.html [`f32::LOG2_10`]: https://doc.rust-lang.org/std/f32/consts/constant.LOG2_10.html [`f64::LOG10_2`]: https://doc.rust-lang.org/std/f64/consts/constant.LOG10_2.html [`f64::LOG2_10`]: https://doc.rust-lang.org/std/f64/consts/constant.LOG2_10.html [`iter::once_with`]: https://doc.rust-lang.org/std/iter/fn.once_with.html
In Cargo.toml, the opt-level for
release
andbench
was overridden to be 2. This was to work around a problem with LLVM 7. However, rust no longer uses LLVM 7, so this is hopefully no longer needed?I tried a little bit to replicate the original problem, and could not. I think running this through CI is the best way to smoke test this :) Even if things break dramatically, the comment should be updated to reflect that things are still broken with LLVM 9.
I'm just getting started playing with the compiler, so apologies if I've missed an obvious problem here.
fixes #52378
(possibly relevant is the current update to LLVM 10)