Build the compiler with -Ctarget-cpu=x86-64-v2 #79043

est31 · 2020-11-14T13:58:47Z

This PR instructs rustbuild to compile the compiler with the -Ctarget-cpu=x86-64-v2 option enabled, in the hope of getting some speedups from autovectorization.

The PR also adds support for the x86-64-v{2,3,4} target CPUs by backporting an LLVM 12.0 commit.

I'm opening this to get a perf run to gauge the potential speedups on the rustc side. The LLVM side isn't built with the option enabled, as that would require clang 12.0 or manually enabling the target features corresponding to the target CPU.
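For the "manual enabling" route, here is a rough sketch of what that could look like for a C/C++ build. The x86-64 psABI defines each level as the previous level plus a fixed set of features, so the flags can be accumulated level by level. The helper name `level_flags` is hypothetical. The GCC/clang `-m` spellings are my own mapping of the psABI feature names, with OSXSAVE approximated by `-mxsave`. This is not something rustbuild does today.

```rust
/// Flags each x86-64 level adds on top of the previous one, following the
/// psABI level definitions. Baseline x86-64 (v1) is implied by any x86-64
/// target, so it adds nothing here.
const V2: &[&str] = &["-mcx16", "-msahf", "-mpopcnt", "-msse3", "-mssse3", "-msse4.1", "-msse4.2"];
const V3: &[&str] = &[
    "-mavx", "-mavx2", "-mbmi", "-mbmi2", "-mf16c", "-mfma", "-mlzcnt", "-mmovbe", "-mxsave",
];
const V4: &[&str] = &["-mavx512f", "-mavx512bw", "-mavx512cd", "-mavx512dq", "-mavx512vl"];

/// Hypothetical helper: cumulative GCC/clang flags approximating x86-64-v{level}.
fn level_flags(level: u8) -> Vec<&'static str> {
    let mut flags = Vec::new();
    if level >= 2 {
        flags.extend_from_slice(V2);
    }
    if level >= 3 {
        flags.extend_from_slice(V3);
    }
    if level >= 4 {
        flags.extend_from_slice(V4);
    }
    flags
}

fn main() {
    // For example, a value one might export as CFLAGS/CXXFLAGS for a v2 build.
    println!("{}", level_flags(2).join(" "));
}
```

Because the levels are cumulative, a v3 build gets every v2 flag plus the AVX2-era ones.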

If the perf run shows nice improvements, one can discuss how to get this to users. One can't just enable it unconditionally for everyone, as that would break for users with older CPUs. It's a similar question to #59667.

rust-highfive · 2020-11-14T13:58:49Z

r? @Mark-Simulacrum

(rust_highfive has picked a reviewer for you, use r? to override)

rust-highfive · 2020-11-14T13:58:50Z

⚠️ Warning ⚠️

These commits modify submodules.

rust-log-analyzer · 2020-11-14T14:08:35Z

The job x86_64-gnu-llvm-8 of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
[command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
##[endgroup]
##[group]Fetching the repository
[command]/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --progress --no-recurse-submodules --depth=2 origin +bcf86308065550bb70968c6a63fd1ec3ad683328:refs/remotes/pull/79043/merge
---
   Compiling typenum v1.12.0
   Compiling version_check v0.9.1
   Compiling hashbrown v0.9.0
   Compiling getrandom v0.1.14
'x86-64-v2' is not a recognized processor for this target (ignoring processor)
   Compiling either v1.6.0
'x86-64-v2' is not a recognized processor for this target (ignoring processor)
LLVM ERROR: 64-bit code requested on a subtarget that doesn't support it!
error: could not compile `scopeguard`

To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
warning: build failed, waiting for other jobs to finish...
'x86-64-v2' is not a recognized processor for this target (ignoring processor)
LLVM ERROR: 64-bit code requested on a subtarget that doesn't support it!
'x86-64-v2' is not a recognized processor for this target (ignoring processor)
LLVM ERROR: 64-bit code requested on a subtarget that doesn't support it!
command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "16" "--release" "--locked" "--color" "always" "--features" " llvm" "--manifest-path" "/checkout/compiler/rustc/Cargo.toml" "--message-format" "json-render-diagnostics"
expected success, got: exit code: 101
failed to run: /checkout/obj/build/bootstrap/debug/bootstrap --stage 2 test --exclude src/tools/tidy
Build completed unsuccessfully in 0:07:11

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @rust-lang/infra. (Feature Requests)

Mark-Simulacrum · 2020-11-14T14:10:05Z

@bors try @rust-timer queue

One thought, though: it probably makes more sense to go for -v4, or whatever level a Ryzen 3600X corresponds to, to get a sense of the maximum benefit from this kind of optimization. We can try that after we get an idea of what -v2 gives us, though.

rust-timer · 2020-11-14T14:10:07Z

Awaiting bors try build completion

bors · 2020-11-14T14:10:16Z

⌛ Trying commit 17b2818ca70ddad8c0a06ec63a96d0a1b58ec65d with merge 714dade877e50c10697708c0c2a77840ba58b69a...

est31 · 2020-11-14T14:23:17Z

@Mark-Simulacrum wow, that was a quick try issuance :). Good point about -v4. Should I change it? I wasn't sure which CPU the CI environment uses. Apparently it's this one (extracted from the try build):

processor	: 15
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
stepping	: 4
microcode	: 0xffffffff
cpu MHz		: 2095.247
cache size	: 36608 KB
physical id	: 0
siblings	: 16
core id		: 15
cpu cores	: 16
apicid		: 15
initial apicid	: 15
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 4190.49
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

According to a rough glance at the tables, this corresponds to -v4.
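Here is a minimal sketch of that table lookup. It assumes the psABI level definitions and the usual mapping of Linux /proc/cpuinfo flag names onto them: `pni` for SSE3, `cx16` for CMPXCHG16B, `lahf_lm` for LAHF/SAHF, `abm` for LZCNT, and `xsave` standing in for OSXSAVE, which cpuinfo doesn't list. `level_from_flags` is a hypothetical helper. Fed the flags line above, it should report level 4.

```rust
use std::collections::HashSet;

/// cpuinfo flag names required for each level, in order v1..v4.
/// Each level also requires all the levels before it.
const LEVELS: [&[&str]; 4] = [
    &["lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"],
    &["cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"],
    &["avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"],
    &["avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"],
];

/// Returns the highest x86-64-vN level whose flags (and those of every lower
/// level) all appear in a cpuinfo flags line, or 0 if even the baseline is missing.
fn level_from_flags(flags_line: &str) -> u8 {
    // Accept either the full "flags\t\t: ..." line or just the flag list.
    let list = flags_line.split(':').last().unwrap_or("");
    let have: HashSet<&str> = list.split_whitespace().collect();
    let mut level = 0;
    for required in LEVELS.iter() {
        if required.iter().all(|f| have.contains(f)) {
            level += 1;
        } else {
            break;
        }
    }
    level
}

fn main() {
    let line = "flags\t\t: fpu cmov cx8 fxsr mmx syscall sse sse2 lm cx16 lahf_lm popcnt pni sse4_1 sse4_2 ssse3";
    println!("x86-64-v{}", level_from_flags(line));
}
```

A CPU that has the v3 flags but none of the avx512* ones stops at level 3.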

More generally, determining which target CPU level a given CPU supports is another thing we'd need to figure out :). I'm not entirely sure what the failure mode is if one tries to run a binary compiled for -v4 on a CPU that doesn't support it.
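On the failure mode: as far as I know, executing an instruction the CPU doesn't implement raises an invalid-opcode fault. On Linux this typically surfaces as SIGILL ("Illegal instruction") rather than a friendly error. The sketch below shows one way a launcher or test could check the level at runtime with std's `is_x86_feature_detected!`. It is only an approximation, because it skips CMPXCHG16B, LAHF/SAHF, MOVBE and OSXSAVE. The name `detected_level` is hypothetical. Such a check only helps if it runs in code that is not itself compiled for the higher level, since -Ctarget-cpu lets the compiler use the new instructions anywhere in the binary.

```rust
/// Best-effort runtime estimate of the x86-64-vN level of the running CPU.
/// Approximation: only a subset of each level's features is checked.
#[cfg(target_arch = "x86_64")]
fn detected_level() -> Option<u8> {
    let v2 = is_x86_feature_detected!("sse3")
        && is_x86_feature_detected!("ssse3")
        && is_x86_feature_detected!("sse4.1")
        && is_x86_feature_detected!("sse4.2")
        && is_x86_feature_detected!("popcnt");
    let v3 = v2
        && is_x86_feature_detected!("avx")
        && is_x86_feature_detected!("avx2")
        && is_x86_feature_detected!("bmi1")
        && is_x86_feature_detected!("bmi2")
        && is_x86_feature_detected!("f16c")
        && is_x86_feature_detected!("fma")
        && is_x86_feature_detected!("lzcnt");
    let v4 = v3
        && is_x86_feature_detected!("avx512f")
        && is_x86_feature_detected!("avx512bw")
        && is_x86_feature_detected!("avx512cd")
        && is_x86_feature_detected!("avx512dq")
        && is_x86_feature_detected!("avx512vl");
    Some(if v4 { 4 } else if v3 { 3 } else if v2 { 2 } else { 1 })
}

/// On non-x86-64 hosts the question doesn't apply.
#[cfg(not(target_arch = "x86_64"))]
fn detected_level() -> Option<u8> {
    None
}

fn main() {
    match detected_level() {
        Some(level) => println!("this CPU appears to support x86-64-v{}", level),
        None => println!("not an x86-64 CPU"),
    }
}
```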

est31 · 2020-11-14T14:30:29Z

Relevant links for reading:

bors · 2020-11-14T14:56:22Z

☀️ Try build successful - checks-actions
Build commit: 714dade877e50c10697708c0c2a77840ba58b69a (714dade877e50c10697708c0c2a77840ba58b69a)

rust-timer · 2020-11-14T14:56:23Z

Queued 714dade877e50c10697708c0c2a77840ba58b69a with parent 66c1309, future comparison URL.

Mark-Simulacrum · 2020-11-14T15:15:36Z

Ultimately, I would expect that if it builds at all in CI, it's probably fine. I suspect that both CI and the 3600X we use on perf are sufficiently modern for -v4 (but who knows; I think the 3600X doesn't support AVX-512, for example?). If you want to switch this to v4, I can queue that as well.

mati865 · 2020-11-14T15:29:46Z

Zen 2 (e.g. Ryzen 3600X) is x86-64-v3, which could still be beneficial over -v2 if LLVM uses the new instructions for hashing.

Mark-Simulacrum · 2020-11-14T15:38:35Z

Ah, OK. Then we can't check v4 on the current perf machine, but we can still check v3.

rust-log-analyzer · 2020-11-14T15:53:52Z

The job mingw-check of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
[command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
##[endgroup]
##[group]Fetching the repository
[command]/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --progress --no-recurse-submodules --depth=2 origin +c6d985906f40216fddc21a88ac0fb6a2259281db:refs/remotes/pull/79043/merge
---
configure: rust.channel         := nightly
configure: rust.debug-assertions := True
configure: llvm.assertions      := True
configure: dist.missing-tools   := True
configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
configure: writing `config.toml` in current directory
configure: 
configure: run `python /checkout/x.py --help`
configure: 
---
Diff in /checkout/src/bootstrap/compile.rs at line 522:
     }
 }
 
-pub fn rustc_cargo(builder: &Builder<'_>, cargo: &mut Cargo, target: TargetSelection, compiler: Compiler) {
+pub fn rustc_cargo(
+    builder: &Builder<'_>,
+    cargo: &mut Cargo,
+    compiler: Compiler,
+) {
     cargo
         .arg("--features")
         .arg(builder.rustc_features())
Diff in /checkout/src/bootstrap/compile.rs at line 531:
     rustc_cargo_env(builder, cargo, target, compiler);
 
 
-pub fn rustc_cargo_env(builder: &Builder<'_>, cargo: &mut Cargo, target: TargetSelection, compiler: Compiler) {
+pub fn rustc_cargo_env(
+    builder: &Builder<'_>,
+    cargo: &mut Cargo,
+    compiler: Compiler,
+) {
     // Set some configuration variables picked up by build scripts and
     // the compiler alike
Running `"/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/rustfmt" "--config-path" "/checkout" "--edition" "2018" "--unstable-features" "--skip-children" "--check" "/checkout/src/bootstrap/compile.rs"` failed.
If you're running `tidy`, try again with `--bless`. Or, if you just want to format code, run `./x.py fmt` instead.
failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test --stage 2 src/tools/tidy
Build completed unsuccessfully in 0:00:14
== clock drift check ==
  local time: Sat Nov 14 15:53:43 UTC 2020

Mark-Simulacrum · 2020-11-14T16:03:46Z

@bors try @rust-timer queue

rust-timer · 2020-11-14T16:03:47Z

Awaiting bors try build completion

bors · 2020-11-14T16:03:56Z

⌛ Trying commit c20c5ca with merge 08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a...

bors · 2020-11-14T16:49:33Z

☀️ Try build successful - checks-actions
Build commit: 08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a (08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a)

Mark-Simulacrum · 2020-11-14T17:32:36Z

It looks like we need to wait for the current build to finish before starting a new one, though I'm not sure why the db imposes that limit, so it can probably be removed as a constraint in the future.

rust-timer · 2020-11-14T17:48:58Z

Finished benchmarking try commit (714dade877e50c10697708c0c2a77840ba58b69a): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-perf

Mark-Simulacrum · 2020-11-14T18:32:51Z

@rust-timer build 08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a

rust-timer · 2020-11-14T18:32:52Z

Queued 08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a with parent 30e49a9, future comparison URL.

rust-timer · 2020-11-14T21:31:30Z

Finished benchmarking try commit (08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-perf

Mark-Simulacrum · 2020-11-14T21:35:29Z

v3 looks like a pretty significant wall-time loss; v2 looks like a slight win (or is perhaps lost in the noise). My guess is that this is not worth it at this time.

est31 · 2020-11-15T00:05:26Z

Yeah, there are some single-digit improvements in instruction counts (up to -2.5% in inflate-check full for v3, up to -6.6% in keccak-debug full), but I guess that's expected, since doing more work per instruction is what instruction set extensions are about :). However, if you click on the passes overview, the summary actually shows a regression, not an improvement. Different measurement methods? Most of the time it shows a slight regression in the time delta. In fact, the pass overview is quite useless, as the passes are all over the place: some improve, others regress, sometimes quite heavily. There's tons of noise there. I'd have liked to identify passes where the instruction count was heavily reduced, to check whether they might be good places for vectorization, but the pass overview is useless for that :).
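One concrete way instruction counts drop: x86-64-v2 guarantees the POPCNT instruction, so `u64::count_ones` can compile to a single `popcnt`. Baseline x86-64 instead needs a multi-instruction bit-twiddling fallback, roughly like the one below. This is my own illustration, not something taken from the perf results. Comparing `rustc -O --emit asm` output with and without `-Ctarget-cpu=x86-64-v2` is one way to check it. `popcount_swar` is a hypothetical name.

```rust
/// Portable "SWAR" population count, similar in spirit to what LLVM emits for
/// count_ones when POPCNT may be unavailable (baseline x86-64).
fn popcount_swar(mut x: u64) -> u32 {
    // Count bits within each 2-bit field.
    x -= (x >> 1) & 0x5555_5555_5555_5555;
    // Sum adjacent 2-bit counts into 4-bit fields.
    x = (x & 0x3333_3333_3333_3333) + ((x >> 2) & 0x3333_3333_3333_3333);
    // Sum adjacent 4-bit counts into bytes.
    x = (x + (x >> 4)) & 0x0f0f_0f0f_0f0f_0f0f;
    // Add up all byte counts into the top byte.
    (x.wrapping_mul(0x0101_0101_0101_0101) >> 56) as u32
}

fn main() {
    for &v in &[0u64, 1, 0b1011, 0xdead_beef, u64::MAX] {
        // With popcnt available, count_ones is one instruction; the fallback is ~a dozen.
        assert_eq!(popcount_swar(v), v.count_ones());
    }
    println!("ok");
}
```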

I think the work the compiler does doesn't lend itself well to being sped up by instruction set extensions, because they are mostly about bulk processing of data.
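A rough sketch of that distinction, assuming (as a simplification) that compiler workloads look more like the second function than the first. Summing a contiguous slice is the kind of loop LLVM's vectorizer can widen with SSE/AVX. Walking linked nodes is a chain of dependent loads with nothing to vectorize. `Node`, `sum_slice`, `sum_list` and `build_list` are hypothetical names for illustration.

```rust
/// Bulk, contiguous data: a candidate for autovectorization, where wider
/// vector units (AVX2 in v3, AVX-512 in v4) can process more elements per step.
fn sum_slice(xs: &[u32]) -> u64 {
    xs.iter().map(|&x| x as u64).sum()
}

/// Hypothetical linked node, standing in for pointer-heavy data structures.
struct Node {
    value: u32,
    next: Option<Box<Node>>,
}

/// Pointer chasing: each iteration needs the previous load to finish,
/// so wider vector instructions don't help.
fn sum_list(mut node: Option<&Node>) -> u64 {
    let mut total = 0;
    while let Some(n) = node {
        total += n.value as u64;
        node = n.next.as_deref();
    }
    total
}

/// Builds a list with the same values, in the same order, as the slice.
fn build_list(values: &[u32]) -> Option<Box<Node>> {
    let mut head = None;
    for &v in values.iter().rev() {
        head = Some(Box::new(Node { value: v, next: head }));
    }
    head
}

fn main() {
    let data: Vec<u32> = (1..=1000).collect();
    let list = build_list(&data);
    // Same result, very different opportunities for SIMD.
    assert_eq!(sum_slice(&data), sum_list(list.as_deref()));
    println!("{}", sum_slice(&data));
}
```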

Maybe it's more power-efficient now, maybe not, but even if it is, it's likely not enough to warrant further investigation.

On the bright side, building the LLVM side with these target CPUs is still unexplored. Also, the LLVM commit I backported is only half of the story: LLVM 12.0 will also gain the ability to tune for specific CPUs, like gcc's -mtune. The resulting code still runs on older CPUs, but instructions are emitted in a way that runs faster on newer ones. Originally, the commit I backported built on that feature to also set default tunings for v2, v3 and v4, but I removed that part so as not to have to backport the -mtune changes as well. So maybe we can repeat this test in the future, once LLVM 12.0 is around, with the proper version of the commit. Maybe one can also experiment with CPU tuning by then. I think that's best done once LLVM 12.0 is out and rustc uses it. Even better if CI also uses it as the compiler for the native LLVM build.

For now though, closing.

rust-highfive assigned Mark-Simulacrum Nov 14, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 14, 2020

Add support for x86-64-{2,3,4} target CPUs and pass -Ctarget-cpu=x86-64-v2

cca4102

est31 mentioned this pull request Nov 14, 2020

Build the compiler with -Ctarget-cpu=x86-64-v4 #79044

Closed

v3 instead of v2

c20c5ca

est31 force-pushed the x86_64_v2 branch from 17b2818 to c20c5ca November 14, 2020 15:40

Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 14, 2020

est31 closed this Nov 15, 2020

est31 mentioned this pull request Nov 15, 2020

Can't use download-ci-llvm any more #79071

Closed

est31 mentioned this pull request Feb 12, 2021

Add x86-64-v2, x86-64-v3, and x86-64-v4 as available target_cpus #82024

Closed
