-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build the compiler with -Ctarget-cpu=x86-64-v2 #79043
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
|
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
@bors try @rust-timer queue One thought though is that it probably makes more sense to go for -v4 or whatever a 3600X Ryzen corresponds to, to get a sense of maximum benefits from this kind of optimization. We can try that after we get an idea of what -v2 gives us, though. |
Awaiting bors try build completion |
⌛ Trying commit 17b2818ca70ddad8c0a06ec63a96d0a1b58ec65d with merge 714dade877e50c10697708c0c2a77840ba58b69a... |
@Mark-Simulacrum wow that was a quick try issuance :). good point about the -v4. Should I change it? I wasn't sure which CPU the CI env uses. Apparently it's this one (extracted from the try build):
Which according to a rough glance on the tables corresponds to -v4. Generally, figuring out which target cpu the cpu supports is probably another thing we need to figure out :). I'm not entirely sure what the failure mode is if one tries to run a binary compiled for -v4 on something that doesn't support it. |
☀️ Try build successful - checks-actions |
Queued 714dade877e50c10697708c0c2a77840ba58b69a with parent 66c1309, future comparison URL. |
Ultimately I would expect that if it builds at all in CI it's probably fine; I suspect that both CI and the 3600X we use on perf are sufficiently modern for -v4 (but who knows, I think 3600X doesn't support AVX512 for example?). If you want to switch this to v4 I can queue that as well. |
Zen 2 (e.g. Ryzen 3600X) is x86-64-v3 which could be still beneficial over -v2 if LLVM uses new instructions for hashing. |
Ah, ok, then we can't check v4 on current perf but we can still check v3. |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
@bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit c20c5ca with merge 08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
It looks like we need to wait for the current build to finish before starting a new one, though I'm not sure why that is limited in the db, so probably can be removed as a constraint in the future. |
Finished benchmarking try commit (714dade877e50c10697708c0c2a77840ba58b69a): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
@rust-timer build 08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a |
Queued 08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a with parent 30e49a9, future comparison URL. |
Finished benchmarking try commit (08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
V3 looks like a pretty significant wall time loss, V2 looks like a slight win (or perhaps lost in the noise). My guess is that this is not worth it at this time. |
Yeah there are some single digit improvements in instruction counts (up to -2.5% in inflate-check full for v3, up to -6.6% in keccak-debug full) but I guess that's due to what instruction set extensions are about :). But if you click at the passes overview it in fact shows a regression in the summary, not an improvement. Different measurement methods? Most times it shows a slight regression in the time delta. In fact, the pass overview is quite useless as the passes are all over the place. Some improve, others regress, sometimes quite heavily. Tons of noise there. I'd have wanted to identify passes where the instruction count was heavily reduced to check whether they might be a good place for vectorization, but the pass overview is useless for that :). I think what the compiler is doing doesn't lend itself that well to being sped up by target extensions because mostly they are about bulk processing of data. Maybe it's more power efficient now, maybe not, but even if, it's likely not enough to warrant further inspections. On the bright side, optimizing LLVM is still left unexplored. Also, the LLVM commit I backported is only half of the story: LLVM 12.0 will also gain ability to tune for CPUs, like gcc's For now though, closing. |
This PR instructs rustbuild to compile the compiler with the
-Ctarget-cpu=x86-64-v2
option enabled in hope of getting some optimization gains from autovectorization.The PR also adds support for
x86-64-{2,3,4}
target CPUs by backporting an LLVM 12.0 commit.I'm opening this to get a perf run to gauge the potential speedups on the rustc side. The LLVM side isn't built with the option enabled, as that would require clang 12.0 or manual enabling of the target features corresponding to the target CPU.
If the perf run shows up nice improvements one can talk about how to get this to users. One can't just enable this unconditionally for all users as it'd break for users of older CPUs. It's a similar question to #59667.