Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Auto merge of #8837 - bjorn3:improve_perf, r=alexcrichton
Improve performance of almost fresh builds This currently saves about 15ms out of the 170ms Cargo overhead when compiling Bevy. part of #8833 <details><summary>Benchmark results as of <a href="https://github.com/rust-lang/cargo/commit/602a1bd7f5a0da9878952f492508aca5eddd1c53"><code>602a1bd</code></a></summary> Completely fresh: ``` $ RUSTC=$(rustup which rustc) hyperfine --warmup 10 --runs 100 "../../cargo/cargo_master build --release --example breakout" "../../cargo/cargo_improve_perf build --release --example breakout" Benchmark #1: ../../cargo/cargo_master build --release --example breakout Time (mean ± σ): 613.8 ms ± 509.1 ms [User: 147.6 ms, System: 38.4 ms] Range (min … max): 170.8 ms … 1199.2 ms 100 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark #2: ../../cargo/cargo_improve_perf build --release --example breakout Time (mean ± σ): 164.2 ms ± 0.8 ms [User: 137.0 ms, System: 27.1 ms] Range (min … max): 162.8 ms … 169.6 ms 100 runs Summary '../../cargo/cargo_improve_perf build --release --example breakout' ran 3.74 ± 3.10 times faster than '../../cargo/cargo_master build --release --example breakout' ``` (statistical outliers are consistently reproducible and don't happen for any other benchmarks) ``` $ RUSTC=$(rustup which rustc) perf stat -r10 ../../cargo/cargo_master build --release --example breakout Finished release [optimized] target(s) in 0.16s [...] Finished release [optimized] target(s) in 0.16s Performance counter stats for '../../cargo/cargo_master build --release --example breakout' (10 runs): 192,95 msec task-clock:u # 0,242 CPUs utilized ( +- 0,69% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 4926 page-faults:u # 0,026 M/sec ( +- 0,11% ) 387516813 cycles:u # 2,008 GHz ( +- 0,69% ) 685141858 instructions:u # 1,77 insn per cycle ( +- 0,04% ) 124443483 branches:u # 644,958 M/sec ( +- 0,05% ) 2726944 branch-misses:u # 2,19% of all branches ( +- 0,10% ) 0,796 +- 0,168 seconds time elapsed ( +- 21,06% ) $ RUSTC=$(rustup which rustc) perf stat -r10 ../../cargo/cargo_improve_perf build --release --example breakout Finished release [optimized] target(s) in 0.14s [...] Finished release [optimized] target(s) in 0.15s Performance counter stats for '../../cargo/cargo_improve_perf build --release --example breakout' (10 runs): 168,78 msec task-clock:u # 0,997 CPUs utilized ( +- 0,56% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 4075 page-faults:u # 0,024 M/sec ( +- 0,16% ) 372565970 cycles:u # 2,207 GHz ( +- 0,61% ) 667356921 instructions:u # 1,79 insn per cycle ( +- 0,03% ) 120170432 branches:u # 712,010 M/sec ( +- 0,04% ) 2642670 branch-misses:u # 2,20% of all branches ( +- 0,12% ) 0,169294 +- 0,000933 seconds time elapsed ( +- 0,55% ) ``` Need to recompile single executable: ``` $ RUSTC=$(rustup which rustc) hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout" "touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf build --release --example breakout" Benchmark #1: touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout Time (mean ± σ): 658.9 ms ± 1.8 ms [User: 538.6 ms, System: 181.1 ms] Range (min … max): 655.5 ms … 668.8 ms 100 runs Benchmark #2: touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf build --release --example breakout Time (mean ± σ): 648.6 ms ± 2.1 ms [User: 534.7 ms, System: 162.6 ms] Range (min … max): 645.2 ms … 659.5 ms 100 runs Summary 'touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf build --release --example breakout' ran 1.02 ± 0.00 times faster than 'touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout' ``` ``` $ RUSTC=$(rustup which rustc) perf stat -r10 --no-inherit --pre "touch examples/game/breakout.rs" ../../cargo/cargo_master build --release --example breakout Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.67s [...] Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.66s Performance counter stats for '../../cargo/cargo_master build --release --example breakout' (10 runs): 183,65 msec task-clock:u # 0,265 CPUs utilized ( +- 1,08% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 6603 page-faults:u # 0,036 M/sec ( +- 0,11% ) 382629371 cycles:u # 2,083 GHz ( +- 0,79% ) 673192095 instructions:u # 1,76 insn per cycle ( +- 0,03% ) 121518254 branches:u # 661,688 M/sec ( +- 0,04% ) 2698503 branch-misses:u # 2,22% of all branches ( +- 0,18% ) 0,69376 +- 0,00274 seconds time elapsed ( +- 0,39% ) $ RUSTC=$(rustup which rustc) perf stat -r10 --no-inherit --pre "touch examples/game/breakout.rs" ../../cargo/cargo_improve_perf build --release --example breakout Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.66s [...] Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.76s Performance counter stats for '../../cargo/cargo_improve_perf build --release --example breakout' (10 runs): 177,03 msec task-clock:u # 0,256 CPUs utilized ( +- 1,70% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 5774 page-faults:u # 0,033 M/sec ( +- 0,14% ) 381121369 cycles:u # 2,153 GHz ( +- 1,29% ) 672129390 instructions:u # 1,76 insn per cycle ( +- 0,03% ) 121248111 branches:u # 684,900 M/sec ( +- 0,04% ) 2672832 branch-misses:u # 2,20% of all branches ( +- 0,34% ) 0,6924 +- 0,0148 seconds time elapsed ( +- 2,13% ) ``` </details> <details><summary>Benchmark results as of <a href="https://github.com/rust-lang/cargo/commit/ba49b13e65fd487b8fc1761456c34e8d9feefb0e"><code>ba49b13</code></a></summary> Completely fresh: ``` $ RUSTC=$(rustup which rustc) hyperfine --warmup 10 --runs 100 "../../cargo/cargo_master build --release --example breakout" "../../cargo/cargo_improve_perf2 build --release --example breakout" Benchmark #1: ../../cargo/cargo_master build --release --example breakout Time (mean ± σ): 635.4 ms ± 511.6 ms [User: 146.2 ms, System: 41.1 ms] Range (min … max): 172.0 ms … 1208.7 ms 100 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark #2: ../../cargo/cargo_improve_perf2 build --release --example breakout Time (mean ± σ): 165.3 ms ± 1.2 ms [User: 137.2 ms, System: 28.0 ms] Range (min … max): 163.7 ms … 171.0 ms 100 runs Summary '../../cargo/cargo_improve_perf2 build --release --example breakout' ran 3.84 ± 3.09 times faster than '../../cargo/cargo_master build --release --example breakout' ``` (statistical outliers are consistently reproducible and don't happen for any other benchmarks) ``` $ RUSTC=$(rustup which rustc) perf stat -r10 ../../cargo/cargo_master build --release --example breakout Finished release [optimized] target(s) in 0.16s [...] Finished release [optimized] target(s) in 0.16s Performance counter stats for '../../cargo/cargo_master build --release --example breakout' (10 runs): 197,21 msec task-clock:u # 0,220 CPUs utilized ( +- 0,79% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 4918 page-faults:u # 0,025 M/sec ( +- 0,09% ) 395214529 cycles:u # 2,004 GHz ( +- 0,71% ) 685707083 instructions:u # 1,74 insn per cycle ( +- 0,04% ) 124571038 branches:u # 631,667 M/sec ( +- 0,05% ) 2748386 branch-misses:u # 2,21% of all branches ( +- 0,35% ) 0,897 +- 0,155 seconds time elapsed ( +- 17,32% ) $ RUSTC=$(rustup which rustc) perf stat -r10 ../../cargo/cargo_improve_perf2 build --release --example breakout Finished release [optimized] target(s) in 0.14s [...] Finished release [optimized] target(s) in 0.15s Performance counter stats for '../../cargo/cargo_improve_perf2 build --release --example breakout' (10 runs): 168,28 msec task-clock:u # 0,621 CPUs utilized ( +- 0,51% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 4086 page-faults:u # 0,024 M/sec ( +- 0,15% ) 371029906 cycles:u # 2,205 GHz ( +- 0,48% ) 667493108 instructions:u # 1,80 insn per cycle ( +- 0,02% ) 120202436 branches:u # 714,308 M/sec ( +- 0,03% ) 2659209 branch-misses:u # 2,21% of all branches ( +- 0,13% ) 0,271 +- 0,103 seconds time elapsed ( +- 37,82% ) ``` Need to recompile single executable: ``` $ RUSTC=$(rustup which rustc) hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout" "touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf2 build --release --example breakout" Benchmark #1: touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout Time (mean ± σ): 660.7 ms ± 2.9 ms [User: 545.6 ms, System: 175.2 ms] Range (min … max): 656.2 ms … 675.1 ms 100 runs Benchmark #2: touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf2 build --release --example breakout Time (mean ± σ): 650.2 ms ± 5.3 ms [User: 542.9 ms, System: 156.0 ms] Range (min … max): 645.9 ms … 687.7 ms 100 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Summary 'touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf2 build --release --example breakout' ran 1.02 ± 0.01 times faster than 'touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout' ``` ``` $ RUSTC=$(rustup which rustc) perf stat -r10 --no-inherit --pre "touch examples/game/breakout.rs" ../../cargo/cargo_master build --release --example breakout Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.65s [...] Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.66s Performance counter stats for '../../cargo/cargo_master build --release --example breakout' (10 runs): 181,52 msec task-clock:u # 0,264 CPUs utilized ( +- 0,29% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 6618 page-faults:u # 0,036 M/sec ( +- 0,09% ) 381324766 cycles:u # 2,101 GHz ( +- 0,21% ) 673170130 instructions:u # 1,77 insn per cycle ( +- 0,03% ) 121511051 branches:u # 669,422 M/sec ( +- 0,04% ) 2700116 branch-misses:u # 2,22% of all branches ( +- 0,17% ) 0,68766 +- 0,00293 seconds time elapsed ( +- 0,43% ) $ RUSTC=$(rustup which rustc) perf stat -r10 --no-inherit --pre "touch examples/game/breakout.rs" ../../cargo/cargo_improve_perf2 build --release --example breakout Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.64s [...] Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.64s Performance counter stats for '../../cargo/cargo_improve_perf2 build --release --example breakout' (10 runs): 173,78 msec task-clock:u # 0,257 CPUs utilized ( +- 0,65% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 5772 page-faults:u # 0,033 M/sec ( +- 0,17% ) 378994346 cycles:u # 2,181 GHz ( +- 0,62% ) 672499584 instructions:u # 1,77 insn per cycle ( +- 0,04% ) 121341331 branches:u # 698,266 M/sec ( +- 0,05% ) 2691563 branch-misses:u # 2,22% of all branches ( +- 0,17% ) 0,67554 +- 0,00641 seconds time elapsed ( +- 0,95% ) ``` </summary>
- Loading branch information