Performance changes related to vzeroupper emit changes #98705

Closed
performanceautofiler bot opened this issue Feb 20, 2024 · 5 comments
Labels
arch-x64, area-CodeGen-coreclr, os-windows, runtime-coreclr

Comments

@performanceautofiler

Run Information

Name Value
Architecture x64
OS Windows 10.0.18362
Queue TigerWindows
Baseline e3af00bf62f3280b4db46e2a01caf68a4dd5c991
Compare 49fe3b06cdfce737ea1c8964e01f2dd2a4e77d44
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Tests.Perf_Char

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
17.21 ns 18.88 ns 1.10 0.03 False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_Char*'

Payloads

Baseline
Compare

System.Tests.Perf_Char.Char_ToUpper(c: 'İ', cultureName: en-US)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository


Regressions in System.Text.Perf_Ascii

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
5.97 ns 7.36 ns 1.23 0.03 False
5.87 ns 7.15 ns 1.22 0.03 False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Text.Perf_Ascii*'

Payloads

Baseline
Compare

System.Text.Perf_Ascii.EqualsIgnoreCase_ExactlyTheSame_Chars(Size: 6)

ETL Files

Histogram

JIT Disasms

System.Text.Perf_Ascii.EqualsIgnoreCase_ExactlyTheSame_Bytes(Size: 6)

ETL Files

Histogram

JIT Disasms

Regressions in System.Collections.TryGetValueFalse<Int32, Int32>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
20.47 μs 23.18 μs 1.13 0.09 False
2.80 μs 3.77 μs 1.34 0.10 True

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.TryGetValueFalse<Int32, Int32>*'

Payloads

Baseline
Compare

System.Collections.TryGetValueFalse<Int32, Int32>.ImmutableDictionary(Size: 512)

ETL Files

Histogram

JIT Disasms

System.Collections.TryGetValueFalse<Int32, Int32>.Dictionary(Size: 512)

ETL Files

Histogram

JIT Disasms

Regressions in System.Collections.Sort<BigStruct>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
7.21 μs 11.43 μs 1.58 0.61 False
7.51 μs 11.76 μs 1.56 0.65 False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Sort<BigStruct>*'

Payloads

Baseline
Compare

System.Collections.Sort<BigStruct>.List(Size: 512)

ETL Files

Histogram

JIT Disasms

System.Collections.Sort<BigStruct>.Array(Size: 512)

ETL Files

Histogram

JIT Disasms

Regressions in System.Buffers.Tests.SearchValuesByteTests

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
9.34 ns 11.14 ns 1.19 0.06 False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Buffers.Tests.SearchValuesByteTests*'

Payloads

Baseline
Compare

System.Buffers.Tests.SearchValuesByteTests.IndexOfAnyExcept(Values: "abcdefABCDEF0123456789")

ETL Files

Histogram

JIT Disasms

Regressions in PerfLabTests.CastingPerf

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
311.83 μs 362.26 μs 1.16 0.00 True
311.83 μs 363.82 μs 1.17 0.00 True
364.28 μs 420.81 μs 1.16 0.03 False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'PerfLabTests.CastingPerf*'

Payloads

Baseline
Compare

PerfLabTests.CastingPerf.ObjFooIsObj2

ETL Files

Histogram

JIT Disasms

PerfLabTests.CastingPerf.IFooFooIsIFoo

ETL Files

Histogram

JIT Disasms

PerfLabTests.CastingPerf.FooObjCastIfIsa

ETL Files

Histogram

JIT Disasms

Regressions in System.Numerics.Tests.Perf_VectorOf<Int16>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
39.54 ns 44.49 ns 1.13 0.01 True
39.66 ns 44.49 ns 1.12 0.02 True

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tests.Perf_VectorOf<Int16>*'

Payloads

Baseline
Compare

System.Numerics.Tests.Perf_VectorOf<Int16>.DivisionOperatorBenchmark

ETL Files

Histogram

JIT Disasms

System.Numerics.Tests.Perf_VectorOf<Int16>.DivideBenchmark

ETL Files

Histogram

JIT Disasms


@DrewScoggins
Member

#98261

@DrewScoggins DrewScoggins transferred this issue from dotnet/perf-autofiling-issues Feb 20, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label label Feb 20, 2024
@DrewScoggins DrewScoggins changed the title from "[Perf] Windows/x64: 13 Regressions on 2/13/2024 8:56:49 PM" to "Performance changes related to vzeroupper emit changes" Feb 20, 2024
@DrewScoggins
Member

cc @tannergooding

@DrewScoggins
Member

DrewScoggins commented Feb 20, 2024

@danmoseley danmoseley added the area-CodeGen-coreclr label and removed the needs-area-label label Feb 20, 2024
@ghost

ghost commented Feb 20, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: performanceautofiler[bot]
Assignees: -
Labels:

os-windows, arch-x64, area-CodeGen-coreclr, untriaged, runtime-coreclr

Milestone: -

@tannergooding
Member

This is effectively "by design" and we are now simply following the Architecture Optimization Manual guidance on where to emit vzeroupper.

Previously we were failing to emit it in some places where it was required, which could lead to a significant performance hit on some hardware (multiple customers reported hitting this). However, if we aren't in a scenario where the native code actually needs vzeroupper, then some hardware will itself see a perf hit from executing the additional instructions (newer hardware is typically not impacted).

Conversely, we were also emitting it in places where it was unnecessary. Removing those can result in a perf increase on older hardware.

It's ultimately a set of tradeoffs, but since we're now matching the official guidance on where to emit vzeroupper, we are in a better overall place.
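
For readers unfamiliar with the instruction, the following is a minimal C sketch of the pattern being discussed: code that uses 256-bit AVX registers clears their upper halves with vzeroupper before control returns to code that may run legacy (non-VEX) SSE instructions. It is only an illustration and is not taken from the JIT. The helper names `add8_avx`, `add8_scalar`, and `add8` are hypothetical, and the sketch assumes GCC or Clang (for `__attribute__((target("avx")))` and `__builtin_cpu_supports`); on other targets it falls back to scalar code.

```c
#include <stddef.h>

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define ADD8_HAVE_AVX 1

/* Hypothetical 256-bit path: touches the upper halves of YMM registers. */
__attribute__((target("avx")))
static void add8_avx(const float *a, const float *b, float *out)
{
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(out, _mm256_add_ps(va, vb));

    /* Zero the upper bits of the YMM registers before returning to code
       that may execute legacy SSE instructions. Compilers may insert this
       on their own; it is written out here to show where it belongs. */
    _mm256_zeroupper();
}
#else
#define ADD8_HAVE_AVX 0
#endif

/* Scalar fallback for hardware or toolchains without the AVX path. */
static void add8_scalar(const float *a, const float *b, float *out)
{
    for (size_t i = 0; i < 8; ++i)
        out[i] = a[i] + b[i];
}

/* Hypothetical dispatcher: picks the AVX path only when the CPU and OS
   report support for it at run time. */
static void add8(const float *a, const float *b, float *out)
{
#if ADD8_HAVE_AVX
    if (__builtin_cpu_supports("avx")) {
        add8_avx(a, b, out);
        return;
    }
#endif
    add8_scalar(a, b, out);
}
```

The cost described above comes from the extra instruction itself: when the caller never executes legacy SSE code afterwards, the `_mm256_zeroupper()` call is pure overhead, which is why placement matters.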

@ghost ghost removed the untriaged label Feb 21, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Mar 23, 2024