JIT: Enable physical promotion by default #88090

Merged
merged 1 commit into from
Jun 29, 2023

Conversation

jakobbotsch
Member

See #76928.

Fix #6534
Fix #6707
Fix #7576
Fix #32415
Fix #58522
Fix #68797
Fix #71510
Fix #71565
Fix #76928

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jun 27, 2023
@ghost ghost assigned jakobbotsch Jun 27, 2023
@ghost

ghost commented Jun 27, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: jakobbotsch
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch
Member Author

The failure is a test bug. #88097 has the fix.

@jakobbotsch
Member Author

/azp run runtime, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr outerloop, Fuzzlyn

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@jakobbotsch
Member Author

jakobbotsch commented Jun 28, 2023

cc @dotnet/jit-contrib PTAL @AndyAyersMS. The failure is #87934.

Diffs. TP impact ranges from 0.4% to 1.6%.

I have analyzed the actual benchmark regressions using the perf lab reports and created a report of them (both regressions and improvements) that includes diffs. The report is viewable at https://github.com/jakobbotsch/perf-diff-finder; it is too big to be rendered by GitHub's markdown renderer, but you can use the online codespaces to view it in VS Code. To do that, press ., open physicalpromotion/regressions.md and then execute the Markdown: Open Preview command (hotkey Ctrl+Shift+V).

The regressions were identified by a Kusto query (thanks Andy) over the perf lab data. The query computes the median execution time of each benchmark over the past 7 days with and without physical promotion enabled. I then limited these to benchmarks taking more than 1 nanosecond that regressed by 3% or more. That returned a list of about 200 benchmarks. For each benchmark I used https://github.com/AndyAyersMS/InstructionsRetiredExplorer to find all hot functions (> 1% fraction of samples). This set was then further limited to the benchmarks that actually had physical promotions in a hot function.

This reduced the set to the 26 benchmarks that can be viewed in the report. I went through each of them, analyzed the causes, and left notes and the perf lab graphs in the report. Many of these I still classified as noisy, but there are definitely a few actual regressions in there.
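The first filtering step described above (median per benchmark with and without physical promotion, then a time floor and a regression threshold) can be sketched roughly as follows. This is a minimal Python sketch, not the actual Kusto query or the tool in the linked repo: the record layout, the function name `find_regressions`, and the sample numbers are all hypothetical, and applying the 1 ns floor to the baseline median is an assumption.

```python
from statistics import median


def find_regressions(samples, min_time_ns=1.0, threshold=0.03):
    """Return {benchmark: relative slowdown} for benchmarks that regressed.

    `samples` maps a benchmark name to two lists of execution times in
    nanoseconds: "base" (physical promotion off) and "promoted" (on).
    This layout is hypothetical, chosen only for illustration.
    """
    regressed = {}
    for name, runs in samples.items():
        base = median(runs["base"])
        promoted = median(runs["promoted"])
        # Assumption: the 1 ns floor is applied to the baseline median.
        if base <= min_time_ns:
            continue
        slowdown = promoted / base - 1.0
        if slowdown >= threshold:
            regressed[name] = slowdown
    return regressed


# Made-up sample data: one real regression, one within noise, one too fast.
samples = {
    "BenchA": {"base": [100.0, 102.0, 98.0], "promoted": [110.0, 108.0, 112.0]},
    "BenchB": {"base": [50.0, 51.0, 49.0], "promoted": [50.5, 50.0, 51.0]},
    "BenchC": {"base": [0.5, 0.6, 0.4], "promoted": [0.9, 1.0, 0.8]},
}

result = find_regressions(samples)
print(sorted(result))
```

With this data only `BenchA` passes both filters. The later steps (keeping only benchmarks whose hot functions actually had physical promotions) depend on profiling data and are not modeled here.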

I also ran my tool for all improvements (physicalpromotion/improvements.md), with two key differences:

  • The threshold was set to benchmarks that improve by 10% or more, which yields about 400 benchmarks from the query; this kept the set small enough for my tool to finish generating the report overnight. For benchmarks that improve by 3% or more the query returns around 1500 rows (presumably with a large number of false positives, but that is still around 8x the corresponding number on the regression side). The set was further reduced to the 121 included in the report in the same way as above.
  • I did not go through and analyze these individually or attach the perf lab graphs to them. If you'd like to look at the perf lab graphs you can do so here (internal only; ping me if you don't have access) or here (public, but with no direct comparisons to standard perf lab runs).

@jakobbotsch jakobbotsch marked this pull request as ready for review June 28, 2023 20:02
@jakobbotsch jakobbotsch requested a review from AndyAyersMS June 28, 2023 20:02
@AndyAyersMS
Member

I assume you have also dug into some of the bigger diffs, eg x64 win asp.net's:

        1516 (76.26 % of base) : 96184.dasm - System.Reflection.MethodBase:CheckArguments(System.Span`1[System.Object],ulong,System.Span`1[ubyte],System.ReadOnlySpan`1[System.Object],System.RuntimeType[],System.Reflection.Binder,System.Globalization.CultureInfo,int):this (Tier1-OSR)

@AndyAyersMS AndyAyersMS left a comment

Awesome! I am really excited to see this enabled.

Seems like some of the analysis tooling you have built up could be very useful elsewhere too.

@jakobbotsch
Member Author

I assume you have also dug into some of the bigger diffs, eg x64 win asp.net's:

Generally the size costs come from there being more to do in the prolog/colder blocks to get things into the right registers. For partially promoted structs, block copies are also larger because they usually involve the full block copy that was there before plus writing/reading all the fields, and this frequently costs more code size than the improvements save. Of course we also create many new locals with live ranges, which has a significant impact on LSRA, so in lots of cases different spill choices are made too.

In this particular case physical promotion unlocks loop cloning, so we end up cloning a large loop; code size is much worse while perf score is a bit better:

- Total bytes of code 1988, prolog size 110, PerfScore 85670.97, instruction count 411, allocated bytes for code 1988 (MethodHash=945c7d3a) for method System.Reflection.MethodBase:CheckArguments(System.Span`1[System.Object],ulong,System.Span`1[ubyte],System.ReadOnlySpan`1[System.Object],System.RuntimeType[],System.Reflection.Binder,System.Globalization.CultureInfo,int):this (Tier1-OSR)
+ Total bytes of code 3504, prolog size 110, PerfScore 76167.30, instruction count 742, allocated bytes for code 3504 (MethodHash=945c7d3a) for method System.Reflection.MethodBase:CheckArguments(System.Span`1[System.Object],ulong,System.Span`1[ubyte],System.ReadOnlySpan`1[System.Object],System.RuntimeType[],System.Reflection.Binder,System.Globalization.CultureInfo,int):this (Tier1-OSR)
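As a quick check of the numbers in this diff, the relative changes can be computed directly. This is only an arithmetic sketch in Python; the helper name `pct_change` is made up for illustration, and the inputs are copied from the two summary lines above.

```python
def pct_change(base, diff):
    """Relative change from `base` to `diff`, in percent."""
    return (diff - base) / base * 100.0


# Values taken from the base (-) and diff (+) summary lines.
code_size = pct_change(1988, 3504)           # total bytes of code
perf_score = pct_change(85670.97, 76167.30)  # JIT PerfScore estimate
instr_count = pct_change(411, 742)           # instruction count

print(f"code size:   {code_size:+.2f}%")
print(f"PerfScore:   {perf_score:+.2f}%")
print(f"instr count: {instr_count:+.2f}%")
```

The code size result matches the 1516 bytes (76.26 % of base) reported in the diff summary, while the PerfScore drops by roughly 11%, which is the trade-off described above: a cloned loop costs size but lowers the estimated cost.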

@jakobbotsch
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc, runtime-coreclr gcstress-extra

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch jakobbotsch merged commit 9dcc7b1 into dotnet:main Jun 29, 2023
@jakobbotsch jakobbotsch deleted the enable-physical-promotion branch June 29, 2023 05:25
@jakobbotsch
Member Author

Seems like some of the analysis tooling you have built up could be very useful elsewhere too.

The source is available in that repo, but it is of course quite tailored to the analysis I was doing.
