JIT: initial profile repair #100456

AndyAyersMS · 2024-03-29T23:06:36Z

Always run a repair pass after incorporating profile data. If that fails to converge, blend in synthetic likelihoods in increasing strengths to enable convergence (giving up some accuracy).
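A minimal sketch of the retry strategy described above, not the JIT's actual code. The function names, the blend schedule, the iteration limit, and the toy one-loop solver are all assumptions made for illustration; only the idea of blending in synthetic likelihoods at increasing strength until the solver converges comes from the description.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Edge = Tuple[str, str]            # (source block, target block)
Likelihoods = Dict[Edge, float]   # edge -> probability of taking that edge


def blend(profile: Likelihoods, synthetic: Likelihoods, factor: float) -> Likelihoods:
    """Mix profile and synthetic likelihoods; factor=0 keeps the profile unchanged."""
    return {e: (1.0 - factor) * profile[e] + factor * synthetic[e] for e in profile}


def solve_self_loop(likelihoods: Likelihoods, limit: int = 100, tol: float = 0.01) -> bool:
    """Toy solver (hypothetical) for entry -> loop -> exit, where loop branches to itself.

    Iterates w = entry + p * w with an entry weight of 1.0 and reports whether
    the loop weight settles (change <= tol) within `limit` iterations.
    """
    p = likelihoods[("loop", "loop")]
    w = 1.0
    for _ in range(limit):
        new_w = 1.0 + p * w
        if abs(new_w - w) <= tol:
            return True
        w = new_w
    return False


def repair(
    profile: Likelihoods,
    synthetic: Likelihoods,
    solve: Callable[[Likelihoods], bool],
    schedule: Sequence[float] = (0.0, 0.05, 0.2, 0.5, 1.0),  # assumed schedule
) -> Optional[Tuple[float, Likelihoods]]:
    """Return the first (factor, likelihoods) pair for which the solver converges."""
    for factor in schedule:
        candidate = blend(profile, synthetic, factor)
        if solve(candidate):
            return factor, candidate
    return None  # nothing in the schedule converged
```

With a profile that claims the back edge is always taken (likelihood 1.0) and a synthetic back-edge likelihood of 0.9, this toy setup first converges at blend factor 0.5, which illustrates the accuracy given up in exchange for convergence.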

Enable block weight consistency checks to run right after profile incorporation. Disable just after that.

We can generally make profile data self-consistent, save for two exceptions:

1. if the flow graph contains an infinite (or effectively infinite) loop
2. if the IL is invalid

Because of (2) we can't immediately assert if the profile is not consistent; we must defer until after we've successfully imported the method (since the importer contains many of the validity checks).
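For readers unfamiliar with what "self-consistent" means here, the following is a rough sketch under the usual flow-conservation reading: each non-entry block's weight should match the sum of its predecessors' weights times the corresponding edge likelihoods. The function name, data layout, and the 0.01 relative tolerance are illustrative assumptions, not the JIT's checker.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source block, target block)


def inconsistent_blocks(
    weights: Dict[str, float],
    edges: Dict[Edge, float],
    entry: str,
    tol: float = 0.01,  # assumed relative tolerance
) -> List[str]:
    """List blocks whose weight disagrees with their incoming flow."""
    inflow = {block: 0.0 for block in weights}
    for (src, dst), likelihood in edges.items():
        inflow[dst] += weights[src] * likelihood

    bad = []
    for block, weight in weights.items():
        if block == entry:
            continue  # entry weight comes from the call count, not from edges
        scale = max(weight, inflow[block])
        if scale > 0 and abs(weight - inflow[block]) > tol * scale:
            bad.append(block)
    return bad


# A diamond: entry branches to a (70%) and b (30%), both rejoin at join.
edges = {
    ("entry", "a"): 0.7,
    ("entry", "b"): 0.3,
    ("a", "join"): 1.0,
    ("b", "join"): 1.0,
}
consistent = {"entry": 100.0, "a": 70.0, "b": 30.0, "join": 100.0}
broken = dict(consistent, join=90.0)  # join lost 10% of its incoming flow
```

In this sketch a check run immediately after profile incorporation would only record the result; asserting on it would wait until the method has been imported successfully, for the reason given above.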

Diffs

AndyAyersMS · 2024-03-29T23:06:53Z

@amanasifkhalid FYI

This may be a little too drastic....

AndyAyersMS · 2024-03-30T02:03:19Z

'System.Buffers.AhoCorasick:IndexOfAnyCore[System.Buffers.StringSearchValuesHelper+CaseSensitive,System.Buffers.AhoCorasick+NoFastScan](System.ReadOnlySpan`1[ushort]):int:this' during 'Profile incorporation' (IL size 388; hash 0x646e0c1f; Tier1)

This is the same method that inspired #100449.

AndyAyersMS · 2024-03-30T17:08:09Z

This shows we can have initially consistent profiles (with one exception) but perhaps more TP or CQ/CS impact than is necessary.

Because this method perturbs the profile there are several aspects to the TP impact: the cost of the computation, the effects of that profile on the rest of the JIT, and the possibility of additional missed contexts.

I will work up a version that does all the computation but then doesn't actually modify the profile, so I can get a better read on the first part. If the computation is relatively cheap it may make sense to up the solver iteration limit and/or slow down the blend factor growth "schedule" so the repairs are less impactful overall (in other words: perhaps by doing more computation early we can end up with less TP/CS/CQ churn).

Note we only expect to have to iterate if there are one or more "rare" factors:

- improper (irreducible) loop headers
- capped cyclic probabilities
- infinite (or effectively infinite) loops (note these can never converge)
- extremely large counts

So for the vast majority of methods we should do one pass through the code and be done. That's something I also should measure....
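To illustrate why capped cyclic probabilities are one of the "rare" factors: under the standard model, a loop header's weight is the weight entering the loop divided by (1 - cp), where cp is the probability that flow leaving the header returns to it. The sketch below assumes a cap of 0.999 purely for illustration; the names and the cap value are hypothetical, not taken from the JIT sources.

```python
from typing import Tuple

ASSUMED_CAP = 0.999  # hypothetical cap on cyclic probability


def loop_scale(cyclic_probability: float, cap: float = ASSUMED_CAP) -> Tuple[float, bool]:
    """Return (weight multiplier for the loop header, whether the cap was applied)."""
    capped = cyclic_probability > cap
    cp = min(cyclic_probability, cap)
    return 1.0 / (1.0 - cp), capped


def header_weight(entry_weight: float, cyclic_probability: float) -> float:
    """Weight of a loop header given the weight flowing into the loop from outside."""
    scale, _ = loop_scale(cyclic_probability)
    return entry_weight * scale
```

Once the cap applies, the header weight no longer agrees with what the raw edge likelihoods imply, so a single pass cannot leave the loop consistent and the likelihoods have to be adjusted and re-solved. When cp is exactly 1.0 (an infinite loop) no finite weight satisfies the equations at all.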

amanasifkhalid · 2024-04-01T14:28:30Z

> This shows we can have initially consistent profiles (with one exception) but perhaps more TP or CQ/CS impact than is necessary.

Out of curiosity, are you planning on using the SPMI results to determine if the tradeoff between profile accuracy and profile consistency is worth it here? You mentioned you plan to try to reduce churn, so I might be premature in looking at the diffs, but I see some of the largest size changes are due to loop cloning, so those methods that regressed might be perf wins (and the size improvements might not be ideal?). As for the large number of small regressions that seem to be from block ordering, it doesn't seem easy to tell if splitting up some fall-through is an improvement because it allowed us to move a hot block up, or a regression because the newly-added jump isn't cold enough to pay for itself (though maybe spending a nontrivial amount of time on these diffs isn't worth it if we're going to change the block layout algorithm soon).

AndyAyersMS · 2024-04-02T03:01:10Z

> Out of curiosity, are you planning on using the SPMI results to determine if the tradeoff between profile accuracy and profile consistency is worth it here?

Yes, to some extent... I think it might make sense to insist on initial consistency for now and later when we see how quickly consistency degrades (or is maintained) we might redo things. Maybe.

Synthesis and reconstruction assume exceptions are rare, so if the actual profile shows significant flow into a throw that can be caught, we won't model this properly. The net result is that some profile weight will vanish, throwing off the entry/exit balance. Tolerate this for now by watching for catchable throws and disabling the entry/exit balance checks if any are seen. If it ever turns out that serious code has high exception frequency, we can reconsider and try to model flow through catches.
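A small sketch of the policy just described, under assumed names: the balance check compares entry weight against the weight leaving through returns and throws, and is skipped entirely once any catchable throw is seen. The `ExitBlock` type, the `catchable` flag, and the 0.01 tolerance are hypothetical stand-ins for whatever the JIT actually tracks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExitBlock:
    kind: str                # "return" or "throw"
    weight: float
    catchable: bool = False  # throw that a handler in this method may catch


def entry_exit_balanced(
    entry_weight: float, exits: List[ExitBlock], tol: float = 0.01
) -> Optional[bool]:
    """Return True/False for the balance check, or None when the check is disabled."""
    if any(e.kind == "throw" and e.catchable for e in exits):
        # Flow through the catch is not modeled, so weight may legitimately vanish.
        return None
    exit_weight = sum(e.weight for e in exits)
    return abs(entry_weight - exit_weight) <= tol * entry_weight
```

Returning `None` rather than `True` keeps "check skipped" distinguishable from "check passed", which is a choice made for this sketch only.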

AndyAyersMS · 2024-04-03T17:00:21Z

/azp run runtime-coreclr libraries-pgo

azure-pipelines · 2024-04-03T17:00:42Z

Azure Pipelines successfully started running 1 pipeline(s).

AndyAyersMS · 2024-04-03T20:23:14Z

TP diffs are still kind of high; I suspect this is from knock-on effects, where the new profile leads to inlining or cloning changes.

Number of missed contexts is also up considerably, so the "tp win" on asp.net is likely misleading too.

Going to do some local runs where I just run the solver but don't update the counts, to see what that costs.

AndyAyersMS · 2024-04-03T22:57:22Z

Can't repro the arm64 failure so far.

Getting an isolated TP solver measurement seems tricky... still working on it.

AndyAyersMS · 2024-04-04T01:13:34Z

Upping iteration limit, hoping things converge on one of the earlier passes, so we do less blending overall, and so have something closer to the initial profile, so less TP impact and less churn. Seems like the solver is (relatively) cheap, though not sure I trust my local TP data.

Warp heuristic return likelihoods upwards each retry.

AndyAyersMS · 2024-04-04T15:20:52Z

Hmm, looks like a step in the wrong direction, which is odd... need to dig in.

AndyAyersMS · 2024-04-04T21:42:07Z

> Hmm, looks like a step in the wrong direction, which is odd... need to dig in.

Looks like many of the new failures were entry/exit balance checks: that is, we try to ensure that the total BBJ_RETURN and select BBJ_THROW weight matches the entry weight. Even if each node locally converges to within a 0.01 tolerance, a chain of these small mismatches can add up to a greater mismatch.

So we now check for the entry/exit residual during solving and keep iterating if it is too high.
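A quick worked example of how per-node mismatches compound, assuming each block in a straight-line chain ends up 1% below its incoming flow (which would pass a 0.01 local tolerance). The helper names are made up for this illustration.

```python
from typing import Iterable


def chain_exit_weight(entry_weight: float, per_block_loss: float, length: int) -> float:
    """Weight reaching the end of a chain where every block loses a fixed fraction."""
    weight = entry_weight
    for _ in range(length):
        weight *= 1.0 - per_block_loss
    return weight


def entry_exit_residual(entry_weight: float, exit_weights: Iterable[float]) -> float:
    """Relative mismatch between entry weight and total exit weight."""
    return abs(entry_weight - sum(exit_weights)) / entry_weight


# Ten blocks, each locally "converged" but 1% short of its input.
exit_weight = chain_exit_weight(100.0, 0.01, 10)
residual = entry_exit_residual(100.0, [exit_weight])
```

Here the exit weight comes out roughly 9.6% below the entry weight even though no single block is off by more than 1%, which is why a solver that only tests local convergence can stop too early.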

AndyAyersMS · 2024-04-04T22:50:20Z

> Upping iteration limit, hoping things converge on one of the earlier passes, so we do less blending overall, and so have something closer to the initial profile, so less TP impact and less churn. Seems like the solver is (relatively) cheap, though not sure I trust my local TP data.
>
> Warp heuristic return likelihoods upwards each retry.

The idea that more initial iterations will improve TP and lead to less churn is not holding up so far...

AndyAyersMS · 2024-04-05T02:45:13Z

/azp run runtime-coreclr libraries-pgo

azure-pipelines · 2024-04-05T02:45:25Z

Azure Pipelines successfully started running 1 pipeline(s).

AndyAyersMS · 2024-04-05T15:21:53Z

Looks better, correctness-wise... just one issue left, perhaps, for cases where the initial profile is synthetic. TP/CQ still not where I'd like them to be, but I'm not sure they can be improved much.

AndyAyersMS · 2024-04-05T15:22:55Z

/azp run runtime-coreclr jitstress, runtime-coreclr pgostress, runtime-coreclr pgo

azure-pipelines · 2024-04-05T15:23:21Z

Azure Pipelines successfully started running 3 pipeline(s).

AndyAyersMS · 2024-04-05T22:52:58Z

/azp run runtime-coreclr libraries-pgo, runtime-coreclr jitstress, runtime-coreclr pgostress, runtime-coreclr pgo

azure-pipelines · 2024-04-05T22:53:16Z

Azure Pipelines successfully started running 4 pipeline(s).

AndyAyersMS · 2024-04-06T19:45:56Z

/azp run runtime-coreclr libraries-pgo, runtime-coreclr jitstress, runtime-coreclr pgostress, runtime-coreclr pgo

azure-pipelines · 2024-04-06T19:46:19Z

Azure Pipelines successfully started running 4 pipeline(s).

AndyAyersMS · 2024-04-08T15:24:23Z

@amanasifkhalid PTAL
cc @dotnet/jit-contrib

Will have some TP impact and cause some CQ churn. Hoping to get most of the TP back if we can rein in cloning a bit.

amanasifkhalid

LGTM, assuming you get the results you want in CI.

Thanks for all the iterative work on this!

src/coreclr/jit/fgprofilesynthesis.cpp

AndyAyersMS · 2024-04-08T19:46:10Z

/azp run runtime-coreclr libraries-pgo, runtime-coreclr jitstress, runtime-coreclr pgostress, runtime-coreclr pgo

azure-pipelines · 2024-04-08T19:46:32Z

Azure Pipelines successfully started running 4 pipeline(s).

AndyAyersMS · 2024-04-08T19:58:54Z

The extra legs may not be 100% clean, though there was just one (related) failure in my last go round, so it's possible.

AndyAyersMS · 2024-04-09T00:31:01Z

libraries-pgo has a few crashes. I don't think they are directly related to this change, though perhaps it is making some existing issue more prominent. I will investigate them separately. There has been a low background rate of these things for a while now.

DrewScoggins · 2024-04-11T16:37:38Z

Regressions

Linux Arm64 Ampere: [Perf] Linux/arm64: 3 Regressions on 4/9/2024 5:02:41 AM #100923
Windows x64: [Perf] Windows/x64: 15 Regressions on 4/9/2024 12:49:16 AM perf-autofiling-issues#32734
Linux x64: [Perf] Linux/x64: 4 Regressions on 4/9/2024 12:49:16 AM perf-autofiling-issues#32722

Improvements

Linux Arm64 Ampere: [Perf] Linux/arm64: 12 Improvements on 4/9/2024 5:02:41 AM perf-autofiling-issues#32588
[Perf] Windows/arm64: 9 Improvements on 4/9/2024 5:02:41 AM perf-autofiling-issues#32578
[Perf] Windows/arm64: 8 Improvements on 4/9/2024 5:02:41 AM perf-autofiling-issues#32612
[Perf] Windows/x64: 36 Improvements on 4/9/2024 12:49:16 AM perf-autofiling-issues#32747
[Perf] Linux/x64: 2 Improvements on 4/9/2024 12:49:16 AM perf-autofiling-issues#32724

@AndyAyersMS

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Mar 29, 2024

dotnet-policy-service bot assigned AndyAyersMS Mar 29, 2024

AndyAyersMS changed the title ~~JIT: intial profild repair~~ JIT: intial profile repair Mar 30, 2024

build-analysis bot mentioned this pull request Mar 30, 2024

Failing test - System.Threading.ThreadPools.Tests.ThreadPoolTests.ThreadPoolMinMaxThreadsEventTest on Linux/arm64 #95873

Closed

AndyAyersMS changed the title ~~JIT: intial profile repair~~ JIT: initial profile repair Mar 30, 2024

allow infinite rel residual at any stage; better logging of loops con…

154c090

…taining improper loops

AndyAyersMS added 3 commits April 2, 2024 08:27

fix gcc build issue; elaborate on the iterative blend strategy

e7e702a

Merge branch 'main' into InitialProfileRepair

be1f21a

AndyAyersMS added 3 commits April 3, 2024 18:04

blend returns too; increase iteration limit

91a9397

Merge remote-tracking branch 'upstream/main' into InitialProfileRepair

3bb3af9

format

7bc8a6d

Solver should check entry/exit balance too.

42f6850

Unless there is a catchable throw.::

AndyAyersMS added 2 commits April 4, 2024 16:01

handle slight overflow in entry/exit residual

74160f0

fix build

229f7a2

back off on entry/exit checking a bit more; make sure to reset m_over…

d7a53af

…flow

build-analysis bot mentioned this pull request Apr 6, 2024

System.Text.RegularExpressions.Tests on Mono_Minijit_Debug-Ubuntu #100715

Closed

AndyAyersMS added 2 commits April 6, 2024 09:14

allow some inconsistency

04eb73f

tone down volume of stuff in jit dump a bit

5844821

fix deferred checking mechanism

3d0c9bf

AndyAyersMS marked this pull request as ready for review April 8, 2024 15:23

AndyAyersMS requested a review from amanasifkhalid April 8, 2024 15:24

fix build and a few comment typos

46b6a06

amanasifkhalid approved these changes Apr 8, 2024

review feedback

a89ba6f

AndyAyersMS merged commit f190dd2 into dotnet:main Apr 9, 2024
165 of 167 checks passed

AndyAyersMS mentioned this pull request Apr 9, 2024

JIT: Flow Graph Modernization and Improved Block Layout #93020

Closed

51 tasks

github-actions bot locked and limited conversation to collaborators May 17, 2024
