Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: initial profile repair #100456

Merged
merged 17 commits into from
Apr 9, 2024
Merged

Conversation

AndyAyersMS
Copy link
Member

@AndyAyersMS AndyAyersMS commented Mar 29, 2024

Always run a repair pass after incorporating profile data. If that fails to converge, blend in synthetic likelihoods in increasing strengths to enable convergence (giving up some accuracy).

Enable block weight consistency checks to run right after profile incorporation. Disable just after that.

We can generally make profile data self-consistent, save for two exceptions:

  1. if the flow graph contains an infinite (or effectively infinite) loop
  2. if the IL is invalid

Because of (2) we can't immediately assert if the profile is not consistent; we must defer until after we've successfully imported the method (since the importer contains many of the validity checks).

Diffs

Always run a repair pass after incorporating profile data. If that fails
to converge, blend in synthetic likelihoods in increasing strengths to
enable convergence (giving up some accuracy).

Enable block weight consistency checks to run right after profile incorporation.
Disable just after that.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 29, 2024
@AndyAyersMS
Copy link
Member Author

@amanasifkhalid FYI

This may be a little too drastic....

@AndyAyersMS
Copy link
Member Author

'System.Buffers.AhoCorasick:IndexOfAnyCore[System.Buffers.StringSearchValuesHelper+CaseSensitive,System.Buffers.AhoCorasick+NoFastScan](System.ReadOnlySpan`1[ushort]):int:this' during 'Profile incorporation' (IL size 388; hash 0x646e0c1f; Tier1)

This is the same method that inspired #100449.

@AndyAyersMS AndyAyersMS changed the title JIT: intial profile repair JIT: initial profile repair Mar 30, 2024
@AndyAyersMS
Copy link
Member Author

This shows we can have initially consistent profiles (with one exception) but perhaps more TP or CQ/CS impact than is necessary.

Because this method perturbs the profile there are several aspects to the TP impact: the cost of the computation, the effects of that profile on the rest of the JIT, and the possibility of additional missed contexts.

I will work up a version that does all the computation but then doesn't actually modify the profile, so I can get a better read on the first part. If the computation is relatively cheap it may make sense to up the solver iteration limit and/or slow down the blend factor growth "schedule" so the repairs are less impactful overall (in other words: perhaps by doing more computation early we can end up with less TP/CS/CQ churn).

Note we only expect to have to iterate if there are one or more "rare" factors:

  • improper (irreducible) loop headers
  • capped cyclic probabilities
  • infinite (or effectively infinite) loops (note these can never converge)
  • extremely large counts

So for the vast majority of methods we should do one pass through the code and be done. That's something I also should measure....

@amanasifkhalid
Copy link
Member

This shows we can have initially consistent profiles (with one exception) but perhaps more TP or CQ/CS impact than is necessary.

Out of curiosity, are you planning on using the SPMI results to determine if the tradeoff between profile accuracy and profile consistency is worth it here? You mentioned you plan to try to reduce churn, so I might be premature in looking at the diffs, but I see some of the largest size changes are due to loop cloning, so those methods that regressed might be perf wins (and the size improvements might not be ideal?). As for the large number of small regressions that seem to be from block ordering, it doesn't seem easy to tell if splitting up some fall-through is an improvement because it allowed us to move a hot block up, or a regression because the newly-added jump isn't cold enough to pay for itself (though maybe spending a nontrivial amount of time on these diffs isn't worth it if we're going to change the block layout algorithm soon).

@AndyAyersMS
Copy link
Member Author

Out of curiosity, are you planning on using the SPMI results to determine if the tradeoff between profile accuracy and profile consistency is worth it here?

Yes, to some extent... I think it might make sense to insist on initial consistency for now and later when we see how quickly consistency degrades (or is maintained) we might redo things. Maybe.

Synthesis and reconstruction assume exceptions are rare, so if the actual
profile shows significant flow into a throw that can be caught, we won't
model this properly. The net result is that some profile weight will
vanish, throwing off the entry/exit balance.

Tolerate this for now, by watching for cachable throws, and disabling the
entry/exit balance checks if any are seen. If it ever turns out that serious
code has high exception frequency we can reconsider and try and model flow
through catches.
@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr libraries-pgo

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Copy link
Member Author

TP diffs are still kind of high, suspect this is from knock-on effects where the new profile leads to inlining or cloning changes.

image

Number of missed contexts is also up considerably, so the "tp win" on asp.net is likely misleading too.

Going to do some local runs where I just run the solver but don't update the counts, to see what that costs.

@AndyAyersMS
Copy link
Member Author

Can't repro the arm64 failure so far.

Getting an isolated TP solver measurement seems tricky... still working on it.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Apr 4, 2024

Upping iteration limit, hoping things converge on one of the earlier passes, so we do less blending overall, and so have something closer to the initial profile, so less TP impact and less churn. Seems like the solver is (relatively) cheap, though not sure I trust my local TP data.

Warp heuristic return likelihoods upwards each retry.

@AndyAyersMS
Copy link
Member Author

Hmm, looks like a step in the wrong direction, which is odd... need to dig in.

Unless there is a catchable throw.::
@AndyAyersMS
Copy link
Member Author

Hmm, looks like a step in the wrong direction, which is odd... need to dig in.

Looks like many of the new failures were entry/exit balance checks: that is we try and ensure that the total BBJ_RETURN and select BBJ_THROW weight matches the entry weight. Even if each node locally converges to 0.01 tolerance, a chain of these small mismatches can add up to a greater mismatch.

So we now check for the entry/exit residual during solving and keep iterating if it is too high.

@AndyAyersMS
Copy link
Member Author

Upping iteration limit, hoping things converge on one of the earlier passes, so we do less blending overall, and so have something closer to the initial profile, so less TP impact and less churn. Seems like the solver is (relatively) cheap, though not sure I trust my local TP data.

Warp heuristic return likelihoods upwards each retry.

Idea that more initial iterations will improve TP and lead to less churn is not holding up so far...

image

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr libraries-pgo

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Copy link
Member Author

Looks better, correctness wise... just one issue left perhaps for cases where the intitial profile is synthetic. TP/CQ still not where I'd like them to be, but I'm not sure they can be improved much.

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr pgostress, runtime-coreclr pgo

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr libraries-pgo, runtime-coreclr jitstress, runtime-coreclr pgostress, runtime-coreclr pgo

Copy link

Azure Pipelines successfully started running 4 pipeline(s).

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr libraries-pgo, runtime-coreclr jitstress, runtime-coreclr pgostress, runtime-coreclr pgo

Copy link

Azure Pipelines successfully started running 4 pipeline(s).

@AndyAyersMS AndyAyersMS marked this pull request as ready for review April 8, 2024 15:23
@AndyAyersMS
Copy link
Member Author

@amanasifkhalid PTAL
cc @dotnet/jit-contrib

Will have some TP impact and cause some CQ churn. Hoping to get most of the TP back if we can reign in cloning a bit.

Copy link
Member

@amanasifkhalid amanasifkhalid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, assuming you get the results you want in CI.

Thanks for all the iterative work on this!

src/coreclr/jit/fgprofilesynthesis.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/fgprofilesynthesis.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/fgprofilesynthesis.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/fgprofilesynthesis.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/fgprofilesynthesis.cpp Outdated Show resolved Hide resolved
@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr libraries-pgo, runtime-coreclr jitstress, runtime-coreclr pgostress, runtime-coreclr pgo

Copy link

Azure Pipelines successfully started running 4 pipeline(s).

@AndyAyersMS
Copy link
Member Author

The extra legs may not be 100% clean, though there was just one (related) failure in my last go round, so it's possible.

@AndyAyersMS
Copy link
Member Author

libraries-pgo has a few crashes. I don't think they are directly related to this change, though perhaps it is making some existing issue more prominent. I will investigate them separately. There has been a low background rate of these things for a while now.

@AndyAyersMS AndyAyersMS merged commit f190dd2 into dotnet:main Apr 9, 2024
165 of 167 checks passed
matouskozak pushed a commit to matouskozak/runtime that referenced this pull request Apr 30, 2024
Always run a repair pass after incorporating profile data. If that fails to converge, blend in synthetic likelihoods in increasing strengths to enable convergence (giving up some accuracy).

Enable block weight consistency checks to run right after profile incorporation. Disable just after that.

We can generally make profile data self-consistent, save for two exceptions:
1. if the flow graph contains an infinite (or effectively infinite) loop
2. if the IL is invalid

Because of (2) we can't immediately assert if the profile is not consistent; we must defer until after we've successfully imported the method (since the importer contains many of the validity checks).
@github-actions github-actions bot locked and limited conversation to collaborators May 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants