-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Run profile repair after frontend phases #111915
base: main
Are you sure you want to change the base?
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
With this change, I'm seeing LSRA introduce more branches from critical edge splitting, which I wasn't expecting. I'll dig into this |
Profile synthesis currently isn't all that diligent in retaining the method's entry weight; if we are retaining likelihoods, it will use |
Part of #107749. Prerequisite to #111915. Regardless of the profile synthesis option used, we ought to maintain the method's entry weight, which is computed by summing all non-flow weight into the entry block. Ideally, we'd use fgCalledCount here, but this isn't computed until after morph, and we need to tolerate the existence of multiple entry blocks for methods with OSR pre-morph.
With #111971 merged in, I'm not seeing profile repair overwrite cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs and TP impact show that profile repair needs to kick in pretty often in our PGO collections by the time we get to the backend. I'd expect that having profile repair use a profile-aware DFS tree wouldn't change behavior, and then we could reuse this throughout LSRA (and layout, if LSRA doesn't introduce new blocks); this might bring down the TP cost a little. Thanks! |
@AndyAyersMS no rush on this, but since I have a few high-churn layout PRs coming up, would you prefer to see the profile changes go in first, or should we get the 3-opt consolidation work in first? I have no preference. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's plan on taking this one first.
There seems to be a big size improvement/regression imbalance in some (but not all) collections for benchmarks.run_pgo -- any idea why this is so prominent in linux x64 or windows x86 but not in windows x64?
Part of #107749. Introduce a repair phase that runs profile synthesis if the profile was marked inconsistent at some point, and call it at the end of the JIT frontend. LSRA and block layout are likely to benefit from profile consistency, and this is late enough in the JIT that the sources of diffs should be relatively obvious. We can save a bit of TP by precomputing a loop-aware DFS for profile repair to use, and then letting LSRA reuse this traversal to avoid redundant computations -- I'll save this for a follow-up PR.