JIT: Replace fgMoveHotJumps with 3-opt utility #112016

Open
wants to merge 7 commits into
base: main

amanasifkhalid
Member

@amanasifkhalid amanasifkhalid commented Jan 30, 2025

Part of #107749. Based on the plans outlined in #111989 (comment), we want to remove phases that prematurely tweak the initial layout fed into 3-opt; fgMoveHotJumps is one such phase. However, initial attempts to remove it incurred large size increases on x86/x64, suggesting there was some utility in moving blocks closer to their hottest successors to keep the layout compact. To avoid derailing my consolidation plan, I've decided to refactor fgMoveHotJumps into a utility for 3-opt to use. For now, we will continue to use this pass to try to keep the layout compact. In the future, this functionality may be useful for churning the initial layout into 3-opt to discover new local-optimal layouts.
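To illustrate why moving a block next to its hottest successor can help, here is a minimal sketch of a layout cost that charges every edge that is not a fallthrough. It is not the JIT's implementation: `SketchEdge`, `LayoutCost`, and the dense block numbering (`0..n-1`) are assumptions made for this example only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a flow edge; not the JIT's own edge type.
struct SketchEdge
{
    int    src;    // source block number (assumed to be in 0..n-1)
    int    dst;    // target block number (assumed to be in 0..n-1)
    double weight; // profile-derived edge weight
};

// Total weight of the edges that need a jump under the given block order.
// An edge is free only when its target is laid out right after its source.
double LayoutCost(const std::vector<int>& order, const std::vector<SketchEdge>& edges)
{
    // Map each block number to its position in the layout.
    std::vector<std::size_t> pos(order.size());
    for (std::size_t i = 0; i < order.size(); i++)
    {
        pos[order[i]] = i;
    }

    double cost = 0.0;
    for (const SketchEdge& e : edges)
    {
        const bool isFallthrough = (pos[e.dst] == pos[e.src] + 1);
        if (!isFallthrough)
        {
            cost += e.weight;
        }
    }
    return cost;
}
```

With edges `0->2` (weight 90), `0->1` (10), `1->3` (10), and `2->3` (90), the order `{0, 1, 2, 3}` costs 100, while `{0, 2, 3, 1}` places block 0 directly before its hottest successor and costs 20.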

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 30, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid
Member Author

amanasifkhalid commented Jan 31, 2025

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs are particularly large on x86/x64, where we have variable-length jumps. A common theme I'm seeing in the diffs is where we have some straight-line code that can be entered via more than one predecessor, but the layout improvement for preferring one predecessor over the other is insignificant, so 3-opt now moves a large chunk of code up/down the method, potentially changing the sizes of the jumps inside the range. Such examples usually have large size diffs, but little/no PerfScore diff.
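As a rough illustration of the variable-length jump effect on x86/x64, the sketch below computes the encoded size of a relative jump from its displacement. The function name `JumpEncodingSize` is hypothetical and is not taken from the JIT; the byte counts are the standard short (rel8) and near (rel32) encodings.

```cpp
#include <cstdint>

// Encoded size, in bytes, of an x86/x64 relative jump. The displacement is
// measured from the end of the jump instruction to the target.
// Hypothetical helper for illustration only.
inline int JumpEncodingSize(int64_t displacement, bool isConditional)
{
    const bool fitsInRel8 = (displacement >= INT8_MIN) && (displacement <= INT8_MAX);
    if (fitsInRel8)
    {
        return 2; // one opcode byte + rel8, for both jmp and jcc
    }

    // Near forms: jcc is 0F 8x + rel32 (6 bytes), jmp is E9 + rel32 (5 bytes).
    return isConditional ? 6 : 5;
}
```

A conditional jump whose displacement grows from 100 to 200 bytes goes from 2 bytes to 6, so relocating a large range of blocks can change method size even when no jump is added or removed.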

TP diffs are a wash; in spending less time on the initial layout, 3-opt has to do more work in some cases. I think I can get the cost of a 3-opt run down a bit in the next few PRs, so hopefully I'll make up for this... Thanks!

@AndyAyersMS
Member

The diffs are indeed pretty large on xarch. It is hard to know if this is an improvement.

I wonder if it's worth trying to factor in jump distance as an extra cost factor for xarch (or maybe in general), to encourage compactness. For instance we could increase the cost of a jump by N% if there are N blocks between the source and target, or try and estimate each block's size and sum that up and add a penalty factor when the jump span is over a threshold (like 128 bytes).

All that might be costly, since the cost delta of a swap now requires examining all blocks that have edges into or out of the swapped portions, and not just the blocks at the boundaries.
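The two penalty ideas above could be sketched roughly as follows. Everything here is an assumption for illustration: `PenaltyEdge`, the helper names, the per-block size estimates, and the penalty factor are hypothetical, and nothing in this thread says the JIT implements either variant.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical edge between two layout positions; not a JIT type.
struct PenaltyEdge
{
    std::size_t srcPos; // layout position of the jumping block
    std::size_t dstPos; // layout position of the target block
    double      weight; // profile-derived edge weight
};

// Half-open range [first, last) of layout positions the jump passes over.
// The jump sits at the end of the source block and lands at the start of the
// target, so a backward jump spans both the target and the source block.
static void SpannedRange(const PenaltyEdge& e, std::size_t& first, std::size_t& last)
{
    if (e.dstPos > e.srcPos)
    {
        first = e.srcPos + 1; // forward jump
        last  = e.dstPos;
    }
    else
    {
        first = e.dstPos; // backward jump or self-loop
        last  = e.srcPos + 1;
    }
}

// Variant 1: raise the edge cost by N% when the jump passes over N blocks.
double BlockCountPenalizedCost(const PenaltyEdge& e)
{
    if (e.dstPos == e.srcPos + 1)
    {
        return 0.0; // fallthrough needs no jump
    }
    std::size_t first, last;
    SpannedRange(e, first, last);
    const double blocksSpanned = static_cast<double>(last - first);
    return e.weight * (1.0 + blocksSpanned / 100.0);
}

// Variant 2: sum estimated block sizes over the span and apply a fixed
// penalty factor once the span exceeds a byte threshold (for example, 128).
double ByteSpanPenalizedCost(const PenaltyEdge&           e,
                             const std::vector<unsigned>& estimatedSizes,
                             unsigned                     thresholdBytes,
                             double                       penaltyFactor)
{
    if (e.dstPos == e.srcPos + 1)
    {
        return 0.0; // fallthrough needs no jump
    }
    std::size_t first, last;
    SpannedRange(e, first, last);
    unsigned spanBytes = 0;
    for (std::size_t i = first; i < last; i++)
    {
        spanBytes += estimatedSizes[i];
    }
    return (spanBytes > thresholdBytes) ? e.weight * (1.0 + penaltyFactor) : e.weight;
}
```

Both variants make an edge's cost depend on everything laid out between its endpoints, which is why a swap would have to re-evaluate every edge crossing the moved range rather than only the edges at the swap boundaries.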

@amanasifkhalid
Member Author

> For instance we could increase the cost of a jump by N% if there are N blocks between the source and target, or try and estimate each block's size and sum that up and add a penalty factor when the jump span is over a threshold (like 128 bytes).

I've found even nominal tweaks to our cost model to be quite expensive, so this kind of bookkeeping might be too costly. Since we don't seem to be gaining anything by removing fgMoveHotJumps at the moment, I'm ok with leaving it in for now (sorry for the premature review). I'll incorporate it into my refactor instead; if we decide to expand 3-opt to run on different initial layouts to find better local optima, then I think running 3-opt with and without fgMoveHotJumps run beforehand should be a reasonable starting point, since the diffs on this PR show 3-opt is quite sensitive to its effects on the initial layout.

@amanasifkhalid amanasifkhalid reopened this Feb 6, 2025
@amanasifkhalid amanasifkhalid changed the title JIT: Remove fgMoveHotJumps JIT: Replace fgMoveHotJumps with 3-opt utility Feb 6, 2025
@amanasifkhalid
Member Author

@AndyAyersMS I rewrote fgMoveHotJumps to run during 3-opt. My goal wasn't to make the rewrite 1:1 in behavior, and because we're leveraging 3-opt's cost model to decide whether to shorten a jump, there are plenty of diffs. Across most collections, we seem to be doing a decent job of keeping code compact. benchmarks.run_pgo on linux-x64 looks like an outlier; from its jit-analyze summary:

Top method regressions (percentages):
          33 (11.34 % of base) : 168132.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 168300.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 168368.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 177064.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 177184.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 168440.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 168564.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 176800.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 176896.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 177620.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 177728.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 168184.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 168208.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 168452.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 168484.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 176788.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 177216.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 176952.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 177052.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)
          33 (11.34 % of base) : 177740.dasm - System.Runtime.InteropServices.SafeHandle:InternalRelease(ubyte):this (Tier1)

11% size increase sounds plausible from longer branches alone, and I suspect the number of duplicate methods in this collection is partly to blame. TP impact is <0.1% across most collections.
