JIT: Make isRemovableJmpCandidate less restrictive on AMD64 #95493

amanasifkhalid · 2023-12-01T02:40:00Z

On AMD64, the emitter-level jump-to-next removal optimization is skipped if succeeding a call instruction, as we need an instruction after a call to support exception handling semantics. We already compute a more specific condition for this in the block-level optimization, and reusing that condition should allow us to do the emitter-level optimization in a few more places. The size improvements were small locally, but I think this should be done if the conditions are identical between the two optimizations.

ghost · 2023-12-01T02:40:16Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

On AMD64, the emitter-level jump-to-next removal optimization is skipped if succeeding a call instruction, as we need an instruction after a call to support exception handling semantics. We already compute a more specific condition for this in the block-level optimization, and reusing that condition should allow us to do the emitter-level optimization in a few more places. The size improvements were small locally, but I think this should be done if the conditions are identical between the two optimizations.

Author:	amanasifkhalid
Assignees:	amanasifkhalid
Labels:	`area-CodeGen-coreclr`
Milestone:	-

amanasifkhalid · 2023-12-01T16:14:58Z

@dotnet/jit-contrib PTAL. Diffs are small as expected.

BruceForstall

This change leads to removing a jump-to-next after a call that immediately precedes an epilog. This is not allowed, by the comments in this function (line 608) and the documentation in clr-abi.md: https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/clr-abi.md#amd64-padding-info

Note that it would be ok to convert these JMP to NOP.

(btw, if you're changing this file, can you change the reference to "X64 and ARM ABIs.docx" to be "clr-abi.md"?)

amanasifkhalid · 2023-12-04T19:02:21Z

@BruceForstall thank you for clarifying that; I'll update this change to insert a nop when needed.

(btw, if you're changing this file, can you change the reference to "X64 and ARM ABIs.docx" to be "clr-abi.md"?)

Sure thing.

amanasifkhalid · 2023-12-04T22:43:12Z

I've adjusted my implementation to always remove a useless jump succeeding a call instruction on AMD64 (assuming all the other optimization checks are met) and emit a nop instead. I added a new member, idjIsAfterCall, to instrDescJmp to indicate the jump will need to be replaced with a nop if removed. Note that this reduced the size of idjOffs on AMD64 to 28 bits instead of 29 -- I don't think this should be a problem?

Size improvements were bigger locally. I noticed the newly-introduced nop isn't showing up in JIT disasms, though the size decreases for the affected instruction groups suggest the nop was emitted. I'm guessing this is because my implementation does not create a new instrDesc for the nop.

amanasifkhalid · 2023-12-05T18:57:38Z

@BruceForstall PTAL. SuperPMI replay failure is the "Method is of low integrity" issue. Thank you!

Edit: Diffs.

BruceForstall

I see a few problems:

The NOP is not showing up in disassembly. This is a problem, because it is very confusing when looking at asm diffs or debugging code and comparing against JIT dump/disasm. We accept it for INS_align, but at least that displays something.
It appears that you are replacing all CALL + JMP-to-next with CALL+NOP? That's not required -- only when a CALL precedes an OS epilog.
Several places you use the constant "1" as a code size, because that is the size of an x64 NOP instruction. That needs to be well commented (if that code remains).
We already have emitOutputPreEpilogNOP(). I guess this doesn't work because it happens when we are generating code and go to generate an epilog. In the problematic case, the last instruction is a JMP, we call emitOutputPreEpilogNOP(), it doesn't see the last instruction as a CALL, and so it doesn't insert a NOP. Later on, the emitter::emitRemoveJumpToNextInst() phase comes along and removes the JMP. At that point, it's "too late" to insert the necessary NOP.

We might want to insert a provisional NOP after a CALL before a removable JMP. If a JMP is removed, and the next IG is an epilog, the NOP becomes non-provisional and should have actual size. Unfortunately, that's not great, because we'd also need to "zero out" these NOPs (set code size to zero), and there is no obvious place to do that.

Another (ugly?) alternative is to "bash" the JMP instrDesc with a NOP instrDesc when necessary. Then the NOP would be properly output. This might work if it's the very last instruction in an IG but is pretty gross. This is another implication of the fact we can't add/remove instrDescs once generated.

amanasifkhalid · 2023-12-06T03:35:03Z

@BruceForstall thank you for the review! I adjusted my implementation to only emit the nop in place of the removed jump if the jump succeeded a call and preceded an OS epilog; this yielded some nice size improvements locally. Also, for printing the newly-added nops in disasms, I've modified emitter::emitOutputInstr to temporarily change the removed jump's instruction description to that of a nop right before printing the disasm, and then change it back after to avoid triggering asserts related to the instrDesc size mismatch -- I'm guessing this is what you meant by "bashing" the jump instrDesc? I've added in comments to explain this process.

I think this implementation should cover the problematic case you mentioned with emitOutputPreEpilogNOP, right? Suppose there is a jmp between a call and an OS epilog. emitOutputPreEpilogNOP should see the jmp and emit nothing. Then in emitRemoveJumpToNextInst, should we decide to remove the jmp, we'll see it is after a call (because idjIsAfterCallBeforeEpilog was set to true in emitIns_J) and before an OS epilog (because we can check for the IGF_EPILOG flag in the jump's target IG), and thus mark it as needing a nop. Then in emitOutputInstr, we emit a nop in lieu of the jmp.

Edit: TP diffs are surprisingly big. This might have to do with the optimization being able to consider more jumps, but the previous run of runtime-coreclr superpmi-diffs didn't have TP regressions this big. I'll investigate tomorrow.

amanasifkhalid · 2023-12-06T15:44:16Z

I've determined the TP diffs are from the new logic for printing removed jumps as nops in disasms. I've tweaked this logic so that we only incur the TP hit if we're printing disasms, which doesn't seem to be a performance-critical path, anyway.

amanasifkhalid · 2023-12-06T18:52:55Z

TP diffs are better, but still a bit high in MinOpts. @BruceForstall do we have a threshold for how much TP this change is worth? I have a few ideas for improving TP a bit more, but the code is slightly less intuitive; I'll try it locally and see if it's worth a revision.

Edit: Scratch that, I found reverting my changes to emitJmpInstHasNoCode simplifies it and makes it marginally faster; I just had to move a few asserts around in emitOutputInstr. I don't expect a significant reduction in the TP regressions from this, though.

amanasifkhalid · 2023-12-06T21:50:19Z

I'm surprised TP worsened on Linux x64. I'll investigate in WSL.

amanasifkhalid · 2023-12-07T01:43:35Z

@BruceForstall sorry for all the pings -- TP diffs now peak at +0.1% on Linux x64. I found casting an instrDesc* to an instrDescJmp* and checking idjIsRemovableJmpCandidate is cheaper than checking if the instrDesc's code size is 0 on Linux, but indistinguishable on Windows. I think the rest of the TP impact comes mainly from the additional branches introduced, as well as the optimization now being able to consider a few more jumps. Are you ok with this TP hit? This seems to be a recurring theme in my recent PRs...

BruceForstall

LGTM. Just a few minor suggestions. The TP change is ok with me.

BruceForstall · 2023-12-07T04:45:36Z

src/coreclr/jit/emit.h

        // a candidate for being removed if it jumps to the next instruction
        unsigned idjIsRemovableJmpCandidate : 1;
+        // Indicates the jump succeeds a call instruction and precedes an OS epilog.


Can you use "follows" instead of "succeeds"?

BruceForstall · 2023-12-07T04:45:56Z

src/coreclr/jit/emit.h

 #else
            30;
-#endif
+#endif                            // !defined(TARGET_XARCH)


IMO, this comment is not actually helpful.

BruceForstall · 2023-12-07T04:46:24Z

src/coreclr/jit/emit.h

+        // Indicates the jump succeeds a call instruction and precedes an OS epilog.
+        // If this jump is removed, a nop will need to be emitted instead (see clr-abi.md for details).
+        unsigned idjIsAfterCallBeforeEpilog : 1;
+#elif defined(TARGET_XARCH)


You split off amd64, so the only "xarch" left is x86

Suggested change

#elif defined(TARGET_XARCH)

#elif defined(TARGET_X86)

BruceForstall · 2023-12-07T04:48:28Z

src/coreclr/jit/emit.h

+#if defined(TARGET_AMD64)
+            28;
+        // Indicates the jump was added at the end of a BBJ_ALWAYS basic block and is


fwiw, all this bit fiddling is silly on 64-bit, where we're going to have 32 bits of padding after the unsigned fields, in which case we should just move all the bits to the end and make them bool :1 fields. Maybe not for 32-bit, though.

But that shouldn't be done in this PR :)

Good point, I'll come back to this in a follow-up PR.

amanasifkhalid · 2023-12-07T15:36:44Z

Feedback changes are just comments and an ifdef. Failures from previous CI run were known NativeAOT failures.

ghost assigned amanasifkhalid Dec 1, 2023

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 1, 2023

Make isRemovableJmpCandidate less restrictive

1b45b83

amanasifkhalid force-pushed the remove-jmp branch from bcf5ed1 to 1b45b83 Compare December 1, 2023 02:41

amanasifkhalid marked this pull request as ready for review December 1, 2023 16:14

BruceForstall requested changes Dec 4, 2023

View reviewed changes

ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Dec 4, 2023

ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Dec 4, 2023

amanasifkhalid added 3 commits December 4, 2023 14:13

Update comment

08b69b0

Rename emitNop variable

d7e2ea3

Replace jump with nop after call on AMD64

8cd2847

BruceForstall requested changes Dec 6, 2023

View reviewed changes

ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Dec 6, 2023

amanasifkhalid added 3 commits December 5, 2023 21:46

Print nop in disasms

64c68b9

Emit nop only if before OS epilog

7e84c50

Style

354839f

ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Dec 6, 2023

Fix TP diff from nop print logic

f15b182

Fix condition around printing nops

e949e2c

Simplify emitJmpInstHasNoCode

26ca790

Improve Linux TP

964d728

build-analysis bot mentioned this pull request Dec 7, 2023

Widespread "Optimizing assemblies for size is not supported for the selected publish configuration" failures #95712

Closed

BruceForstall approved these changes Dec 7, 2023

View reviewed changes

Feedback

056ae1a

amanasifkhalid merged commit 303571a into dotnet:main Dec 7, 2023
119 of 139 checks passed

build-analysis bot mentioned this pull request Dec 7, 2023

Test infra failure on iOS.Device.Aot.Test.WorkItemExecution CI leg #92792

Closed

3 tasks

github-actions bot locked and limited conversation to collaborators Jan 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: Make isRemovableJmpCandidate less restrictive on AMD64 #95493

JIT: Make isRemovableJmpCandidate less restrictive on AMD64 #95493

amanasifkhalid commented Dec 1, 2023

ghost commented Dec 1, 2023

amanasifkhalid commented Dec 1, 2023

BruceForstall left a comment •

edited

Loading

amanasifkhalid commented Dec 4, 2023

amanasifkhalid commented Dec 4, 2023 •

edited

Loading

amanasifkhalid commented Dec 5, 2023 •

edited

Loading

BruceForstall left a comment

amanasifkhalid commented Dec 6, 2023 •

edited

Loading

amanasifkhalid commented Dec 6, 2023

amanasifkhalid commented Dec 6, 2023 •

edited

Loading

amanasifkhalid commented Dec 6, 2023

amanasifkhalid commented Dec 7, 2023 •

edited

Loading

BruceForstall left a comment

BruceForstall Dec 7, 2023

BruceForstall Dec 7, 2023

BruceForstall Dec 7, 2023

BruceForstall Dec 7, 2023

amanasifkhalid Dec 7, 2023

amanasifkhalid commented Dec 7, 2023

JIT: Make isRemovableJmpCandidate less restrictive on AMD64 #95493

JIT: Make isRemovableJmpCandidate less restrictive on AMD64 #95493

Conversation

amanasifkhalid commented Dec 1, 2023

ghost commented Dec 1, 2023

amanasifkhalid commented Dec 1, 2023

BruceForstall left a comment • edited Loading

Choose a reason for hiding this comment

amanasifkhalid commented Dec 4, 2023

amanasifkhalid commented Dec 4, 2023 • edited Loading

amanasifkhalid commented Dec 5, 2023 • edited Loading

BruceForstall left a comment

Choose a reason for hiding this comment

amanasifkhalid commented Dec 6, 2023 • edited Loading

amanasifkhalid commented Dec 6, 2023

amanasifkhalid commented Dec 6, 2023 • edited Loading

amanasifkhalid commented Dec 6, 2023

amanasifkhalid commented Dec 7, 2023 • edited Loading

BruceForstall left a comment

Choose a reason for hiding this comment

BruceForstall Dec 7, 2023

Choose a reason for hiding this comment

BruceForstall Dec 7, 2023

Choose a reason for hiding this comment

BruceForstall Dec 7, 2023

Choose a reason for hiding this comment

BruceForstall Dec 7, 2023

Choose a reason for hiding this comment

amanasifkhalid Dec 7, 2023

Choose a reason for hiding this comment

amanasifkhalid commented Dec 7, 2023

BruceForstall left a comment •

edited

Loading

amanasifkhalid commented Dec 4, 2023 •

edited

Loading

amanasifkhalid commented Dec 5, 2023 •

edited

Loading

amanasifkhalid commented Dec 6, 2023 •

edited

Loading

amanasifkhalid commented Dec 6, 2023 •

edited

Loading

amanasifkhalid commented Dec 7, 2023 •

edited

Loading