[JIT] X64 - Extend emitter peephole optimization of eliminating unnecessary mov instructions
#79381
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
Description
Resolves #10315 based on the example code given in the first post.
This will not eliminate all possible unnecessary mov instructions, but it handles more now.
The JIT's emitter already had an existing way of removing the instructions using:
bool emitter::AreUpper32BitsZero(regNumber reg)
But it was only able to look back at one instruction.
This PR extends AreUpper32BitsZero
// IMPORTANT: Only contains information from **before** the last emitted instruction.
// A lookup where each bit position corresponds to a register.
// A bit that is set means the register's upper 32-bits are zero.
// This effectively keeps track of which registers have their upper 32-bits set to zero.
// GPRs (general-purpose registers) only.
unsigned int upper32BitsZeroRegLookup;
Example diffs:
@@ -28,13 +28,9 @@ G_M17551_IG02: ; gcrefRegs=00000000 {}, byrefRegs=00000004 {rdx}, byref
movzx r8, byte ptr [rcx+01H]
movzx r9, byte ptr [rcx+02H]
movzx rcx, byte ptr [rcx+03H]
- mov eax, eax
movsx rax, byte ptr [rdx+rax]
- mov r8d, r8d
movsx r8, byte ptr [rdx+r8]
- mov r9d, r9d
movsx r9, byte ptr [rdx+r9]
- mov ecx, ecx
movsx rdx, byte ptr [rdx+rcx]
; byrRegs -[rdx]
shl eax, 18
@@ -43,12 +39,12 @@ G_M17551_IG02: ; gcrefRegs=00000000 {}, byrefRegs=00000004 {rdx}, byref
or eax, edx
or r8d, r9d
or eax, r8d
- ;; size=66 bbWeight=1 PerfScore 27.25
+ ;; size=56 bbWeight=1 PerfScore 26.25
G_M17551_IG03: ; , epilog, nogc, extend
ret
;; size=1 bbWeight=1 PerfScore 1.00
-; Total bytes of code 67, prolog size 0, PerfScore 34.95, instruction count 19, allocated bytes for code 67 (MethodHash=ea74bb70) for method System.Buffers.Text.Base64:Decode(ulong,byref):int
+; Total bytes of code 57, prolog size 0, PerfScore 32.95, instruction count 15, allocated bytes for code 57 (MethodHash=ea74bb70) for method System.Buffers.Text.Base64:Decode(ulong,byref):int
Diffs from the issue's example:
Diff Summary
Diffs are based on 1,398,805 contexts (351,415 MinOpts, 1,047,390 FullOpts).
MISSED contexts: base: 20, diff: 20
Overall (-1,124 bytes)
FullOpts (-1,124 bytes)
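The per-register bit lookup described in the quoted field comment can be sketched roughly as follows. This is only an illustrative model, not the JIT's code: the `Reg` numbering, the `Upper32ZeroTracker` type, and all of its member names are hypothetical. It relies on the x64 rule that a write to a 32-bit register zero-extends into the full 64-bit register, which is why `mov eax, eax` is redundant when the upper half of `rax` is already known to be zero.

```cpp
#include <cstdint>

// Hypothetical register numbering, for illustration only.
enum Reg : unsigned { RAX = 0, RCX = 1, RDX = 2, R8 = 8, R9 = 9 };

// Hypothetical tracker: bit i set means register i's upper 32 bits are known to be zero.
struct Upper32ZeroTracker
{
    uint32_t lookup = 0;

    // A zero-extending write (e.g. a 32-bit mov or a movzx load) clears the upper half.
    void recordZeroExtendingWrite(Reg r) { lookup |= (1u << r); }

    // A full 64-bit write (e.g. movsx into a 64-bit register) may set upper bits.
    void recordWrite64(Reg r) { lookup &= ~(1u << r); }

    // Anything that invalidates the tracked state (assumed here: calls, labels).
    void reset() { lookup = 0; }

    bool areUpper32BitsZero(Reg r) const { return (lookup & (1u << r)) != 0; }
};

// A self-move such as `mov eax, eax` only zero-extends, so it can be skipped
// when the tracker already knows the upper half is zero.
inline bool canElideSelfMov(const Upper32ZeroTracker& t, Reg r)
{
    return t.areUpper32BitsZero(r);
}
```

In the diff above, each removed `mov r32, r32` follows a `movzx` into the same register, which is the situation `canElideSelfMov` models.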
{
    assert(emitHasLastIns() == (emitLastInsIG != nullptr));

    return emitHasLastIns() && // there is an emitLastInstr
Maybe this is for a separate PR, but I think we need to prevent peephole optimizations when we're in the prolog or epilog. I.e.,
if (emitIGisInProlog(emitCurIG) || emitIGisInEpilog(emitCurIG))
{
    return false;
}
#ifdef FEATURE_EH_FUNCLETS
if (emitIGisInFuncletProlog(emitCurIG) || emitIGisInFuncletEpilog(emitCurIG))
{
    return false;
}
#endif
There is too much special handling in the prolog/epilog (e.g., unwinding) to allow peeps to kick in. There may be very specific cases where they are ok, but that requires some careful thinking.
/azp run runtime-coreclr superpmi-diffs
Azure Pipelines successfully started running 1 pipeline(s).
/azp run runtime-coreclr superpmi-replay
Azure Pipelines successfully started running 1 pipeline(s).
/azp run runtime-coreclr gcstress0x3-gcstress0xc
/azp run jitstress
No pipelines are associated with this pull request.
Azure Pipelines successfully started running 1 pipeline(s).
/azp run runtime-coreclr jitstress
Azure Pipelines successfully started running 1 pipeline(s).
@dotnet/jit-contrib @BruceForstall This is ready again. I ran gcstress and jitstress, and both passed CI. The current failures are unrelated.
Is there any way to avoid the throughput (TP) impact in MinOpts?
I believe it's a result of enabling backwards navigation in the insGroup/instrDesc (#80840), and maintaining those data structures. It's not obvious how this could be done only for non-MinOpts (and whether it would be advisable if it could be done).
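For readers unfamiliar with the idea, the bounded backward look-back that the PR description mentions (up to 256 instructions within an IG) can be sketched as below. This is a simplified model under stated assumptions, not the emitter's implementation: `Kind`, `Ins`, `kMaxLookBack`, and `upper32ZeroByLookBack` are hypothetical names, a plain `std::vector` stands in for the instruction group, and the set of instructions treated as barriers is assumed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical instruction categories, for illustration only.
enum class Kind
{
    ZeroExtendingWrite, // writes destReg and leaves its upper 32 bits zero
    Write64,            // writes all 64 bits of destReg
    Other,              // does not write any tracked register
    Barrier             // assumed to invalidate all knowledge (e.g. a call)
};

struct Ins
{
    Kind kind;
    int  destReg; // ignored for Other and Barrier
};

// Limit taken from the PR description: at most 256 instructions are inspected.
constexpr std::size_t kMaxLookBack = 256;

// Walk backwards from the most recently emitted instruction until the last
// write to `reg` is found, a barrier is hit, or the look-back budget runs out.
bool upper32ZeroByLookBack(const std::vector<Ins>& group, int reg)
{
    std::size_t visited = 0;
    for (auto it = group.rbegin(); it != group.rend() && visited < kMaxLookBack; ++it, ++visited)
    {
        if (it->kind == Kind::Barrier)
        {
            return false; // nothing before this point can be trusted
        }
        if (it->kind == Kind::Other || it->destReg != reg)
        {
            continue; // does not affect `reg`; keep looking further back
        }
        return it->kind == Kind::ZeroExtendingWrite; // the most recent write decides
    }
    return false; // no defining write found within the budget
}
```

The sketch answers conservatively: whenever the scan cannot prove that the upper bits are zero, it returns false and the `mov` would be kept.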
LGTM
Description
Resolves #10315 based on the example code given in the first post.
This will not eliminate all possible unnecessary mov instructions, but it handles more now.
The JIT's emitter already had an existing way of removing the instructions using:
bool emitter::AreUpper32BitsZero(regNumber reg)
But it was only able to look back at one instruction.
This PR extends AreUpper32BitsZero to allow looking back up to 256 instructions (the maximum number of instructions in an IG).
Example diffs:
Diffs from the issue's example:
Acceptance Criteria