-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do byref liveness updates for same register GPR moves #53684
Conversation
CC. @BruceForstall, @AndyAyersMS This passed all tests locally, but I imagine we will want to also run GC Stress tests if we want to take this. |
There are no assembly diffs for this change. However, there are textual changes to the dump: The -IN0005: mov rdx, qword ptr [rcx]
+IN0006: mov rdx, qword ptr [rcx]
; ...
-Added IP mapping: 0x001E STACK_EMPTY (G_M24931_IG02,ins#6,ofs#24) label
+Added IP mapping: 0x001E STACK_EMPTY (G_M24931_IG02,ins#7,ofs#24) label Carrying additional Allocations for System.Runtime.CompilerServices.CastHelpers:StelemRef(System.Array,int,System.Object) (MethodHash=1b5e9e9c)
-count: 1447, size: 112998, max = 3072
-allocateMemory: 131072, nraUsed: 118856
+count: 1451, size: 113262, max = 3072
+allocateMemory: 131072, nraUsed: 119120
; ...
- InstDesc | 4444 | 3.93%
+ InstDesc | 4700 | 4.15% Most importantly, the *************** In gcInfoBlockHdrSave()
Set code length to 482.
Set ReturnKind to Scalar.
Set Outgoing stack arg area size to 32.
Stack slot id for offset 32 (0x20) (sp) = 0.
Register slot id for reg r15 = 1.
Register slot id for reg rdx = 2.
Register slot id for reg rax = 3.
Register slot id for reg rcx = 4.
Register slot id for reg r15 (byref) = 5.
Register slot id for reg rbx (byref) = 6.
Set state of slot 0 at instr offset 0x65 to Live.
Set state of slot 0 at instr offset 0x105 to Dead.
Set state of slot 1 at instr offset 0x5d to Live.
Set state of slot 2 at instr offset 0x60 to Live.
Set state of slot 3 at instr offset 0x68 to Live.
Set state of slot 4 at instr offset 0x6f to Live.
Set state of slot 3 at instr offset 0x77 to Dead.
Set state of slot 4 at instr offset 0x77 to Dead.
Set state of slot 2 at instr offset 0x77 to Dead.
Set state of slot 1 at instr offset 0x77 to Dead.
Set state of slot 2 at instr offset 0xaa to Live.
Set state of slot 3 at instr offset 0xad to Live.
Set state of slot 2 at instr offset 0xb1 to Dead.
Set state of slot 4 at instr offset 0xb4 to Live.
Set state of slot 2 at instr offset 0xb7 to Live.
Set state of slot 3 at instr offset 0xbc to Dead.
Set state of slot 4 at instr offset 0xbc to Dead.
Set state of slot 2 at instr offset 0xbc to Dead.
+Set state of slot 5 at instr offset 0xf1 to Live.
Set state of slot 2 at instr offset 0xfe to Live.
Set state of slot 4 at instr offset 0x105 to Live.
Set state of slot 4 at instr offset 0x10a to Dead.
Set state of slot 2 at instr offset 0x10a to Dead.
-Set state of slot 5 at instr offset 0x10a to Live.
Set state of slot 5 at instr offset 0x119 to Dead.
+Set state of slot 6 at instr offset 0x155 to Live.
Set state of slot 2 at instr offset 0x16a to Live.
Set state of slot 4 at instr offset 0x171 to Live.
Set state of slot 4 at instr offset 0x176 to Dead.
Set state of slot 2 at instr offset 0x176 to Dead.
-Set state of slot 6 at instr offset 0x176 to Live.
Set state of slot 6 at instr offset 0x191 to Dead.
Set state of slot 2 at instr offset 0x1ab to Live.
Set state of slot 4 at instr offset 0x1bc to Live.
Set state of slot 4 at instr offset 0x1c1 to Dead.
Set state of slot 2 at instr offset 0x1c1 to Dead.
Defining interruptible range: [0x18, 0x133).
Defining interruptible range: [0x140, 0x1e2). |
Diffs via pmi or spmi? (I don't recall if SPMI diffing is sensitive to GC info changes, will have to look). |
/azp run runtime-coreclr outerloop, runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 2 pipeline(s). |
I used Mainly to validate that there were no new instructions being emitted and rather only GC liveness updates with this change. |
Should also resolve #10821...? |
Yes, that's the same assert as from #51728, I'll update the top post to auto-close it
|
Looks like there is a baseline of ~8 failures in gcstress, so some failures here are pre-existing issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Just one question/concern about possible appearances in the prolog.
@AndyAyersMS, looks like the only failures for GC Stress are known (and for ARM64 which isn't touched by this). This likely needs a second review before being merged, correct? |
Not sure it's required, but probably a good idea. @BruceForstall can you please review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions.
In the case where the byref goes live earlier, can you show the asm code example? I.e., does it make sense that it's going live earlier? Why did it go live later in the previous case?
You should be able to do superpmi asm diffs using superpmi.py asmdiffs --gcinfo
to see the effect there.
I'll get the spmi diff here, but in the cases I looked at already they were similar to the one I describe in the top post. Most typically its a In this case, the move may get elided and so the |
31da2fe
to
ff0ef6b
Compare
/azp run runtime-coreclr outerloop, runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 2 pipeline(s). |
superpmi seems to be failing with |
/azp run runtime-coreclr outerloop, runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 2 pipeline(s). |
superpmi.py has an algorithm for determining the baseline JIT to download from the JIT rolling build and use, but it's fragile as it depends on your branch being branched from a local 'main' branch, so it might be wrong, and in this case, it's probably picked a baseline JIT from before the last JIT/EE GUID change. You can build/download and specify your own baseline, using the |
You can alternatively use the |
When I do spmi asm diffs on this change, I see some interesting diffs, e.g.:
so a byref that is actually a constant (string?) gets loaded and stored to a byref stack location, then goes dead before the previous code brings it alive later. Not a hole, presumably, since the value is constant (the stack location is untracked, so no GC state change there).
A case where previously the byrefs didn't get marked alive until the label, and now are marked alive as soon as the register gets loaded. Seems like this would be an existing GC hole, but this is a marshalling IL stub, so maybe there's some magic going on here I don't understand. It looks to me like there is a bug where we now don't emit a
|
I'll look and see if I can find out what's causing this. Perhaps there is a bug with instruction groups or something else that is causing this to not be emitted. There are also outerloop jobs failing where they weren't before, those might have been introduced by a different PR however (looks to be failing since: https://dev.azure.com/dnceng/public/_build/results?buildId=1170978&view=results). |
This looks to be because we only emit if the last instruction was a Since we now track a |
e07c313
to
65012c8
Compare
32bdccf
to
1622b1d
Compare
1622b1d
to
91162ca
Compare
/azp run runtime-coreclr outerloop, runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 2 pipeline(s). |
/azp run runtime-coreclr outerloop, runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 2 pipeline(s). |
@BruceForstall, believe I'm back to zero diffs (modulo the GC liveness changes). I updated this to track the "last instruction" (the last |
Not quite zero diffs actually, seems I'm incorrectly tracking |
Now its at zero diffs, 🎉
benchmarks.run.windows.x64.checked.mch
coreclr_tests.pmi.windows.x64.checked.mch
libraries.crossgen.windows.x64.checked.mch
libraries.crossgen2.windows.x64.checked.mch
libraries.pmi.windows.x64.checked.mch
libraries_tests.pmi.windows.x64.checked.mch
|
/azp run runtime-coreclr outerloop, runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 2 pipeline(s). |
As far as the GC diffs go (which are reported as textual diffs), there are quite a number. The ones I've gone through resemble those that @BruceForstall called out here: #53684 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting this to zero (generated code) diffs.
// movs Rd,i8 T1_J0 00100dddiiiiiiii 2000 low imm(0-255) | ||
// mov{s} Rd,+i8<<i4 T2_L1 11110i00010S1111 0iiiddddiiiiiiii F04F 0000 imm(i8<<i4) | ||
// mov{s} Rd,Rm T2_C3 1110101001011111 0000dddd0000mmmm EA5F 0000 | ||
INST5(mov_eliminated, "mov_eliminated", 0, 0, IF_EN5A, 0x0000, 0x4600, 0x2000, 0xF04F0000, 0xEA5F0000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably the encodings could all be BAD_CODE
, and could even move this down to the INST1
section, but this is probably ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can log an issue to do this in a follow up. It would be good "low hanging fruit".
// mov Vd[],Rn DV_2C 01001110000iiiii 000111nnnnnddddd 4E00 1C00 Vd[],Rn (from general) | ||
// mov Vd,Vn[] DV_2E 01011110000iiiii 000001nnnnnddddd 5E00 0400 Vd,Vn[] (scalar by element) | ||
// mov Vd[],Vn[] DV_2F 01101110000iiiii 0jjjj1nnnnnddddd 6E00 0400 Vd[],Vn[] (from/to elem) | ||
INST9(mov_eliminated, "mov_eliminated", 0, IF_EN9, 0x2A0003E0, 0x11000000, 0x52800000, 0x320003E0, 0x0EA01C00, 0x0E003C00, 0x4E001C00, 0x5E000400, 0x6E000400) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here as for instrsarm.h
@@ -77,6 +77,7 @@ INST4(xor, "xor", IUM_RW, 0x000030, 0x003080, | |||
INST4(cmp, "cmp", IUM_RD, 0x000038, 0x003880, 0x00003A, 0x00003C, INS_FLAGS_WritesAllFlags) | |||
INST4(test, "test", IUM_RD, 0x000084, 0x0000F6, 0x000084, 0x0000A8, INS_FLAGS_WritesAllFlags | INS_FLAGS_Resets_CF_OF_Flags) // AF = ? | |||
INST4(mov, "mov", IUM_WR, 0x000088, 0x0000C6, 0x00008A, 0x0000B0, INS_FLAGS_None) | |||
INST4(mov_eliminated, "mov_eliminated", IUM_WR, 0x000088, 0x0000C6, 0x00008A, 0x0000B0, INS_FLAGS_None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
@BruceForstall any concerns with me setting this to auto-merge after I resolve the merge conflict here? |
Fine with me |
Hello @tannergooding! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
This resolves #51728 and resolves #10821 by ensuring byref liveness updates happen for same register moves.
A summary of the issue is that for certain operations, such as
new Span<T>(void* address)
orref T Unsafe.AsRef<T>(void* address)
, we'll have a pointer that is directly converted to aTYP_BYREF
. If the register allocator determines that the pointer is "last use", it may assign thebyref
result to the same register, at which point we may decide to elide the same register move. This was causing us to not track the byref becoming "live". While this is likely not problematic for the direct case ofPOINTER
toTYP_BYREF
, it might be problematic for multi-step conversion, such as:Where if
t
is given a register (sayedx
) and thenx
is given the same register (alsoedx
) the JIT would elide the move and not update the liveness and so the "last tracked reference" to the created array would go dead at the end of the fixed block. While in practice, it should be kept alive byx
since that's a byref and should be considered live.If taken, the same support should likely be added for Arm32/Arm64.