-
Notifications
You must be signed in to change notification settings - Fork 2.7k
[RyuJIT/armel] Make RefPosition arg regs fixed #13023
Conversation
cc @dotnet/arm32-contrib |
src/jit/regset.cpp
Outdated
@@ -1604,7 +1604,7 @@ void RegSet::rsSpillTree(regNumber reg, GenTreePtr tree, unsigned regIdx /* =0 * | |||
#endif // _TARGET_ARM_ | |||
else | |||
{ | |||
assert(!varTypeIsMultiReg(tree)); | |||
assert(!varTypeIsMultiReg(tree) || (m_rsCompiler->opts.compUseSoftFP && treeType == TYP_LONG)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct. How does it know which have of the double/long is being spilled? I think it needs to be handled similarly to the other multiReg ops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your comment. Would that mean GenTreePutArgReg
(for armel) should have the field gtSpillFlags
and corresponding setter/getters just like GenTreeCall
and GenTreePutArgSplit
do? (Since PutArgReg is MultiRegNode for double type args on armel)
useNode->gtLsraInfo.setSrcCandidates(this, candidates & ~candidate); | ||
candidates = candidate; | ||
} | ||
#endif // ARM_SOFTFP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should more generally handle the case of useNode->IsMultiRegNode()
. And it might be better to extract this into a separate method. I'm not crazy about modifying the srcCandidates
on useNode
, but there is probably not a better way that doesn't suffer from inefficiency.
@CarolEidt I updated the commit so each reg now gets own SpillFlag for GenTreeMultiRegOp(just like MultiRegCall and PutArgSplit). And it makes this test case pass on Release mode. However I get an assert failure on Debug.
It looks like RyuJIT does not allow adding temps while codegen, it pre-allocates temps before codegen which probably means that currently it does not correctly calculates how many temps we need. Could you give me some clue on this? cc: @hseok-oh |
FYI the register allocation looks unnatural. The below is pushing arguments for a call to a static method with two double arguments. However on RP#2889,
|
@wateret :
You're right. You have to declare amount of used register in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a reasonable approach overall, though I filed issue #13183 to unify the handling of multi-reg nodes.
However, I don't understand the arbitrary increment of the spill temp count.
} | ||
|
||
unspillTree->gtFlags &= ~GTF_SPILLED; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should merge these cases, and unify the handling of all the different multi-reg ops on the various targets. I'll open a separate issue for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt Definitely I agree with that.
src/jit/lsra.cpp
Outdated
if (compiler->opts.compUseSoftFP) | ||
{ | ||
JITDUMP("Adding a spill temp for moving target address to a register.\n"); | ||
maxSpill[TYP_INT] += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed? Where is this spill temp being used, what is it used for, and how is it obtained?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt After I made the second commit(I thought it would be enough), but I got the assertion mentioned from #13023 (comment). So I had to fix it, and the code here is sort of a workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I stated here #13023 (comment) was strange part while LSRA, the C# code of this part is roughly like:
func(a, b);
// static SomeReturnType func(double, double)
While setting r0-r3 arguments, they are not fixed until the call. So r0
is spilled (since no other free registers) for call address. Eventually the call address goes to r8
though. And this probably requires one more temp variable.
I think this could be an issue for general arm, not only armel, since x86/x64 usually does not store call address to a register, directly uses a full immediate address.
% The exact TC I was looking at is JIT/Methodical/NaN/r8NaNmul_cs_do/r8NaNmul_cs_do.exe
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually the call address goes to r8 though. And this probably requires one more temp variable.
There should be a spill temp accounted for with the spill address of the target register. Given the dump segment that you shared, it would seem that:
- The instruction at N1861 defines a value (I981) into r0. It is spilled, so that will result in
maxSpill[TYP_INT]
being set to at least 1 (it would be more if there are other outstanding spills, but that's not 100% clear from the dump fragment). - The instruction at N1863 defines the call target address (and the definition is located at 1864). It is also spilled, so at this point
maxSpill[TYP_INT]
should be at least 2. - It appears that the call is at N1865. It reloads I981 into r0, which causes
maxSpill[TYP_INT]
to be decremented. - It then reloads the target address (C983) into r8, which causes
maxSpill[TYP_INT]
to be decremented again - back to 0 or whatever it was at Location 1861. - Then, it spills r12 (V7), which requires an additional spill temp, so
maxSpill[TYP_INT]
should now be at least one again.
During codegen, these spills and reloads should be happening in this same order. Can you verify that's the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt Looks like it does.
Generating: N1851 ( 47, 33) [000885] ---X--------- t885 = * IND double REG f8 <l:$350, c:$6db>
Byref regs: 0008 {r3} => 0000 {}
IN037d: vldr d8, [r3]
/--* t885 double
Generating: N1853 (???,???) [014273] ---X--------- t14273 = * COPY long REG r2,r3
/--* t14273 long
Generating: N1855 (???,???) [014274] ---X--------- t14274 = * PUTARG_REG long REG r2,r3
IN037e: vmov.d2i r2, r3, d8
Generating: N1857 ( 1, 1) [000872] ------------- t872 = CNS_DBL double 0.0000000000000000 REG f8 $80
IN037f: movs r0, 0
IN0380: movs r1, 0
IN0381: vmov.i2d d8, r0, r1
/--* t872 double
Generating: N1859 (???,???) [014275] ------------- t14275 = * COPY long REG r0,r1
/--* t14275 long
Generating: N1861 (???,???) [014276] ------------Z t14276 = * PUTARG_REG long REG r0,r1
IN0382: vmov.d2i r0, r1, d8
reused temp #2, slot 0, size = 4
The register r0 spilled with [014276]
IN0383: str r0, [sp+0x10] // [TEMP_02]
Generating: N1863 ( 3, 7) [014277] ------------Z t14277 = CNS_INT(h) int 0xB642FE33 ftn REG r0
IN0384: movw r0, 0xfe33
IN0385: movt r0, 0xb642
reused temp #1, slot 0, size = 4
The register r0 spilled with [014277]
IN0386: str r0, [sp+0x0c] // [TEMP_01]
/--* t14277 int
Generating: [019739] ------------- t19739 = * RELOAD int REG r8
/--* t14274 long arg1 r2,r3
+--* t14276 long arg0 r0,r1
+--* t19739 int control expr
Generating: N1865 ( 64, 46) [000886] --CX--------- t886 = * CALL help double HELPER.CORINFO_HELP_DBLREM <l:$322, c:$323>
Tree-Node marked unspilled from [014276]
IN0387: ldr r0, [sp+0x10] // [TEMP_02]
release temp #2, slot 0, size = 4
Tree-Node marked unspilled from [014277]
IN0388: ldr r8, [sp+0x0c] // [TEMP_01]
release temp #1, slot 0, size = 4
With allocating 1 more temp for this function it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, this cannot be a solution since it will allocate one more temps for all methods.
@CarolEidt Previously the temps are:
TEMP_02 is unused and we need one more int temp. When it is spilled, since the type of PUTARG_REG is TYP_LONG, a wrong type of temp was generated. I think we need to treat them as two separated INTs.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With a suggested #ifdef change and some formatting fixes, I think this will be ready to merge; I think the arm failures may be due to #13186?
src/jit/lsra.cpp
Outdated
else if (treeNode->OperIsPutArgReg()) | ||
{ | ||
// For double arg regs, the type is changed to long since they must be passed via `r0-r3`. | ||
// However when they get spilled, they should be treated as separated int registers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this is just the case for ARM_SOFTFP
, right? In other cases, an actual TYP_LONG
would have been decomposed, so it would be better to put this under #ifdef ARM_SOFTFP
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt Right, I'll fix it.
Passing double arg regs via `r0:r1` or `r2:r3`, each RefPosition's RegMask should have only one exact register so the candidate register can be fixed. e.g.) ``` <RefPosition dotnet#4099 @2625 RefTypeUse <Ivl:1237> BB35 regmask=[r0-r1] last> <RefPosition dotnet#4100 @2625 RefTypeUse <Ivl:1238> BB35 regmask=[r0-r1] last> ``` to be: ``` <RefPosition dotnet#4099 @2625 RefTypeUse <Ivl:1237> BB35 regmask=[r0] last fixed> <RefPosition dotnet#4100 @2625 RefTypeUse <Ivl:1238> BB35 regmask=[r1] last fixed> ``` Fix #12994
@CarolEidt With a fix for your comment and rebasing, it is ready to get merged. Thanks! |
Fix regression caused by dotnet#13023 Fix #13281
Passing double arg regs via
r0:r1
orr2:r3
,each RefPosition's RegMask should have only one exact register
so the candidate register can be fixed.
e.g.)
to be:
Fix #12994