[RyuJIT/armel] Make RefPosition arg regs fixed #13023

wateret · 2017-07-25T12:13:54Z

Passing double arg regs via r0:r1 or r2:r3,
each RefPosition's RegMask should have only one exact register
so the candidate register can be fixed.

e.g.)

<RefPosition #4099 @2625 RefTypeUse <Ivl:1237> BB35 regmask=[r0-r1] last>
<RefPosition #4100 @2625 RefTypeUse <Ivl:1238> BB35 regmask=[r0-r1] last>

to be:

<RefPosition #4099 @2625 RefTypeUse <Ivl:1237> BB35 regmask=[r0] last fixed>
<RefPosition #4100 @2625 RefTypeUse <Ivl:1238> BB35 regmask=[r1] last fixed>

Fix #12994

BruceForstall · 2017-07-25T17:58:12Z

cc @dotnet/arm32-contrib

CarolEidt · 2017-07-25T19:07:31Z

src/jit/regset.cpp

@@ -1604,7 +1604,7 @@ void RegSet::rsSpillTree(regNumber reg, GenTreePtr tree, unsigned regIdx /* =0 *
 #endif // _TARGET_ARM_
    else
    {
-        assert(!varTypeIsMultiReg(tree));
+        assert(!varTypeIsMultiReg(tree) || (m_rsCompiler->opts.compUseSoftFP && treeType == TYP_LONG));


I don't think this is correct. How does it know which have of the double/long is being spilled? I think it needs to be handled similarly to the other multiReg ops.

Thank you for your comment. Would that mean GenTreePutArgReg(for armel) should have the field gtSpillFlags and corresponding setter/getters just like GenTreeCall and GenTreePutArgSplit do? (Since PutArgReg is MultiRegNode for double type args on armel)

CarolEidt · 2017-07-25T19:12:20Z

src/jit/lsra.cpp

+                useNode->gtLsraInfo.setSrcCandidates(this, candidates & ~candidate);
+                candidates = candidate;
+            }
+#endif // ARM_SOFTFP


I think this should more generally handle the case of useNode->IsMultiRegNode(). And it might be better to extract this into a separate method. I'm not crazy about modifying the srcCandidates on useNode, but there is probably not a better way that doesn't suffer from inefficiency.

wateret · 2017-08-01T02:58:30Z

@CarolEidt I updated the commit so each reg now gets own SpillFlag for GenTreeMultiRegOp(just like MultiRegCall and PutArgSplit). And it makes this test case pass on Release mode. However I get an assert failure on Debug.

Assert failure(PID 19407 [0x00004bcf], Thread: 19407 [0x4bcf]): Assertion failed 'temp != nullptr' in 'r8NaNrem:Main():int' (IL size 8610)

    File: /opt/code/work/src/jit/regset.cpp Line: 3313
    Image: /opt/usr/home/owner/dotnet/ttt/corerun

crashed [1471040486] processname=corerun, pid=19407, tid=19407, signal=6

It looks like RyuJIT does not allow adding temps while codegen, it pre-allocates temps before codegen which probably means that currently it does not correctly calculates how many temps we need. Could you give me some clue on this?

cc: @hseok-oh

wateret · 2017-08-01T03:07:22Z

FYI the register allocation looks unnatural. The below is pushing arguments for a call to a static method with two double arguments. However on RP#2889, r0 is spilled by the target address then it is spilled on RP#2895 by the argument again, then it gets proper register r8 on RP#2898. That is, the first two allocation was unnecessary. Maybe this is related to what I stated above.

Allocating Registers

...

 1860.#2881  I979   Def    Alloc r0   |I979 a|      |I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1860.#2882  I980   Def    Alloc r1   |I979 a|I980 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1861.#2883  r0     Fixd   Keep  r0   |I979 a|I980 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1861.#2884  I979   Use *  Keep  r0   |I979 a|I980 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1861.#2885  r1     Fixd   Keep  r1   |I979 a|I980 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1861.#2886  I980   Use *  Keep  r1   |I979 a|I980 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1862.#2887  I981   Def    Alloc r0   |I981 a|      |I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
--------------------------------------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------
Loc   RP#    Name   Type  Action Reg  |r0    |r1    |r2    |r3    |r4    |r5    |r6    |r7    |r8    |r9    |r10   |r11   |r12   |sp    |lr
--------------------------------------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------
 1862.#2888  I982   Def    Alloc r1   |I981 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1864.#2889  C983   Def    Spill r0   |      |I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
                           Steal r0   |C983 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2890  r2     Fixd   Keep  r2   |C983 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2891  I974   Use *  Keep  r2   |C983 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2892  r3     Fixd   Keep  r3   |C983 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2893  I975   Use *  Keep  r3   |C983 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2894  r0     Fixd   Keep  r0   |C983 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2895  I981   Use *  ReLod NA   |C983 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
                           Spill r0   |      |I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
                           Steal r0   |I981 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2896  r1     Fixd   Keep  r1   |I981 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2897  I982   Use *  Keep  r1   |I981 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2898  C983   Use *  ReLod NA   |I981 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
                           Spill r8   |I981 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|      |V6   a|----  |----  |V7   a|----  |V21  a
                           Steal r8   |I981 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|C983 a|V6   a|----  |----  |V7   a|----  |V21  a
 1866.#2899  r0     Kill   Keep  r0   |      |      |      |      |V0   a|V1   a|V3   a|V4   a|C983 i|V6   a|----  |----  |V7   a|----  |V21  a
 1866.#2900  r1     Kill   Keep  r1   |      |      |      |      |V0   a|V1   a|V3   a|V4   a|C983 i|V6   a|----  |----  |V7   a|----  |V21  a
 1866.#2901  r2     Kill   Keep  r2   |      |      |      |      |V0   a|V1   a|V3   a|V4   a|C983 i|V6   a|----  |----  |V7   a|----  |V21  a
 1866.#2902  r3     Kill   Keep  r3   |      |      |      |      |V0   a|V1   a|V3   a|V4   a|C983 i|V6   a|----  |----  |V7   a|----  |V21  a
 1866.#2903  r12    Kill   Spill r12  |      |      |      |      |V0   a|V1   a|V3   a|V4   a|C983 i|V6   a|----  |----  |      |----  |V21  a

...

alpencolt · 2017-08-01T10:13:24Z

@wateret :

It looks like RyuJIT does not allow adding temps while codegen, it pre-allocates temps before codegen which probably means that currently it does not correctly calculates how many temps we need. Could you give me some clue on this?

You're right. You have to declare amount of used register in lsra and then use them in codegen not more than you've requested. Another point you should remember is that destination register can be one from temporary set registers (I mean register allocator can provide for source register r2, for temporary r1 and for destination again r1). To avoid it set isInternalRegDelayFree=true.

CarolEidt

This seems like a reasonable approach overall, though I filed issue #13183 to unify the handling of multi-reg nodes.
However, I don't understand the arbitrary increment of the spill temp count.

CarolEidt · 2017-08-02T23:49:32Z

src/jit/codegenlinear.cpp

+            }
+
+            unspillTree->gtFlags &= ~GTF_SPILLED;
+        }


I think that we should merge these cases, and unify the handling of all the different multi-reg ops on the various targets. I'll open a separate issue for that.

@CarolEidt Definitely I agree with that.

CarolEidt · 2017-08-03T00:05:39Z

src/jit/lsra.cpp

+    if (compiler->opts.compUseSoftFP)
+    {
+        JITDUMP("Adding a spill temp for moving target address to a register.\n");
+        maxSpill[TYP_INT] += 1;


Why is this needed? Where is this spill temp being used, what is it used for, and how is it obtained?

@CarolEidt After I made the second commit(I thought it would be enough), but I got the assertion mentioned from #13023 (comment). So I had to fix it, and the code here is sort of a workaround.

What I stated here #13023 (comment) was strange part while LSRA, the C# code of this part is roughly like:

func(a, b); // static SomeReturnType func(double, double)

While setting r0-r3 arguments, they are not fixed until the call. So r0 is spilled (since no other free registers) for call address. Eventually the call address goes to r8 though. And this probably requires one more temp variable.
I think this could be an issue for general arm, not only armel, since x86/x64 usually does not store call address to a register, directly uses a full immediate address.

% The exact TC I was looking at is JIT/Methodical/NaN/r8NaNmul_cs_do/r8NaNmul_cs_do.exe.

Eventually the call address goes to r8 though. And this probably requires one more temp variable.

There should be a spill temp accounted for with the spill address of the target register. Given the dump segment that you shared, it would seem that:

The instruction at N1861 defines a value (I981) into r0. It is spilled, so that will result in maxSpill[TYP_INT] being set to at least 1 (it would be more if there are other outstanding spills, but that's not 100% clear from the dump fragment).

The instruction at N1863 defines the call target address (and the definition is located at 1864). It is also spilled, so at this point maxSpill[TYP_INT] should be at least 2.

It appears that the call is at N1865. It reloads I981 into r0, which causes maxSpill[TYP_INT] to be decremented.

It then reloads the target address (C983) into r8, which causes maxSpill[TYP_INT] to be decremented again - back to 0 or whatever it was at Location 1861.

Then, it spills r12 (V7), which requires an additional spill temp, so maxSpill[TYP_INT] should now be at least one again.

During codegen, these spills and reloads should be happening in this same order. Can you verify that's the case?

@CarolEidt Looks like it does.

Generating: N1851 ( 47, 33) [000885] ---X--------- t885 = * IND double REG f8 <l:$350, c:$6db> Byref regs: 0008 {r3} => 0000 {} IN037d: vldr d8, [r3] /--* t885 double Generating: N1853 (???,???) [014273] ---X--------- t14273 = * COPY long REG r2,r3 /--* t14273 long Generating: N1855 (???,???) [014274] ---X--------- t14274 = * PUTARG_REG long REG r2,r3 IN037e: vmov.d2i r2, r3, d8 Generating: N1857 ( 1, 1) [000872] ------------- t872 = CNS_DBL double 0.0000000000000000 REG f8 $80 IN037f: movs r0, 0 IN0380: movs r1, 0 IN0381: vmov.i2d d8, r0, r1 /--* t872 double Generating: N1859 (???,???) [014275] ------------- t14275 = * COPY long REG r0,r1 /--* t14275 long Generating: N1861 (???,???) [014276] ------------Z t14276 = * PUTARG_REG long REG r0,r1 IN0382: vmov.d2i r0, r1, d8 reused temp #2, slot 0, size = 4 The register r0 spilled with [014276] IN0383: str r0, [sp+0x10] // [TEMP_02] Generating: N1863 ( 3, 7) [014277] ------------Z t14277 = CNS_INT(h) int 0xB642FE33 ftn REG r0 IN0384: movw r0, 0xfe33 IN0385: movt r0, 0xb642 reused temp #1, slot 0, size = 4 The register r0 spilled with [014277] IN0386: str r0, [sp+0x0c] // [TEMP_01] /--* t14277 int Generating: [019739] ------------- t19739 = * RELOAD int REG r8 /--* t14274 long arg1 r2,r3 +--* t14276 long arg0 r0,r1 +--* t19739 int control expr Generating: N1865 ( 64, 46) [000886] --CX--------- t886 = * CALL help double HELPER.CORINFO_HELP_DBLREM <l:$322, c:$323> Tree-Node marked unspilled from [014276] IN0387: ldr r0, [sp+0x10] // [TEMP_02] release temp #2, slot 0, size = 4 Tree-Node marked unspilled from [014277] IN0388: ldr r8, [sp+0x0c] // [TEMP_01] release temp #1, slot 0, size = 4

With allocating 1 more temp for this function it works.

Of course, this cannot be a solution since it will allocate one more temps for all methods.

wateret · 2017-08-04T08:36:59Z

@CarolEidt
I reverted the previous commit(a4f9b3a), and made change to LinearScan::updateMaxSpill().

Previously the temps are:

;  TEMP_01                                     int  ->  [r11-0x110]
;  TEMP_02                                    long  ->  [r11-0x118]

TEMP_02 is unused and we need one more int temp. When it is spilled, since the type of PUTARG_REG is TYP_LONG, a wrong type of temp was generated. I think we need to treat them as two separated INTs.
And the new commit now generates temps correctly.

;  TEMP_02                                     int  ->  [r11-0x110]
;  TEMP_01                                     int  ->  [r11-0x114]

CarolEidt

With a suggested #ifdef change and some formatting fixes, I think this will be ready to merge; I think the arm failures may be due to #13186?

CarolEidt · 2017-08-04T16:28:01Z

src/jit/lsra.cpp

+                else if (treeNode->OperIsPutArgReg())
+                {
+                    // For double arg regs, the type is changed to long since they must be passed via `r0-r3`.
+                    // However when they get spilled, they should be treated as separated int registers.


I think that this is just the case for ARM_SOFTFP, right? In other cases, an actual TYP_LONG would have been decomposed, so it would be better to put this under #ifdef ARM_SOFTFP.

@CarolEidt Right, I'll fix it.

@2625

Passing double arg regs via `r0:r1` or `r2:r3`, each RefPosition's RegMask should have only one exact register so the candidate register can be fixed. e.g.) ``` <RefPosition dotnet#4099 @2625 RefTypeUse <Ivl:1237> BB35 regmask=[r0-r1] last> <RefPosition dotnet#4100 @2625 RefTypeUse <Ivl:1238> BB35 regmask=[r0-r1] last> ``` to be: ``` <RefPosition dotnet#4099 @2625 RefTypeUse <Ivl:1237> BB35 regmask=[r0] last fixed> <RefPosition dotnet#4100 @2625 RefTypeUse <Ivl:1238> BB35 regmask=[r1] last fixed> ``` Fix #12994

wateret · 2017-08-07T04:36:16Z

@CarolEidt With a fix for your comment and rebasing, it is ready to get merged. Thanks!

Fix regression caused by dotnet#13023 Fix #13281

dnfclas added the cla-already-signed label Jul 25, 2017

BruceForstall requested review from pgavlin and CarolEidt July 25, 2017 17:58

CarolEidt suggested changes Jul 25, 2017

View reviewed changes

wateret force-pushed the fix-12994 branch from b2ca4a6 to a4f9b3a Compare August 2, 2017 11:32

CarolEidt reviewed Aug 3, 2017

View reviewed changes

wateret force-pushed the fix-12994 branch from 9a75412 to 84ccd89 Compare August 4, 2017 08:29

CarolEidt suggested changes Aug 4, 2017

View reviewed changes

wateret added 5 commits August 7, 2017 11:37

[RyuJIT/armel] Each reg gets own SpillFlag for MultiRegOp

0e0e60d

[RyuJIT/armel] Allocate one more temp for call address

45a2404

[RyuJIT/armel] arg regs are always TYP_INT

e36d674

[RyuJIT/armel] Fix ifdef condition and Formatting Error

99e4b01

wateret force-pushed the fix-12994 branch from 84ccd89 to 99e4b01 Compare August 7, 2017 02:37

CarolEidt merged commit 3e4965e into dotnet:master Aug 8, 2017

wateret added a commit to wateret/coreclr that referenced this pull request Aug 9, 2017

[RyuJIT/arm32] Fix Regression

6572d01

Fix regression caused by dotnet#13023 Fix #13281

wateret mentioned this pull request Aug 9, 2017

[RyuJIT/arm32] Fix Regression #13284

Merged

karelz modified the milestone: 2.1.0 Aug 28, 2017

This was referenced Jan 31, 2020

[RyuJIT/arm32] Regression : Assertion failed 'temp != nullptr' dotnet/runtime#8716

Closed

[RyuJIT/armel] HFA tests fail dotnet/runtime#8722

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RyuJIT/armel] Make RefPosition arg regs fixed #13023

[RyuJIT/armel] Make RefPosition arg regs fixed #13023

wateret commented Jul 25, 2017

BruceForstall commented Jul 25, 2017

CarolEidt Jul 25, 2017

wateret Jul 26, 2017 •

edited

Loading

CarolEidt Jul 25, 2017

wateret commented Aug 1, 2017

wateret commented Aug 1, 2017

alpencolt commented Aug 1, 2017

CarolEidt left a comment

CarolEidt Aug 2, 2017

wateret Aug 3, 2017

CarolEidt Aug 3, 2017

wateret Aug 3, 2017

wateret Aug 3, 2017

CarolEidt Aug 3, 2017

wateret Aug 4, 2017 •

edited

Loading

wateret Aug 4, 2017

wateret commented Aug 4, 2017

CarolEidt left a comment

CarolEidt Aug 4, 2017

wateret Aug 7, 2017

wateret commented Aug 7, 2017

[RyuJIT/armel] Make RefPosition arg regs fixed #13023

[RyuJIT/armel] Make RefPosition arg regs fixed #13023

Conversation

wateret commented Jul 25, 2017

BruceForstall commented Jul 25, 2017

Choose a reason for hiding this comment

wateret Jul 26, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wateret commented Aug 1, 2017

wateret commented Aug 1, 2017

alpencolt commented Aug 1, 2017

CarolEidt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wateret Aug 4, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wateret commented Aug 4, 2017

CarolEidt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wateret commented Aug 7, 2017

wateret Jul 26, 2017 •

edited

Loading

wateret Aug 4, 2017 •

edited

Loading