Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[RyuJIT/armel] Make RefPosition arg regs fixed #13023

Merged
merged 5 commits into from
Aug 8, 2017

Conversation

wateret
Copy link
Member

@wateret wateret commented Jul 25, 2017

Passing double arg regs via r0:r1 or r2:r3,
each RefPosition's RegMask should have only one exact register
so the candidate register can be fixed.

e.g.)

<RefPosition #4099 @2625 RefTypeUse <Ivl:1237> BB35 regmask=[r0-r1] last>
<RefPosition #4100 @2625 RefTypeUse <Ivl:1238> BB35 regmask=[r0-r1] last>

to be:

<RefPosition #4099 @2625 RefTypeUse <Ivl:1237> BB35 regmask=[r0] last fixed>
<RefPosition #4100 @2625 RefTypeUse <Ivl:1238> BB35 regmask=[r1] last fixed>

Fix #12994

@BruceForstall
Copy link
Member

cc @dotnet/arm32-contrib

@@ -1604,7 +1604,7 @@ void RegSet::rsSpillTree(regNumber reg, GenTreePtr tree, unsigned regIdx /* =0 *
#endif // _TARGET_ARM_
else
{
assert(!varTypeIsMultiReg(tree));
assert(!varTypeIsMultiReg(tree) || (m_rsCompiler->opts.compUseSoftFP && treeType == TYP_LONG));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct. How does it know which have of the double/long is being spilled? I think it needs to be handled similarly to the other multiReg ops.

Copy link
Member Author

@wateret wateret Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your comment. Would that mean GenTreePutArgReg(for armel) should have the field gtSpillFlags and corresponding setter/getters just like GenTreeCall and GenTreePutArgSplit do? (Since PutArgReg is MultiRegNode for double type args on armel)

useNode->gtLsraInfo.setSrcCandidates(this, candidates & ~candidate);
candidates = candidate;
}
#endif // ARM_SOFTFP

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should more generally handle the case of useNode->IsMultiRegNode(). And it might be better to extract this into a separate method. I'm not crazy about modifying the srcCandidates on useNode, but there is probably not a better way that doesn't suffer from inefficiency.

@wateret
Copy link
Member Author

wateret commented Aug 1, 2017

@CarolEidt I updated the commit so each reg now gets own SpillFlag for GenTreeMultiRegOp(just like MultiRegCall and PutArgSplit). And it makes this test case pass on Release mode. However I get an assert failure on Debug.

Assert failure(PID 19407 [0x00004bcf], Thread: 19407 [0x4bcf]): Assertion failed 'temp != nullptr' in 'r8NaNrem:Main():int' (IL size 8610)

    File: /opt/code/work/src/jit/regset.cpp Line: 3313
    Image: /opt/usr/home/owner/dotnet/ttt/corerun

crashed [1471040486] processname=corerun, pid=19407, tid=19407, signal=6

It looks like RyuJIT does not allow adding temps while codegen, it pre-allocates temps before codegen which probably means that currently it does not correctly calculates how many temps we need. Could you give me some clue on this?

cc: @hseok-oh

@wateret
Copy link
Member Author

wateret commented Aug 1, 2017

FYI the register allocation looks unnatural. The below is pushing arguments for a call to a static method with two double arguments. However on RP#2889, r0 is spilled by the target address then it is spilled on RP#2895 by the argument again, then it gets proper register r8 on RP#2898. That is, the first two allocation was unnecessary. Maybe this is related to what I stated above.

Allocating Registers

...

 1860.#2881  I979   Def    Alloc r0   |I979 a|      |I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1860.#2882  I980   Def    Alloc r1   |I979 a|I980 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1861.#2883  r0     Fixd   Keep  r0   |I979 a|I980 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1861.#2884  I979   Use *  Keep  r0   |I979 a|I980 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1861.#2885  r1     Fixd   Keep  r1   |I979 a|I980 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1861.#2886  I980   Use *  Keep  r1   |I979 a|I980 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1862.#2887  I981   Def    Alloc r0   |I981 a|      |I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
--------------------------------------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------
Loc   RP#    Name   Type  Action Reg  |r0    |r1    |r2    |r3    |r4    |r5    |r6    |r7    |r8    |r9    |r10   |r11   |r12   |sp    |lr
--------------------------------------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------
 1862.#2888  I982   Def    Alloc r1   |I981 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1864.#2889  C983   Def    Spill r0   |      |I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
                           Steal r0   |C983 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2890  r2     Fixd   Keep  r2   |C983 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2891  I974   Use *  Keep  r2   |C983 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2892  r3     Fixd   Keep  r3   |C983 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2893  I975   Use *  Keep  r3   |C983 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2894  r0     Fixd   Keep  r0   |C983 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2895  I981   Use *  ReLod NA   |C983 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
                           Spill r0   |      |I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
                           Steal r0   |I981 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2896  r1     Fixd   Keep  r1   |I981 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2897  I982   Use *  Keep  r1   |I981 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
 1865.#2898  C983   Use *  ReLod NA   |I981 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|V5   a|V6   a|----  |----  |V7   a|----  |V21  a
                           Spill r8   |I981 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|      |V6   a|----  |----  |V7   a|----  |V21  a
                           Steal r8   |I981 a|I982 a|I974 a|I975 a|V0   a|V1   a|V3   a|V4   a|C983 a|V6   a|----  |----  |V7   a|----  |V21  a
 1866.#2899  r0     Kill   Keep  r0   |      |      |      |      |V0   a|V1   a|V3   a|V4   a|C983 i|V6   a|----  |----  |V7   a|----  |V21  a
 1866.#2900  r1     Kill   Keep  r1   |      |      |      |      |V0   a|V1   a|V3   a|V4   a|C983 i|V6   a|----  |----  |V7   a|----  |V21  a
 1866.#2901  r2     Kill   Keep  r2   |      |      |      |      |V0   a|V1   a|V3   a|V4   a|C983 i|V6   a|----  |----  |V7   a|----  |V21  a
 1866.#2902  r3     Kill   Keep  r3   |      |      |      |      |V0   a|V1   a|V3   a|V4   a|C983 i|V6   a|----  |----  |V7   a|----  |V21  a
 1866.#2903  r12    Kill   Spill r12  |      |      |      |      |V0   a|V1   a|V3   a|V4   a|C983 i|V6   a|----  |----  |      |----  |V21  a

...

@alpencolt
Copy link

@wateret :

It looks like RyuJIT does not allow adding temps while codegen, it pre-allocates temps before codegen which probably means that currently it does not correctly calculates how many temps we need. Could you give me some clue on this?

You're right. You have to declare amount of used register in lsra and then use them in codegen not more than you've requested. Another point you should remember is that destination register can be one from temporary set registers (I mean register allocator can provide for source register r2, for temporary r1 and for destination again r1). To avoid it set isInternalRegDelayFree=true.

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a reasonable approach overall, though I filed issue #13183 to unify the handling of multi-reg nodes.
However, I don't understand the arbitrary increment of the spill temp count.

}

unspillTree->gtFlags &= ~GTF_SPILLED;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should merge these cases, and unify the handling of all the different multi-reg ops on the various targets. I'll open a separate issue for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt Definitely I agree with that.

src/jit/lsra.cpp Outdated
if (compiler->opts.compUseSoftFP)
{
JITDUMP("Adding a spill temp for moving target address to a register.\n");
maxSpill[TYP_INT] += 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? Where is this spill temp being used, what is it used for, and how is it obtained?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt After I made the second commit(I thought it would be enough), but I got the assertion mentioned from #13023 (comment). So I had to fix it, and the code here is sort of a workaround.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I stated here #13023 (comment) was strange part while LSRA, the C# code of this part is roughly like:

func(a, b);
// static SomeReturnType func(double, double)

While setting r0-r3 arguments, they are not fixed until the call. So r0 is spilled (since no other free registers) for call address. Eventually the call address goes to r8 though. And this probably requires one more temp variable.
I think this could be an issue for general arm, not only armel, since x86/x64 usually does not store call address to a register, directly uses a full immediate address.

% The exact TC I was looking at is JIT/Methodical/NaN/r8NaNmul_cs_do/r8NaNmul_cs_do.exe.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually the call address goes to r8 though. And this probably requires one more temp variable.

There should be a spill temp accounted for with the spill address of the target register. Given the dump segment that you shared, it would seem that:

  • The instruction at N1861 defines a value (I981) into r0. It is spilled, so that will result in maxSpill[TYP_INT] being set to at least 1 (it would be more if there are other outstanding spills, but that's not 100% clear from the dump fragment).
  • The instruction at N1863 defines the call target address (and the definition is located at 1864). It is also spilled, so at this point maxSpill[TYP_INT] should be at least 2.
  • It appears that the call is at N1865. It reloads I981 into r0, which causes maxSpill[TYP_INT] to be decremented.
  • It then reloads the target address (C983) into r8, which causes maxSpill[TYP_INT] to be decremented again - back to 0 or whatever it was at Location 1861.
  • Then, it spills r12 (V7), which requires an additional spill temp, so maxSpill[TYP_INT] should now be at least one again.

During codegen, these spills and reloads should be happening in this same order. Can you verify that's the case?

Copy link
Member Author

@wateret wateret Aug 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt Looks like it does.

Generating: N1851 ( 47, 33) [000885] ---X---------      t885 = *  IND       double REG f8 <l:$350, c:$6db>
							Byref regs: 0008 {r3} => 0000 {}
IN037d:             vldr    d8, [r3]
                                                              /--*  t885   double 
Generating: N1853 (???,???) [014273] ---X---------    t14273 = *  COPY      long   REG r2,r3
                                                              /--*  t14273 long   
Generating: N1855 (???,???) [014274] ---X---------    t14274 = *  PUTARG_REG long   REG r2,r3
IN037e:             vmov.d2i r2, r3, d8
Generating: N1857 (  1,  1) [000872] -------------      t872 =    CNS_DBL   double 0.0000000000000000 REG f8 $80
IN037f:             movs    r0, 0
IN0380:             movs    r1, 0
IN0381:             vmov.i2d d8, r0, r1
                                                              /--*  t872   double 
Generating: N1859 (???,???) [014275] -------------    t14275 = *  COPY      long   REG r0,r1
                                                              /--*  t14275 long   
Generating: N1861 (???,???) [014276] ------------Z    t14276 = *  PUTARG_REG long   REG r0,r1
IN0382:             vmov.d2i r0, r1, d8
reused temp #2, slot 0, size = 4
							The register r0 spilled with    [014276]
IN0383:             str     r0, [sp+0x10]	// [TEMP_02]
Generating: N1863 (  3,  7) [014277] ------------Z    t14277 =    CNS_INT(h) int    0xB642FE33 ftn REG r0
IN0384:             movw    r0, 0xfe33
IN0385:             movt    r0, 0xb642
reused temp #1, slot 0, size = 4
							The register r0 spilled with    [014277]
IN0386:             str     r0, [sp+0x0c]	// [TEMP_01]
                                                              /--*  t14277 int    
Generating:                [019739] -------------    t19739 = *  RELOAD    int    REG r8
                                                              /--*  t14274 long   arg1 r2,r3
                                                              +--*  t14276 long   arg0 r0,r1
                                                              +--*  t19739 int    control expr
Generating: N1865 ( 64, 46) [000886] --CX---------      t886 = *  CALL help double HELPER.CORINFO_HELP_DBLREM <l:$322, c:$323>
							Tree-Node marked unspilled from  [014276]
IN0387:             ldr     r0, [sp+0x10]	// [TEMP_02]
release temp #2, slot 0, size = 4
							Tree-Node marked unspilled from  [014277]
IN0388:             ldr     r8, [sp+0x0c]	// [TEMP_01]
release temp #1, slot 0, size = 4

With allocating 1 more temp for this function it works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, this cannot be a solution since it will allocate one more temps for all methods.

@wateret
Copy link
Member Author

wateret commented Aug 4, 2017

@CarolEidt
I reverted the previous commit(a4f9b3a), and made change to LinearScan::updateMaxSpill().

Previously the temps are:

;  TEMP_01                                     int  ->  [r11-0x110]
;  TEMP_02                                    long  ->  [r11-0x118]

TEMP_02 is unused and we need one more int temp. When it is spilled, since the type of PUTARG_REG is TYP_LONG, a wrong type of temp was generated. I think we need to treat them as two separated INTs.
And the new commit now generates temps correctly.

;  TEMP_02                                     int  ->  [r11-0x110]
;  TEMP_01                                     int  ->  [r11-0x114]

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a suggested #ifdef change and some formatting fixes, I think this will be ready to merge; I think the arm failures may be due to #13186?

src/jit/lsra.cpp Outdated
else if (treeNode->OperIsPutArgReg())
{
// For double arg regs, the type is changed to long since they must be passed via `r0-r3`.
// However when they get spilled, they should be treated as separated int registers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is just the case for ARM_SOFTFP, right? In other cases, an actual TYP_LONG would have been decomposed, so it would be better to put this under #ifdef ARM_SOFTFP.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt Right, I'll fix it.

wateret added 5 commits August 7, 2017 11:37
Passing double arg regs via `r0:r1` or `r2:r3`,
each RefPosition's RegMask should have only one exact register
so the candidate register can be fixed.

e.g.)

```
<RefPosition dotnet#4099 @2625 RefTypeUse <Ivl:1237> BB35 regmask=[r0-r1] last>
<RefPosition dotnet#4100 @2625 RefTypeUse <Ivl:1238> BB35 regmask=[r0-r1] last>
```

to be:

```
<RefPosition dotnet#4099 @2625 RefTypeUse <Ivl:1237> BB35 regmask=[r0] last fixed>
<RefPosition dotnet#4100 @2625 RefTypeUse <Ivl:1238> BB35 regmask=[r1] last fixed>
```

Fix #12994
@wateret
Copy link
Member Author

wateret commented Aug 7, 2017

@CarolEidt With a fix for your comment and rebasing, it is ready to get merged. Thanks!

@CarolEidt CarolEidt merged commit 3e4965e into dotnet:master Aug 8, 2017
wateret added a commit to wateret/coreclr that referenced this pull request Aug 9, 2017
Fix regression caused by dotnet#13023

Fix #13281
@karelz karelz modified the milestone: 2.1.0 Aug 28, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RyuJIT/armel] Assert Failure farthestRefPhysRegRecord != nullptr
6 participants