add support for genReg1/genReg2->SIMD8 store on x86 windows. #52581

sandreenko · 2021-05-11T01:57:25Z

sandreenko · 2021-05-11T02:14:50Z

src/coreclr/jit/codegenxarch.cpp

+    // and needs to be assembled into a single xmm register,
+    // note we can't check reg0=EAX, reg1=EDX because they could be already moved.
+
+    inst_RV_RV(ins_Copy(targetReg, TYP_FLOAT), targetReg, reg0, TYP_INT);
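The diff assembles the two 32-bit halves of a SIMD8 value (such as Vector2) from two integer registers into one xmm register. Below is a minimal model of the intended lane layout. The Simd8 struct and both functions are hypothetical illustrations, not JIT code. As the diff's comment suggests, the model assumes reg0 carries the low 4 bytes (EAX) and reg1 the high 4 bytes (EDX). The movd step reflects real x86 behavior: movd zero-extends into the xmm register. How lane 1 gets filled (for example, pinsrd when SSE4.1 is available) is only summarized.

```cpp
#include <cstdint>

// Hypothetical model (not JIT code): an 8-byte SIMD value such as Vector2,
// held in the low 64 bits of an xmm register, viewed as two 32-bit lanes.
struct Simd8 {
    std::uint32_t lane[2];
};

// Models the emitted sequence: "movd xmm, reg0" places reg0 in lane 0 and
// zeroes the rest; reg1 is then placed in lane 1 (e.g. pinsrd with SSE4.1,
// or some other shuffle sequence without it, which is not modeled here).
constexpr Simd8 assembleSimd8(std::uint32_t reg0, std::uint32_t reg1) {
    Simd8 result{};
    result.lane[0] = reg0; // low 4 bytes of the struct
    result.lane[1] = reg1; // high 4 bytes of the struct
    return result;
}

// The 64-bit pattern a SIMD8 store would write to memory
// (little-endian: lane 0 occupies the low 32 bits).
constexpr std::uint64_t simd8Bits(const Simd8& v) {
    return (static_cast<std::uint64_t>(v.lane[1]) << 32) | v.lane[0];
}
```

For example, simd8Bits(assembleSimd8(0x11111111u, 0x22222222u)) yields 0x2222222211111111.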


We don't use the last argument on x86 except to check that it is not a byte register. On other platforms, the type is needed only for the operand size (emitActualTypeSize(type)). So here it does not really matter whether we do inst_RV_RV(ins_Copy(targetReg, TYP_FLOAT), targetReg, reg0, TYP_INT) or inst_RV_RV(ins_Copy(targetReg, TYP_FLOAT), targetReg, reg0, TYP_FLOAT). However, judging by the other call sites, I think it expects the type of the source operand, not the destination.
Example (runtime/src/coreclr/jit/simdcodegenxarch.cpp, line 535 in 9f7abf5):

inst_RV_RV(ins_Copy(op1loReg, TYP_FLOAT), targetReg, op1loReg, TYP_INT);

but it would be nice to get a confirmation.
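Here is the size argument in concrete form. If the last parameter only feeds emitActualTypeSize(type), then TYP_INT and TYP_FLOAT produce the same attribute. The sketch below uses hypothetical, heavily simplified stand-ins for var_types and emitAttr, covering only four types, to show that equivalence. It is not the JIT's implementation.

```cpp
// Hypothetical, simplified stand-ins for the JIT's var_types and emitAttr;
// the real enums have many more members.
enum class VarType { Int, Long, Float, Double };
enum class EmitAttr { EA_4BYTE = 4, EA_8BYTE = 8 };

// Models the only thing the type contributes in this call: the operand size.
constexpr EmitAttr actualTypeSize(VarType t) {
    return (t == VarType::Long || t == VarType::Double) ? EmitAttr::EA_8BYTE
                                                        : EmitAttr::EA_4BYTE;
}
```

Under this model, passing TYP_INT or TYP_FLOAT for a 32-bit int<->float copy encodes identically. The choice only becomes observable for 8-byte types.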

It's meant to be used as inst_RV_RV(ins_Copy(srcReg, tgtType), tgtReg, srcReg, srcType).

For ins_Copy(srcReg, tgtType), it's validated that srcReg and tgtType are "different", meaning one is floating-point and the other is integral. The tgtType then determines whether the copy is int->float or float->int, and the correct instruction is selected (this is just movd on x86, but fmov or mov on arm64).

The srcType is merely used to pick the right emit attribute so that encoding and disassembly are correct. For int32<->float it doesn't matter, since both are EA_4BYTE, but the typical usage so far has been srcType.
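The contract described above can be modeled in a few lines. This is a hypothetical, x86-only sketch, and the enums and insCopyX86 are illustrative names, not the JIT's API. Same-class pairs are rejected, tgtType picks the direction, and both directions map to movd. The emit attribute would still come from srcType, as described.

```cpp
// Hypothetical model of ins_Copy(srcReg, tgtType) on x86; not the JIT's API.
enum class RegClass { Integer, Float };
enum class CopyIns { Invalid, MovdIntToXmm, MovdXmmToInt };

constexpr CopyIns insCopyX86(RegClass srcReg, RegClass tgtType) {
    // The source register and target type must be in different classes
    // (one integral, one floating-point); otherwise this overload does not apply.
    if (srcReg == tgtType) {
        return CopyIns::Invalid;
    }
    // tgtType decides the direction; on x86 both directions encode as movd.
    return tgtType == RegClass::Float ? CopyIns::MovdIntToXmm : CopyIns::MovdXmmToInt;
}
```

A call shaped like inst_RV_RV(ins_Copy(reg0, TYP_FLOAT), targetReg, reg0, TYP_INT) corresponds to insCopyX86(RegClass::Integer, RegClass::Float).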

@sandreenko - Could you please update the method definition of inst_RV_RV() and rename the parameter from type -> srcType in that case, for future maintainability?


Sure

Sorry, I don't think I explained enough. To clarify, it's not necessarily srcType for all instructions, just for the ins_Copy(srcReg, tgtType) instructions.

There can be instructions where the emitAttr needs to come from the dstType, or from something else entirely. It really depends on the particular instruction and what attribute it needs to encode itself correctly.
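Two x64 SSE2 conversions show why the attribute choice is per-instruction (this example is not from the thread). cvtsi2sd uses REX.W when its integer source is 64-bit, while cvttsd2si uses REX.W when its integer destination is 64-bit. The size attribute therefore comes from srcType for the former and from dstType for the latter. The types and function names below are hypothetical, and the sketch does not claim to mirror how the JIT's cast codegen is written.

```cpp
// Hypothetical illustration; not the JIT's types or functions.
enum class ConvIns { Cvtsi2sd, Cvttsd2si };
enum class ConvType { Int, Long, Double };

constexpr int typeSize(ConvType t) { return t == ConvType::Int ? 4 : 8; }

// For both conversions, REX.W is decided by the size of the *integer* operand,
// which is the source for cvtsi2sd and the destination for cvttsd2si.
constexpr int attrSizeFor(ConvIns ins, ConvType dstType, ConvType srcType) {
    switch (ins) {
        case ConvIns::Cvtsi2sd:
            return typeSize(srcType); // integer -> double: size from the source
        case ConvIns::Cvttsd2si:
            return typeSize(dstType); // double -> integer: size from the destination
    }
    return 0;
}

constexpr bool needsRexW(ConvIns ins, ConvType dstType, ConvType srcType) {
    return attrSizeFor(ins, dstType, srcType) == 8;
}
```

Always passing srcType would encode cvttsd2si to a 64-bit destination with an 8-byte attribute only by accident, since the double source happens to be 8 bytes too. For a 32-bit destination it would wrongly set REX.W.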

sandreenko · 2021-05-11T07:12:42Z

PTAL @echesakov, @dotnet/jit-contrib. The change is similar to #46899.

src/coreclr/jit/codegenxarch.cpp

kunalspathak

Minor comment.

src/coreclr/jit/codegenxarch.cpp

sandreenko · 2021-05-11T16:33:53Z

/azp run runtime-coreclr crossgen2-composite

The pipeline is very red, but let's see whether Vector2_3_4 gets fixed in CI.

azure-pipelines · 2021-05-11T16:35:44Z

Azure Pipelines successfully started running 1 pipeline(s).

sandreenko · 2021-05-11T19:47:43Z

I was able to repro it without crossgen2 by running the same test with complus_JitNoStructPromotion=1. I have now checked that the new code gives the correct results, both with and without SSE4.1.

Ready for review.

Note that the code produced by crossgen2 is thrown away: when we run the test with corerun, RunVector2Tests is compiled from scratch. This is not related to this issue, FYI @trylek, @davidwrighton.

trylek · 2021-05-11T20:03:02Z

Thanks Sergey for sharing the additional details. Have you by any chance been able to identify why the Crossgen2 code for the method is not being used at runtime? Do we skip it in Crossgen2 compilation? (Probably not, otherwise we wouldn't be hitting the SIMD JIT assertion.) Is it getting thrown out at runtime by one of the method fixup checks?

sandreenko · 2021-05-11T22:33:13Z


I see the method in the r2rdump output for "composite-r2r.dll"; thanks for providing the command. So the runtime throws it away, but I'm not sure why.

Quick debugging shows that it is rejected here:

runtime/src/coreclr/vm/prestub.cpp, lines 393 to 394 in 31c5a7c:

#ifdef FEATURE_READYTORUN
    if (IsDynamicMethod() && GetLoaderModule()->IsSystem() && MayUsePrecompiledILStub())

This condition evaluates to false, so we don't try to load the R2R code.
Either IsDynamicMethod() or GetLoaderModule()->IsSystem() is false, but I don't have a debug VM to tell which one rejects it.
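The combined && condition only reports its final result, so it hides which term failed. One hedged way to find out without a debug VM would be to evaluate each term separately and log a bitmask. The sketch below models that approach with a hypothetical R2RCheckInputs struct holding the three predicate results. It does not call the real MethodDesc methods, and the names are illustrative only.

```cpp
// Hypothetical snapshot of the three predicates in the prestub.cpp condition.
struct R2RCheckInputs {
    bool isDynamicMethod;
    bool loaderModuleIsSystem;
    bool mayUsePrecompiledILStub;
};

// One bit per term that can reject the method.
enum RejectReason : unsigned {
    NotDynamic        = 1u,
    LoaderNotSystem   = 2u,
    NoPrecompiledStub = 4u,
};

// Evaluate every term (no short-circuiting) and record each failing one,
// so a single log line would show exactly which term rejected the method.
constexpr unsigned rejectMask(R2RCheckInputs in) {
    return (in.isDynamicMethod ? 0u : NotDynamic)
         | (in.loaderModuleIsSystem ? 0u : LoaderNotSystem)
         | (in.mayUsePrecompiledILStub ? 0u : NoPrecompiledStub);
}

// Equivalent to the original && condition: true only if no term failed.
constexpr bool takesR2RPath(R2RCheckInputs in) { return rejectMask(in) == 0u; }
```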

sandreenko · 2021-05-12T01:43:57Z

/azp run runtime

azure-pipelines · 2021-05-12T01:44:24Z

Azure Pipelines successfully started running 1 pipeline(s).

BruceForstall · 2021-05-12T22:15:09Z

src/coreclr/jit/instr.cpp

+// Arguments:
+//    ins   - the instruction to generate;
+//    reg1  - the first register to use, the dst for most instructions;
+//    tree  - the second register to use, the src for most instructions;


tree => reg2

sandreenko · 2021-05-13T17:02:05Z

The tests passed in https://dev.azure.com/dnceng/public/_build/results?buildId=1136599&view=results; the "6 failing and 96 successful checks" status is just another infra bug.

sandreenko added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) May 11, 2021

sandreenko self-assigned this May 11, 2021

sandreenko commented May 11, 2021


sandreenko requested a review from echesakov May 11, 2021 14:47

tannergooding reviewed May 11, 2021
