-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x86: support Return SIMD8. #46899
x86: support Return SIMD8. #46899
Conversation
/azp list |
/azp run runtime-coreclr outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
as expected x86 does not use it currently and x64 uses only for simd16. The types were wrong for x64 but they were not used. |
PTAL @dotnet/jit-contrib , cc @jkoritzinsky |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM based on my knowledge. Just one comment to clarify.
src/coreclr/jit/codegenxarch.cpp
Outdated
// reg0 = opReg[31:0] | ||
inst_RV_RV(ins_Copy(opReg, TYP_INT), reg0, opReg, TYP_INT); | ||
// reg1 = opRef[63:32] | ||
inst_RV_RV_IV(INS_pextrd, EA_4BYTE, reg1, opReg, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we guaranteed to be able to use an SSE4.1 instruction here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not, we should be able to use PEXTRW
twice along with a 16 bit shift to emulate it. (IIRC SSE2 is our minimum instruction set support for x86)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find 5.0 requirements for some reason, @echesakovMSFT or @tannergooding probably know better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case SSE2 is our minimum, here's an instruction sequence that I think would be SSE2-compatible with equivalent results (given the return value is in xmm0) to save you time if needed:
pextrw eax, xmm0, 3
shl eax, 16
pextrw edx, xmm0, 2
or edx, eax
movd eax, xmm0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with @jkoritzinsky - you would need to check for compExactlyDependsOn(InstructionSet_SSE41)
before emitting pextrd
in a similar way as it is done here
runtime/src/coreclr/jit/hwintrinsicxarch.cpp
Line 1218 in 08c3065
if (!compExactlyDependsOn(InstructionSet_SSE41)) |
In addition, please make sure that we don't emit that instruction during crossgen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you want compOpportunisticallyDependsOn
. The SSE2 vs SSE4.1 paths are equivalent, it's just that the SSE4.1 path is "faster" and should be preferred when it is available.
For crossgen1 SSE4.1
should be false, for crossgen2, it should depend on the ISA configuration (which I believe currently defaults to SSE4.1) and should flag the method as "uses SSE4.1" in which case it will be rejitted if false.
IIRC, there are some special considerations when it's all internal and therefore a IsSupported
check doesn't exist, but @jkotas and @davidwrighton are the two most familiar with this (David wrote the original logic and Jan made some recent changes in the area).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you want
compOpportunisticallyDependsOn
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compOpportunisticallyDependsOn
is what you want and it will do exactly the right thing when using crossgen2, but I believe it is not safe for this kind of purpose when compiling S.P.C with crossgen1. As we haven't quite gotten rid of crossgen1, you'll need to find some way to not generate the SSE4.1 support when crossgening S.P.C with original crossgen, or at least, wait until #47019 is in and enabled for both X86 and X64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to merge this PR as is with unsafe behavior for x86 old machines because:
- this code is emitted only for PInvoke stubs that we don't crossgen, so no real issue here;
- it could only affect old machines;
- it will automatically be fixed with a full switch to crossgen2;
the alternative is to merge the change with SSE2 and open an issue to return SSE4.1 code after crossgen1 deprecation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good news. Crossgen2 was put in place to replace compilation for S.P.C over the weekend. The only time that S.P.C should now be compiled with crossgen1 is in self-contained app build scenarios before we transition those over as well, and as you state there is no actual use of this feature in S.P.C.
In light of that, I believe this is ok to check in.
Thanks, @tannergooding, the PR was updated based on your review.
The jit does not currently have a clear contract when to call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Support trees like:
that we return in
EAX/EDX
.Also, fix a few non-functional bugs like wrong printing or wrong instruction attributes that are not read.
Unblocks #46238