Improving the SIMD codegen for SIMD12 load/store #80083
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak

Issue Details

This adds support for containment on TYP_SIMD12 loads/stores and improves the codegen to require fewer temporary registers and use better instructions when available.

Improved Load:

```diff
- lea rax, bword ptr [rcx+30H]
- vmovss xmm0, dword ptr [rax+08H]
- vmovsd xmm1, qword ptr [rax]
- vshufps xmm1, xmm0, 68
+ vmovsd xmm0, qword ptr [rcx+30H]
+ vinsertps xmm0, dword ptr [rcx+38H], 2
```

Improved Store:

```diff
- vmovsd qword ptr [rdx], xmm1
- vpshufd xmm0, xmm1, 2
- vmovss dword ptr [rdx+08H], xmm0
+ vmovsd qword ptr [rdx], xmm0
+ vextractps dword ptr [rdx+08H], xmm0, 2
```

Combined, this saves 9 bytes of codegen and improves the PerfScore by 1.5.

Total diffs are all relatively similar: emitting `vmovsd + vinsertps` or `vmovsd + vextractps` and removing the now unnecessary `lea` in favor of containing them.
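For readers less familiar with why a 12-byte vector needs two memory accesses, here is a rough, portable C model of the two-piece load and store pattern from the PR description. It is only a sketch: `Vec4Model`, `load_simd12`, and `store_simd12` are hypothetical names invented for illustration and are not JIT types or functions. The model assumes the 8-byte piece fills lanes 0 and 1, the 4-byte piece at offset 8 fills lane 2, and lane 3 is left as zero.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a 16-byte SIMD register holding four floats.
   This is an illustration only, not a type from the JIT. */
typedef struct {
    float lane[4];
} Vec4Model;

/* Load 12 bytes as an 8-byte piece plus a 4-byte piece, mirroring the
   two-instruction load sequence. The 4 bytes after the 12-byte value
   are never read. */
static Vec4Model load_simd12(const float *src)
{
    Vec4Model v;
    assert(src != NULL);
    memset(&v, 0, sizeof v);            /* assumption: unused lanes are zero */
    memcpy(&v.lane[0], src, 8);         /* lanes 0 and 1: the 8-byte load */
    memcpy(&v.lane[2], src + 2, 4);     /* lane 2: the 4-byte piece at offset 8 */
    return v;
}

/* Store 12 bytes as an 8-byte piece plus a 4-byte piece, mirroring the
   two-instruction store sequence. The 4 bytes after the 12-byte value
   are never written. */
static void store_simd12(float *dst, const Vec4Model *v)
{
    assert(dst != NULL && v != NULL);
    memcpy(dst, &v->lane[0], 8);        /* lanes 0 and 1: the 8-byte store */
    memcpy(dst + 2, &v->lane[2], 4);    /* lane 2: the 4-byte piece at offset 8 */
}
```

In this model both pieces address memory directly from the base pointer plus an offset, which is the effect containment has in the improved sequences: no separate address computation and no extra temporary for the third element.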
CC. @dotnet/jit-contrib, this should be ready for review. Gives some small size savings for x64 (~2k bytes in fullopts and ~0.5k bytes in minopts) and a small TP win on x64
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop
Azure Pipelines successfully started running 3 pipeline(s).
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop
Azure Pipelines successfully started running 3 pipeline(s).
Fixed the jitstress failure. Results in ~3.3k savings on x86/x64 and a -0.01% TP improvement
This looks good to me. There seems to be a failure for ARM though, but it looks like just one test case at the moment.
It's an unrelated/existing GC timeout. I've retriggered it and it should pass on rerun.
5c4afad to ca3ada1
ca3ada1 to 7bee874
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop
Azure Pipelines successfully started running 3 pipeline(s).