Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RyuJIT] Add "rorx" instruction (BMI2) and emit it instead of "rol" when possible #41772

Merged
merged 15 commits into from
Sep 29, 2020

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Sep 2, 2020

Emit BMI2's rorx instead of rol if argument is a constant value. Fixes #32290

uint Test(uint x)
{
    return BitOperations.RotateRight(x, 20);
}

Old codegen:

G_M62026_IG01:
						;; bbWeight=1    PerfScore 0.00
G_M62026_IG02:
       mov      rax, rdx
       rol      rax, 44
						;; bbWeight=1    PerfScore 2.25
G_M62026_IG03:
       ret      
						;; bbWeight=1    PerfScore 1.00
; Total bytes of code: 8

New codegen:

G_M62026_IG01:
						;; bbWeight=1    PerfScore 0.00
G_M62026_IG02:
       rorx     rax, rdx, 20 ; 20 is 44 but reversed (rol -> ror)
						;; bbWeight=1    PerfScore 2.00
G_M62026_IG03:
       ret      
						;; bbWeight=1    PerfScore 1.00
; Total bytes of code: 7

Slightly improves String.GetHashCode (noticeable on small strings) since Marvin uses RotateLeft under the hood.

[Benchmark]
public int StringGetHashCode() => "hello world".GetHashCode();
        |            Method |     Mean |     Error |    StdDev |
        |------------------ |---------:|----------:|----------:|
 master | StringGetHashCode | 6.638 ns | 0.0022 ns | 0.0020 ns |
     PR | StringGetHashCode | 6.422 ns | 0.0016 ns | 0.0015 ns | 3.3% faster

Not sure perfScore is correct for rorx.

/cc @dotnet/jit-contrib @tannergooding

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 2, 2020
@@ -594,6 +594,7 @@ INST3(blsr, "blsr", IUM_WR, BAD_CODE, BAD_CODE,
INST3(bextr, "bextr", IUM_WR, BAD_CODE, BAD_CODE, SSE38(0xF7), INS_Flags_IsDstDstSrcAVXInstruction) // Bit Field Extract

// BMI2
INST3(rorx, "rorx", IUM_WR, BAD_CODE, BAD_CODE, SSE3A(0xF0), INS_FLAGS_None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to double-check that neither the IsDstDstSrcAVXInstruction nor IsDstSrcSrcAVXInstruction flags apply. The instruction looks to be a dst, src, imm operand in this case and so it shouldn't be flagged as RMW and should go down the right encoding paths in emit.

Copy link
Member Author

@EgorBo EgorBo Sep 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tried both of these flags and observed SEH exceptions on start 😞 with INS_FLAGS_None works like a charm.

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks!

@tannergooding
Copy link
Member

Are there any pmi diffs in the framework assemblies from this?

@EgorBo
Copy link
Member Author

EgorBo commented Sep 4, 2020

Are there any pmi diffs in the framework assemblies from this?

here is -f --pmi (Release libs):


Analyzing CodeSize diffs...
Found 11 files with textual diffs.
PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: 444 (0.00% of base)
    diff is a regression.
Top file regressions (bytes):
         206 : System.Private.CoreLib.dasm (0.00% of base)
          52 : System.Formats.Cbor.dasm (0.11% of base)
          52 : System.Net.Primitives.dasm (0.08% of base)
          52 : System.Private.Xml.dasm (0.00% of base)
          45 : xunit.execution.dotnet.dasm (0.02% of base)
          17 : Microsoft.CSharp.dasm (0.00% of base)
           9 : System.Drawing.Common.dasm (0.00% of base)
           5 : System.Runtime.Numerics.dasm (0.01% of base)
           2 : Microsoft.Extensions.DependencyModel.dasm (0.00% of base)
           2 : System.Composition.TypedParts.dasm (0.00% of base)
           2 : System.Resources.Extensions.dasm (0.01% of base)
11 total files with Code Size differences (0 improved, 11 regressed), 256 unchanged.
Top method regressions (bytes):
          45 ( 2.69% of base) : xunit.execution.dotnet.dasm - Sha1Digest:ProcessBlock():this
          42 (13.68% of base) : System.Formats.Cbor.dasm - Marvin:ComputeHash(ReadOnlySpan`1,long):long
          42 (13.68% of base) : System.Net.Primitives.dasm - Marvin:ComputeHash(ReadOnlySpan`1,long):long
          42 (13.68% of base) : System.Private.Xml.dasm - Marvin:ComputeHash(ReadOnlySpan`1,long):long
          40 (14.71% of base) : System.Private.CoreLib.dasm - Marvin:ComputeHash32(byref,int,int,int):int
          29 (12.78% of base) : System.Private.CoreLib.dasm - Marvin:ComputeHash32OrdinalIgnoreCase(byref,int,int,int):int
          19 ( 8.76% of base) : System.Private.CoreLib.dasm - HashCode:ToHashCode():int:this
          12 ( 3.77% of base) : System.Private.CoreLib.dasm - HashCode:Add(int):this (2 methods)
          11 (45.83% of base) : System.Private.CoreLib.dasm - HashCode:MixState(int,int,int,int):int
          10 (27.03% of base) : System.Formats.Cbor.dasm - Marvin:Block(byref,byref)
          10 (27.03% of base) : System.Net.Primitives.dasm - Marvin:Block(byref,byref)
          10 (27.03% of base) : System.Private.CoreLib.dasm - Marvin:Block(byref,byref)
          10 (27.03% of base) : System.Private.Xml.dasm - Marvin:Block(byref,byref)
           9 (34.62% of base) : System.Drawing.Common.dasm - Font:GetHashCode():int:this
           7 ( 1.05% of base) : System.Private.CoreLib.dasm - Sha1ForNonSecretPurposes:Drain():this
           6 ( 3.75% of base) : System.Private.CoreLib.dasm - HashCode:Combine(__Canon,long):int
           6 ( 4.26% of base) : System.Private.CoreLib.dasm - HashCode:Combine(ubyte,long):int
           6 ( 4.26% of base) : System.Private.CoreLib.dasm - HashCode:Combine(short,long):int
           6 ( 4.51% of base) : System.Private.CoreLib.dasm - HashCode:Combine(int,long):int
           6 ( 3.03% of base) : System.Private.CoreLib.dasm - HashCode:Combine(double,long):int
Top method improvements (bytes):
          -2 (-1.83% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCode():int:this
          -2 (-1.46% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCodeOrdinalIgnoreCase():int:this
          -1 (-1.56% of base) : Microsoft.CSharp.dasm - CSharpInvokeBinder:GetGetBinderEquivalenceHash():int:this
          -1 (-0.50% of base) : System.Private.CoreLib.dasm - CastHelpers:StelemRef_Helper(byref,long,Object)
          -1 (-0.47% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCastAny(long,Object):Object
          -1 (-0.49% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstance_Helper(long,Object):Object
          -1 (-0.60% of base) : System.Private.CoreLib.dasm - CastHelpers:TryGet(long,long):int
          -1 (-0.46% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstanceOfAny(long,Object):Object
          -1 (-0.52% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCast_Helper(long,Object):Object
Top method regressions (percentages):
          11 (45.83% of base) : System.Private.CoreLib.dasm - HashCode:MixState(int,int,int,int):int
           9 (34.62% of base) : System.Drawing.Common.dasm - Font:GetHashCode():int:this
          10 (27.03% of base) : System.Formats.Cbor.dasm - Marvin:Block(byref,byref)
          10 (27.03% of base) : System.Net.Primitives.dasm - Marvin:Block(byref,byref)
          10 (27.03% of base) : System.Private.CoreLib.dasm - Marvin:Block(byref,byref)
          10 (27.03% of base) : System.Private.Xml.dasm - Marvin:Block(byref,byref)
           6 (22.22% of base) : System.Private.CoreLib.dasm - BinaryPrimitives:ReverseEndianness(int):int (2 methods)
           3 (16.67% of base) : System.Private.CoreLib.dasm - HashCode:Round(int,int):int
           3 (16.67% of base) : System.Private.CoreLib.dasm - HashCode:QueueRound(int,int):int
          40 (14.71% of base) : System.Private.CoreLib.dasm - Marvin:ComputeHash32(byref,int,int,int):int
          42 (13.68% of base) : System.Formats.Cbor.dasm - Marvin:ComputeHash(ReadOnlySpan`1,long):long
          42 (13.68% of base) : System.Net.Primitives.dasm - Marvin:ComputeHash(ReadOnlySpan`1,long):long
          42 (13.68% of base) : System.Private.Xml.dasm - Marvin:ComputeHash(ReadOnlySpan`1,long):long
          29 (12.78% of base) : System.Private.CoreLib.dasm - Marvin:ComputeHash32OrdinalIgnoreCase(byref,int,int,int):int
           2 (12.50% of base) : System.Runtime.Numerics.dasm - NumericsHelpers:CombineHash(int,int):int (2 methods)
           1 (10.00% of base) : Microsoft.CSharp.dasm - HashHelpers:Combine(int,int):int
           1 (10.00% of base) : Microsoft.Extensions.DependencyModel.dasm - HashHelpers:Combine(int,int):int
           1 (10.00% of base) : System.Composition.TypedParts.dasm - HashHelpers:Combine(int,int):int
           1 (10.00% of base) : System.Resources.Extensions.dasm - HashHelpers:Combine(int,int):int
          19 ( 8.76% of base) : System.Private.CoreLib.dasm - HashCode:ToHashCode():int:this
Top method improvements (percentages):
          -2 (-1.83% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCode():int:this
          -1 (-1.56% of base) : Microsoft.CSharp.dasm - CSharpInvokeBinder:GetGetBinderEquivalenceHash():int:this
          -2 (-1.46% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCodeOrdinalIgnoreCase():int:this
          -1 (-0.60% of base) : System.Private.CoreLib.dasm - CastHelpers:TryGet(long,long):int
          -1 (-0.52% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCast_Helper(long,Object):Object
          -1 (-0.50% of base) : System.Private.CoreLib.dasm - CastHelpers:StelemRef_Helper(byref,long,Object)
          -1 (-0.49% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstance_Helper(long,Object):Object
          -1 (-0.47% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCastAny(long,Object):Object
          -1 (-0.46% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstanceOfAny(long,Object):Object
63 total methods with Code Size differences (9 improved, 54 regressed), 259384 unchanged.
Completed analysis in 21.89s
PS C:\prj>

rorx reg, CNS is 1 byte bigger than mov reg, CNS + rol reg for 32bit integers. (but smaller for 64bit ones)
e.g. Marvin sligtly regresses but shows better perf numbers.

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo the question about handling GT_ROR

@am11
Copy link
Member

am11 commented Sep 4, 2020

rorx reg, CNS is 1 byte bigger than mov reg, CNS + rol reg for 32bit integers. (but smaller for 64bit ones)

would it make sense to only emit rorx for typeWidth==64?

@EgorBo
Copy link
Member Author

EgorBo commented Sep 5, 2020

rorx reg, CNS is 1 byte bigger than mov reg, CNS + rol reg for 32bit integers. (but smaller for 64bit ones)

would it make sense to only emit rorx for typeWidth==64?

HashCode.Combine/String.GetHashCode use 32bit integers for rorx and benchmarks show better performance at cost of <0.0% size regression of CoreLib 🙂
also, llvm-mca:

image

@am11
Copy link
Member

am11 commented Sep 5, 2020

Thanks for the llvm-mca ref! :)
From the list of regression, I tried Marvin.ComputeHash32:

      40 (14.71% of base) : System.Private.CoreLib.dasm - Marvin:ComputeHash32(byref,int,int,int):int

https://godbolt.org/z/5hnWK1 - gcc generates slightly optimized result than clang, but both use mov+shr. Perhaps the cases where it regresses is when code is shifting Int64 to fit into Int32 variable.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 5, 2020

@am11
Here is the main regression from the diff, it's Marvin's Block:
image

Both gcc and clang emit the same codegen for it with rorx: https://godbolt.org/z/qz4vxo
(can't explain the perf difference for this one honestly, other than alignment?)

@EgorBo
Copy link
Member Author

EgorBo commented Sep 5, 2020

if add a condition to emit rorx only if otherwise we need to emit an addtional mov (targetReg != operandReg) the diff becomes:

C:\prj>jit-diff diff --output C:\prj\jitdiffs -f --core_root C:\prj\runtime-1\artifacts\tests\coreclr\Windows_NT.x64.Release\Tests\Core_Root --base C:\prj\runtime-1\artifacts\bin\coreclr\Windows_NT.x64.Checked_base --diff C:\prj\runtime-1\artifacts\bin\coreclr\Windows_NT.x64.Checked --pmi
Beginning PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies
| Finished 267/267 Base 267/267 Diff [436.9 sec]
Completed PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies in 437.09s
Diffs (if any) can be viewed by comparing: C:\prj\jitdiffs\dasmset_20\base C:\prj\jitdiffs\dasmset_20\diff
Analyzing CodeSize diffs...
Found 7 files with textual diffs.
PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: 2 (0.00% of base)
    diff is a regression.
Top file regressions (bytes):
          17 : Microsoft.CSharp.dasm (0.00% of base)
           2 : Microsoft.Extensions.DependencyModel.dasm (0.00% of base)
           2 : System.Composition.TypedParts.dasm (0.00% of base)
           2 : System.Resources.Extensions.dasm (0.01% of base)
           2 : System.Runtime.Numerics.dasm (0.00% of base)
Top file improvements (bytes):
         -12 : xunit.execution.dotnet.dasm (-0.01% of base)
         -11 : System.Private.CoreLib.dasm (-0.00% of base)
7 total files with Code Size differences (2 improved, 5 regressed), 260 unchanged.
Top method regressions (bytes):
           3 ( 3.00% of base) : Microsoft.CSharp.dasm - CSharpBinaryOperationBinder:GetGetBinderEquivalenceHash():int:this
           3 ( 2.78% of base) : Microsoft.CSharp.dasm - CSharpConvertBinder:GetGetBinderEquivalenceHash():int:this
           3 ( 2.16% of base) : Microsoft.CSharp.dasm - CSharpSetMemberBinder:GetGetBinderEquivalenceHash():int:this
           2 ( 1.77% of base) : Microsoft.CSharp.dasm - CSharpGetMemberBinder:GetGetBinderEquivalenceHash():int:this
           2 ( 2.20% of base) : Microsoft.CSharp.dasm - CSharpSetIndexBinder:GetGetBinderEquivalenceHash():int:this
           2 ( 2.33% of base) : Microsoft.CSharp.dasm - CSharpUnaryOperationBinder:GetGetBinderEquivalenceHash():int:this
           2 (12.50% of base) : System.Runtime.Numerics.dasm - NumericsHelpers:CombineHash(int,int):int (2 methods)
           1 (10.00% of base) : Microsoft.CSharp.dasm - HashHelpers:Combine(int,int):int
           1 ( 1.20% of base) : Microsoft.CSharp.dasm - BinderHelper:AddArgHashes(int,ref,ref):int
           1 ( 1.25% of base) : Microsoft.CSharp.dasm - CSharpIsEventBinder:GetGetBinderEquivalenceHash():int:this
           1 (10.00% of base) : Microsoft.Extensions.DependencyModel.dasm - HashHelpers:Combine(int,int):int
           1 ( 1.16% of base) : Microsoft.Extensions.DependencyModel.dasm - Dependency:GetHashCode():int:this
           1 (10.00% of base) : System.Composition.TypedParts.dasm - HashHelpers:Combine(int,int):int
           1 ( 1.54% of base) : System.Composition.TypedParts.dasm - ParameterInfoComparer:GetHashCode(ParameterInfo):int:this
           1 (10.00% of base) : System.Resources.Extensions.dasm - HashHelpers:Combine(int,int):int
           1 ( 0.41% of base) : System.Resources.Extensions.dasm - TypeNameComparer:GetHashCode(String):int:this
Top method improvements (bytes):
         -12 (-0.72% of base) : xunit.execution.dotnet.dasm - Sha1Digest:ProcessBlock():this
          -2 (-1.83% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCode():int:this
          -2 (-1.46% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCodeOrdinalIgnoreCase():int:this
          -1 (-1.56% of base) : Microsoft.CSharp.dasm - CSharpInvokeBinder:GetGetBinderEquivalenceHash():int:this
          -1 (-0.50% of base) : System.Private.CoreLib.dasm - CastHelpers:StelemRef_Helper(byref,long,Object)
          -1 (-0.47% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCastAny(long,Object):Object
          -1 (-0.49% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstance_Helper(long,Object):Object
          -1 (-0.37% of base) : System.Private.CoreLib.dasm - Marvin:ComputeHash32(byref,int,int,int):int
          -1 (-0.60% of base) : System.Private.CoreLib.dasm - CastHelpers:TryGet(long,long):int
          -1 (-0.46% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstanceOfAny(long,Object):Object
          -1 (-0.52% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCast_Helper(long,Object):Object
Top method regressions (percentages):
           2 (12.50% of base) : System.Runtime.Numerics.dasm - NumericsHelpers:CombineHash(int,int):int (2 methods)
           1 (10.00% of base) : Microsoft.CSharp.dasm - HashHelpers:Combine(int,int):int
           1 (10.00% of base) : Microsoft.Extensions.DependencyModel.dasm - HashHelpers:Combine(int,int):int
           1 (10.00% of base) : System.Composition.TypedParts.dasm - HashHelpers:Combine(int,int):int
           1 (10.00% of base) : System.Resources.Extensions.dasm - HashHelpers:Combine(int,int):int
           3 ( 3.00% of base) : Microsoft.CSharp.dasm - CSharpBinaryOperationBinder:GetGetBinderEquivalenceHash():int:this
           3 ( 2.78% of base) : Microsoft.CSharp.dasm - CSharpConvertBinder:GetGetBinderEquivalenceHash():int:this
           2 ( 2.33% of base) : Microsoft.CSharp.dasm - CSharpUnaryOperationBinder:GetGetBinderEquivalenceHash():int:this
           2 ( 2.20% of base) : Microsoft.CSharp.dasm - CSharpSetIndexBinder:GetGetBinderEquivalenceHash():int:this
           3 ( 2.16% of base) : Microsoft.CSharp.dasm - CSharpSetMemberBinder:GetGetBinderEquivalenceHash():int:this
           2 ( 1.77% of base) : Microsoft.CSharp.dasm - CSharpGetMemberBinder:GetGetBinderEquivalenceHash():int:this
           1 ( 1.54% of base) : System.Composition.TypedParts.dasm - ParameterInfoComparer:GetHashCode(ParameterInfo):int:this
           1 ( 1.25% of base) : Microsoft.CSharp.dasm - CSharpIsEventBinder:GetGetBinderEquivalenceHash():int:this
           1 ( 1.20% of base) : Microsoft.CSharp.dasm - BinderHelper:AddArgHashes(int,ref,ref):int
           1 ( 1.16% of base) : Microsoft.Extensions.DependencyModel.dasm - Dependency:GetHashCode():int:this
           1 ( 0.41% of base) : System.Resources.Extensions.dasm - TypeNameComparer:GetHashCode(String):int:this
Top method improvements (percentages):
          -2 (-1.83% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCode():int:this
          -1 (-1.56% of base) : Microsoft.CSharp.dasm - CSharpInvokeBinder:GetGetBinderEquivalenceHash():int:this
          -2 (-1.46% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCodeOrdinalIgnoreCase():int:this
         -12 (-0.72% of base) : xunit.execution.dotnet.dasm - Sha1Digest:ProcessBlock():this
          -1 (-0.60% of base) : System.Private.CoreLib.dasm - CastHelpers:TryGet(long,long):int
          -1 (-0.52% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCast_Helper(long,Object):Object
          -1 (-0.50% of base) : System.Private.CoreLib.dasm - CastHelpers:StelemRef_Helper(byref,long,Object)
          -1 (-0.49% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstance_Helper(long,Object):Object
          -1 (-0.47% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCastAny(long,Object):Object
          -1 (-0.46% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstanceOfAny(long,Object):Object
          -1 (-0.37% of base) : System.Private.CoreLib.dasm - Marvin:ComputeHash32(byref,int,int,int):int
27 total methods with Code Size differences (11 improved, 16 regressed), 259420 unchanged.
Completed analysis in 20.62s
PS C:\prj>

not sure we need to bother because I still can see perf win on String.GetHashCode:

    [Benchmark]
    public int HelloWorldString1() => "Привет Мир!".GetHashCode();

    [Benchmark]
    public int HelloWorldString2() => "Hello World!".GetHashCode();

    [Benchmark]
    public int LongString() => "Lorem ipsum .... %long string% ....".GetHashCode();
       |            Method |       Mean |     Error |    StdDev |
       |------------------ |-----------:|----------:|----------:|
master | HelloWorldString1 |   6.654 ns | 0.0029 ns | 0.0025 ns |
    PR | HelloWorldString1 |   6.408 ns | 0.0007 ns | 0.0006 ns | 3.7% faster

master | HelloWorldString2 |   7.415 ns | 0.0069 ns | 0.0058 ns |
    PR | HelloWorldString2 |   7.174 ns | 0.0044 ns | 0.0041 ns | 3.3% faster

master |        LongString | 263.603 ns | 0.0363 ns | 0.0303 ns |
    PR |        LongString | 262.502 ns | 0.0336 ns | 0.0298 ns | 0.5% faster

@tannergooding
Copy link
Member

if add a condition to emit rorx only if otherwise we need to emit an addtional mov (targetReg != operandReg)
...
not sure we need to bother because I still can see perf win on String.GetHashCode:

I think we want to use rol/ror in this case. rol/ror is 3 bytes as its base size while rorx is twice as big at 6 bytes.
https://uops.info/table.html and others (Agner's, the optimization manuals, etc) document the latency throughput for RORX/ROR as identical on Skylake and practically identical on Zen (ror is 0.25 while rorx is 0.37).
So any noise is likely due to timing inaccuracies or differences in loop/jump target alignment between runs.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 9, 2020

Changed the optimization to be a pure size improvement: only for 64bit integers and if targetReg != operandReg, e.g.

public static class Tests
{
    static uint Test1(uint x) => BitOperations.RotateRight(x, 20);
    static uint Test2(uint x) => BitOperations.RotateRight(x * 100, 20); // x*100 copies x to rax
    static ulong Test3(ulong x) => BitOperations.RotateRight(x, 20);
    static ulong Test4(ulong x) => BitOperations.RotateRight(x * 100, 20); // x*100 copies x to rax
}

it's emitted only for Test3 (eliminates a mov -2 bytes)

@CarolEidt
Copy link
Contributor

@tannergooding - there have been some changes since you last reviewed, though all, I believe, are in line with your comments. Any objection to merging this?

@tannergooding
Copy link
Member

@CarolEidt, only thing I noticed is that there isn't an LSRA change that marks the case where we'd use rorx as no longer being RMW, so we might get worse register selection in some cases, right?

Otherwise I think it looks fine and I don't have an issue merging it.

@CarolEidt
Copy link
Contributor

there isn't an LSRA change that marks the case where we'd use rorx as no longer being RMW, so we might get worse register selection in some cases, right?

Yes, and that will be the case for any post-register-allocation peephole optimizations we add.

@CarolEidt CarolEidt merged commit 1d9e50c into dotnet:master Sep 29, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BitOperations.RotateRight should emit rorx for constant offset
5 participants