[RyuJIT] Add "rorx" instruction (BMI2) and emit it instead of "rol" when possible #41772

EgorBo · 2020-09-02T23:13:00Z

Emit BMI2's rorx instead of rol if argument is a constant value. Fixes #32290

uint Test(uint x)
{
    return BitOperations.RotateRight(x, 20);
}

Old codegen:

G_M62026_IG01:
						;; bbWeight=1    PerfScore 0.00
G_M62026_IG02:
       mov      rax, rdx
       rol      rax, 44
						;; bbWeight=1    PerfScore 2.25
G_M62026_IG03:
       ret      
						;; bbWeight=1    PerfScore 1.00
; Total bytes of code: 8

New codegen:

G_M62026_IG01:
						;; bbWeight=1    PerfScore 0.00
G_M62026_IG02:
       rorx     rax, rdx, 20 ; 20 is 44 but reversed (rol -> ror)
						;; bbWeight=1    PerfScore 2.00
G_M62026_IG03:
       ret      
						;; bbWeight=1    PerfScore 1.00
; Total bytes of code: 7

Slightly improves String.GetHashCode (noticeable on small strings) since Marvin uses RotateLeft under the hood.

[Benchmark]
public int StringGetHashCode() => "hello world".GetHashCode();

        |            Method |     Mean |     Error |    StdDev |
        |------------------ |---------:|----------:|----------:|
 master | StringGetHashCode | 6.638 ns | 0.0022 ns | 0.0020 ns |
     PR | StringGetHashCode | 6.422 ns | 0.0016 ns | 0.0015 ns | 3.3% faster

Not sure perfScore is correct for rorx.

/cc @dotnet/jit-contrib @tannergooding

src/coreclr/src/jit/codegenxarch.cpp

tannergooding · 2020-09-03T16:12:12Z

src/coreclr/src/jit/instrsxarch.h

@@ -594,6 +594,7 @@ INST3(blsr,             "blsr",             IUM_WR, BAD_CODE,     BAD_CODE,
 INST3(bextr,            "bextr",            IUM_WR, BAD_CODE,     BAD_CODE,     SSE38(0xF7),                             INS_Flags_IsDstDstSrcAVXInstruction)    // Bit Field Extract

 // BMI2
+INST3(rorx,             "rorx",             IUM_WR, BAD_CODE,     BAD_CODE,     SSE3A(0xF0),                             INS_FLAGS_None)


It would be good to double-check that neither the IsDstDstSrcAVXInstruction nor IsDstSrcSrcAVXInstruction flags apply. The instruction looks to be a dst, src, imm operand in this case and so it shouldn't be flagged as RMW and should go down the right encoding paths in emit.

tried both of these flags and observed SEH exceptions on start 😞 with INS_FLAGS_None works like a charm.

src/coreclr/src/jit/instrsxarch.h

…ruction

CarolEidt

LGTM thanks!

tannergooding · 2020-09-04T17:51:36Z

Are there any pmi diffs in the framework assemblies from this?

EgorBo · 2020-09-04T17:59:13Z

Are there any pmi diffs in the framework assemblies from this?

here is -f --pmi (Release libs):


Analyzing CodeSize diffs...
Found 11 files with textual diffs.
PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: 444 (0.00% of base)
    diff is a regression.
Top file regressions (bytes):
         206 : System.Private.CoreLib.dasm (0.00% of base)
          52 : System.Formats.Cbor.dasm (0.11% of base)
          52 : System.Net.Primitives.dasm (0.08% of base)
          52 : System.Private.Xml.dasm (0.00% of base)
          45 : xunit.execution.dotnet.dasm (0.02% of base)
          17 : Microsoft.CSharp.dasm (0.00% of base)
           9 : System.Drawing.Common.dasm (0.00% of base)
           5 : System.Runtime.Numerics.dasm (0.01% of base)
           2 : Microsoft.Extensions.DependencyModel.dasm (0.00% of base)
           2 : System.Composition.TypedParts.dasm (0.00% of base)
           2 : System.Resources.Extensions.dasm (0.01% of base)
11 total files with Code Size differences (0 improved, 11 regressed), 256 unchanged.
Top method regressions (bytes):
          45 ( 2.69% of base) : xunit.execution.dotnet.dasm - Sha1Digest:ProcessBlock():this
          42 (13.68% of base) : System.Formats.Cbor.dasm - Marvin:ComputeHash(ReadOnlySpan`1,long):long
          42 (13.68% of base) : System.Net.Primitives.dasm - Marvin:ComputeHash(ReadOnlySpan`1,long):long
          42 (13.68% of base) : System.Private.Xml.dasm - Marvin:ComputeHash(ReadOnlySpan`1,long):long
          40 (14.71% of base) : System.Private.CoreLib.dasm - Marvin:ComputeHash32(byref,int,int,int):int
          29 (12.78% of base) : System.Private.CoreLib.dasm - Marvin:ComputeHash32OrdinalIgnoreCase(byref,int,int,int):int
          19 ( 8.76% of base) : System.Private.CoreLib.dasm - HashCode:ToHashCode():int:this
          12 ( 3.77% of base) : System.Private.CoreLib.dasm - HashCode:Add(int):this (2 methods)
          11 (45.83% of base) : System.Private.CoreLib.dasm - HashCode:MixState(int,int,int,int):int
          10 (27.03% of base) : System.Formats.Cbor.dasm - Marvin:Block(byref,byref)
          10 (27.03% of base) : System.Net.Primitives.dasm - Marvin:Block(byref,byref)
          10 (27.03% of base) : System.Private.CoreLib.dasm - Marvin:Block(byref,byref)
          10 (27.03% of base) : System.Private.Xml.dasm - Marvin:Block(byref,byref)
           9 (34.62% of base) : System.Drawing.Common.dasm - Font:GetHashCode():int:this
           7 ( 1.05% of base) : System.Private.CoreLib.dasm - Sha1ForNonSecretPurposes:Drain():this
           6 ( 3.75% of base) : System.Private.CoreLib.dasm - HashCode:Combine(__Canon,long):int
           6 ( 4.26% of base) : System.Private.CoreLib.dasm - HashCode:Combine(ubyte,long):int
           6 ( 4.26% of base) : System.Private.CoreLib.dasm - HashCode:Combine(short,long):int
           6 ( 4.51% of base) : System.Private.CoreLib.dasm - HashCode:Combine(int,long):int
           6 ( 3.03% of base) : System.Private.CoreLib.dasm - HashCode:Combine(double,long):int
Top method improvements (bytes):
          -2 (-1.83% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCode():int:this
          -2 (-1.46% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCodeOrdinalIgnoreCase():int:this
          -1 (-1.56% of base) : Microsoft.CSharp.dasm - CSharpInvokeBinder:GetGetBinderEquivalenceHash():int:this
          -1 (-0.50% of base) : System.Private.CoreLib.dasm - CastHelpers:StelemRef_Helper(byref,long,Object)
          -1 (-0.47% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCastAny(long,Object):Object
          -1 (-0.49% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstance_Helper(long,Object):Object
          -1 (-0.60% of base) : System.Private.CoreLib.dasm - CastHelpers:TryGet(long,long):int
          -1 (-0.46% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstanceOfAny(long,Object):Object
          -1 (-0.52% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCast_Helper(long,Object):Object
Top method regressions (percentages):
          11 (45.83% of base) : System.Private.CoreLib.dasm - HashCode:MixState(int,int,int,int):int
           9 (34.62% of base) : System.Drawing.Common.dasm - Font:GetHashCode():int:this
          10 (27.03% of base) : System.Formats.Cbor.dasm - Marvin:Block(byref,byref)
          10 (27.03% of base) : System.Net.Primitives.dasm - Marvin:Block(byref,byref)
          10 (27.03% of base) : System.Private.CoreLib.dasm - Marvin:Block(byref,byref)
          10 (27.03% of base) : System.Private.Xml.dasm - Marvin:Block(byref,byref)
           6 (22.22% of base) : System.Private.CoreLib.dasm - BinaryPrimitives:ReverseEndianness(int):int (2 methods)
           3 (16.67% of base) : System.Private.CoreLib.dasm - HashCode:Round(int,int):int
           3 (16.67% of base) : System.Private.CoreLib.dasm - HashCode:QueueRound(int,int):int
          40 (14.71% of base) : System.Private.CoreLib.dasm - Marvin:ComputeHash32(byref,int,int,int):int
          42 (13.68% of base) : System.Formats.Cbor.dasm - Marvin:ComputeHash(ReadOnlySpan`1,long):long
          42 (13.68% of base) : System.Net.Primitives.dasm - Marvin:ComputeHash(ReadOnlySpan`1,long):long
          42 (13.68% of base) : System.Private.Xml.dasm - Marvin:ComputeHash(ReadOnlySpan`1,long):long
          29 (12.78% of base) : System.Private.CoreLib.dasm - Marvin:ComputeHash32OrdinalIgnoreCase(byref,int,int,int):int
           2 (12.50% of base) : System.Runtime.Numerics.dasm - NumericsHelpers:CombineHash(int,int):int (2 methods)
           1 (10.00% of base) : Microsoft.CSharp.dasm - HashHelpers:Combine(int,int):int
           1 (10.00% of base) : Microsoft.Extensions.DependencyModel.dasm - HashHelpers:Combine(int,int):int
           1 (10.00% of base) : System.Composition.TypedParts.dasm - HashHelpers:Combine(int,int):int
           1 (10.00% of base) : System.Resources.Extensions.dasm - HashHelpers:Combine(int,int):int
          19 ( 8.76% of base) : System.Private.CoreLib.dasm - HashCode:ToHashCode():int:this
Top method improvements (percentages):
          -2 (-1.83% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCode():int:this
          -1 (-1.56% of base) : Microsoft.CSharp.dasm - CSharpInvokeBinder:GetGetBinderEquivalenceHash():int:this
          -2 (-1.46% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCodeOrdinalIgnoreCase():int:this
          -1 (-0.60% of base) : System.Private.CoreLib.dasm - CastHelpers:TryGet(long,long):int
          -1 (-0.52% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCast_Helper(long,Object):Object
          -1 (-0.50% of base) : System.Private.CoreLib.dasm - CastHelpers:StelemRef_Helper(byref,long,Object)
          -1 (-0.49% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstance_Helper(long,Object):Object
          -1 (-0.47% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCastAny(long,Object):Object
          -1 (-0.46% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstanceOfAny(long,Object):Object
63 total methods with Code Size differences (9 improved, 54 regressed), 259384 unchanged.
Completed analysis in 21.89s
PS C:\prj>

rorx reg, CNS is 1 byte bigger than mov reg, CNS + rol reg for 32bit integers. (but smaller for 64bit ones)
e.g. Marvin sligtly regresses but shows better perf numbers.

src/coreclr/src/jit/codegenxarch.cpp

tannergooding

LGTM, modulo the question about handling GT_ROR

…ruction

am11 · 2020-09-04T22:56:07Z

rorx reg, CNS is 1 byte bigger than mov reg, CNS + rol reg for 32bit integers. (but smaller for 64bit ones)

would it make sense to only emit rorx for typeWidth==64?

EgorBo · 2020-09-05T08:04:02Z

rorx reg, CNS is 1 byte bigger than mov reg, CNS + rol reg for 32bit integers. (but smaller for 64bit ones)

would it make sense to only emit rorx for typeWidth==64?

HashCode.Combine/String.GetHashCode use 32bit integers for rorx and benchmarks show better performance at cost of <0.0% size regression of CoreLib 🙂
also, llvm-mca:

am11 · 2020-09-05T08:23:56Z

Thanks for the llvm-mca ref! :)
From the list of regression, I tried Marvin.ComputeHash32:

      40 (14.71% of base) : System.Private.CoreLib.dasm - Marvin:ComputeHash32(byref,int,int,int):int

https://godbolt.org/z/5hnWK1 - gcc generates slightly optimized result than clang, but both use mov+shr. Perhaps the cases where it regresses is when code is shifting Int64 to fit into Int32 variable.

EgorBo · 2020-09-05T12:48:07Z

@am11
Here is the main regression from the diff, it's Marvin's Block:

Both gcc and clang emit the same codegen for it with rorx: https://godbolt.org/z/qz4vxo
(can't explain the perf difference for this one honestly, other than alignment?)

EgorBo · 2020-09-05T13:07:59Z

if add a condition to emit rorx only if otherwise we need to emit an addtional mov (targetReg != operandReg) the diff becomes:

C:\prj>jit-diff diff --output C:\prj\jitdiffs -f --core_root C:\prj\runtime-1\artifacts\tests\coreclr\Windows_NT.x64.Release\Tests\Core_Root --base C:\prj\runtime-1\artifacts\bin\coreclr\Windows_NT.x64.Checked_base --diff C:\prj\runtime-1\artifacts\bin\coreclr\Windows_NT.x64.Checked --pmi
Beginning PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies
| Finished 267/267 Base 267/267 Diff [436.9 sec]
Completed PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies in 437.09s
Diffs (if any) can be viewed by comparing: C:\prj\jitdiffs\dasmset_20\base C:\prj\jitdiffs\dasmset_20\diff
Analyzing CodeSize diffs...
Found 7 files with textual diffs.
PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: 2 (0.00% of base)
    diff is a regression.
Top file regressions (bytes):
          17 : Microsoft.CSharp.dasm (0.00% of base)
           2 : Microsoft.Extensions.DependencyModel.dasm (0.00% of base)
           2 : System.Composition.TypedParts.dasm (0.00% of base)
           2 : System.Resources.Extensions.dasm (0.01% of base)
           2 : System.Runtime.Numerics.dasm (0.00% of base)
Top file improvements (bytes):
         -12 : xunit.execution.dotnet.dasm (-0.01% of base)
         -11 : System.Private.CoreLib.dasm (-0.00% of base)
7 total files with Code Size differences (2 improved, 5 regressed), 260 unchanged.
Top method regressions (bytes):
           3 ( 3.00% of base) : Microsoft.CSharp.dasm - CSharpBinaryOperationBinder:GetGetBinderEquivalenceHash():int:this
           3 ( 2.78% of base) : Microsoft.CSharp.dasm - CSharpConvertBinder:GetGetBinderEquivalenceHash():int:this
           3 ( 2.16% of base) : Microsoft.CSharp.dasm - CSharpSetMemberBinder:GetGetBinderEquivalenceHash():int:this
           2 ( 1.77% of base) : Microsoft.CSharp.dasm - CSharpGetMemberBinder:GetGetBinderEquivalenceHash():int:this
           2 ( 2.20% of base) : Microsoft.CSharp.dasm - CSharpSetIndexBinder:GetGetBinderEquivalenceHash():int:this
           2 ( 2.33% of base) : Microsoft.CSharp.dasm - CSharpUnaryOperationBinder:GetGetBinderEquivalenceHash():int:this
           2 (12.50% of base) : System.Runtime.Numerics.dasm - NumericsHelpers:CombineHash(int,int):int (2 methods)
           1 (10.00% of base) : Microsoft.CSharp.dasm - HashHelpers:Combine(int,int):int
           1 ( 1.20% of base) : Microsoft.CSharp.dasm - BinderHelper:AddArgHashes(int,ref,ref):int
           1 ( 1.25% of base) : Microsoft.CSharp.dasm - CSharpIsEventBinder:GetGetBinderEquivalenceHash():int:this
           1 (10.00% of base) : Microsoft.Extensions.DependencyModel.dasm - HashHelpers:Combine(int,int):int
           1 ( 1.16% of base) : Microsoft.Extensions.DependencyModel.dasm - Dependency:GetHashCode():int:this
           1 (10.00% of base) : System.Composition.TypedParts.dasm - HashHelpers:Combine(int,int):int
           1 ( 1.54% of base) : System.Composition.TypedParts.dasm - ParameterInfoComparer:GetHashCode(ParameterInfo):int:this
           1 (10.00% of base) : System.Resources.Extensions.dasm - HashHelpers:Combine(int,int):int
           1 ( 0.41% of base) : System.Resources.Extensions.dasm - TypeNameComparer:GetHashCode(String):int:this
Top method improvements (bytes):
         -12 (-0.72% of base) : xunit.execution.dotnet.dasm - Sha1Digest:ProcessBlock():this
          -2 (-1.83% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCode():int:this
          -2 (-1.46% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCodeOrdinalIgnoreCase():int:this
          -1 (-1.56% of base) : Microsoft.CSharp.dasm - CSharpInvokeBinder:GetGetBinderEquivalenceHash():int:this
          -1 (-0.50% of base) : System.Private.CoreLib.dasm - CastHelpers:StelemRef_Helper(byref,long,Object)
          -1 (-0.47% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCastAny(long,Object):Object
          -1 (-0.49% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstance_Helper(long,Object):Object
          -1 (-0.37% of base) : System.Private.CoreLib.dasm - Marvin:ComputeHash32(byref,int,int,int):int
          -1 (-0.60% of base) : System.Private.CoreLib.dasm - CastHelpers:TryGet(long,long):int
          -1 (-0.46% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstanceOfAny(long,Object):Object
          -1 (-0.52% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCast_Helper(long,Object):Object
Top method regressions (percentages):
           2 (12.50% of base) : System.Runtime.Numerics.dasm - NumericsHelpers:CombineHash(int,int):int (2 methods)
           1 (10.00% of base) : Microsoft.CSharp.dasm - HashHelpers:Combine(int,int):int
           1 (10.00% of base) : Microsoft.Extensions.DependencyModel.dasm - HashHelpers:Combine(int,int):int
           1 (10.00% of base) : System.Composition.TypedParts.dasm - HashHelpers:Combine(int,int):int
           1 (10.00% of base) : System.Resources.Extensions.dasm - HashHelpers:Combine(int,int):int
           3 ( 3.00% of base) : Microsoft.CSharp.dasm - CSharpBinaryOperationBinder:GetGetBinderEquivalenceHash():int:this
           3 ( 2.78% of base) : Microsoft.CSharp.dasm - CSharpConvertBinder:GetGetBinderEquivalenceHash():int:this
           2 ( 2.33% of base) : Microsoft.CSharp.dasm - CSharpUnaryOperationBinder:GetGetBinderEquivalenceHash():int:this
           2 ( 2.20% of base) : Microsoft.CSharp.dasm - CSharpSetIndexBinder:GetGetBinderEquivalenceHash():int:this
           3 ( 2.16% of base) : Microsoft.CSharp.dasm - CSharpSetMemberBinder:GetGetBinderEquivalenceHash():int:this
           2 ( 1.77% of base) : Microsoft.CSharp.dasm - CSharpGetMemberBinder:GetGetBinderEquivalenceHash():int:this
           1 ( 1.54% of base) : System.Composition.TypedParts.dasm - ParameterInfoComparer:GetHashCode(ParameterInfo):int:this
           1 ( 1.25% of base) : Microsoft.CSharp.dasm - CSharpIsEventBinder:GetGetBinderEquivalenceHash():int:this
           1 ( 1.20% of base) : Microsoft.CSharp.dasm - BinderHelper:AddArgHashes(int,ref,ref):int
           1 ( 1.16% of base) : Microsoft.Extensions.DependencyModel.dasm - Dependency:GetHashCode():int:this
           1 ( 0.41% of base) : System.Resources.Extensions.dasm - TypeNameComparer:GetHashCode(String):int:this
Top method improvements (percentages):
          -2 (-1.83% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCode():int:this
          -1 (-1.56% of base) : Microsoft.CSharp.dasm - CSharpInvokeBinder:GetGetBinderEquivalenceHash():int:this
          -2 (-1.46% of base) : System.Private.CoreLib.dasm - String:GetNonRandomizedHashCodeOrdinalIgnoreCase():int:this
         -12 (-0.72% of base) : xunit.execution.dotnet.dasm - Sha1Digest:ProcessBlock():this
          -1 (-0.60% of base) : System.Private.CoreLib.dasm - CastHelpers:TryGet(long,long):int
          -1 (-0.52% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCast_Helper(long,Object):Object
          -1 (-0.50% of base) : System.Private.CoreLib.dasm - CastHelpers:StelemRef_Helper(byref,long,Object)
          -1 (-0.49% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstance_Helper(long,Object):Object
          -1 (-0.47% of base) : System.Private.CoreLib.dasm - CastHelpers:ChkCastAny(long,Object):Object
          -1 (-0.46% of base) : System.Private.CoreLib.dasm - CastHelpers:IsInstanceOfAny(long,Object):Object
          -1 (-0.37% of base) : System.Private.CoreLib.dasm - Marvin:ComputeHash32(byref,int,int,int):int
27 total methods with Code Size differences (11 improved, 16 regressed), 259420 unchanged.
Completed analysis in 20.62s
PS C:\prj>

not sure we need to bother because I still can see perf win on String.GetHashCode:

    [Benchmark]
    public int HelloWorldString1() => "Привет Мир!".GetHashCode();

    [Benchmark]
    public int HelloWorldString2() => "Hello World!".GetHashCode();

    [Benchmark]
    public int LongString() => "Lorem ipsum .... %long string% ....".GetHashCode();

       |            Method |       Mean |     Error |    StdDev |
       |------------------ |-----------:|----------:|----------:|
master | HelloWorldString1 |   6.654 ns | 0.0029 ns | 0.0025 ns |
    PR | HelloWorldString1 |   6.408 ns | 0.0007 ns | 0.0006 ns | 3.7% faster

master | HelloWorldString2 |   7.415 ns | 0.0069 ns | 0.0058 ns |
    PR | HelloWorldString2 |   7.174 ns | 0.0044 ns | 0.0041 ns | 3.3% faster

master |        LongString | 263.603 ns | 0.0363 ns | 0.0303 ns |
    PR |        LongString | 262.502 ns | 0.0336 ns | 0.0298 ns | 0.5% faster

tannergooding · 2020-09-05T17:46:07Z

if add a condition to emit rorx only if otherwise we need to emit an addtional mov (targetReg != operandReg)
...
not sure we need to bother because I still can see perf win on String.GetHashCode:

I think we want to use rol/ror in this case. rol/ror is 3 bytes as its base size while rorx is twice as big at 6 bytes.
https://uops.info/table.html and others (Agner's, the optimization manuals, etc) document the latency throughput for RORX/ROR as identical on Skylake and practically identical on Zen (ror is 0.25 while rorx is 0.37).
So any noise is likely due to timing inaccuracies or differences in loop/jump target alignment between runs.

…ruction

EgorBo · 2020-09-09T15:31:01Z

Changed the optimization to be a pure size improvement: only for 64bit integers and if targetReg != operandReg, e.g.

public static class Tests
{
    static uint Test1(uint x) => BitOperations.RotateRight(x, 20);
    static uint Test2(uint x) => BitOperations.RotateRight(x * 100, 20); // x*100 copies x to rax
    static ulong Test3(ulong x) => BitOperations.RotateRight(x, 20);
    static ulong Test4(ulong x) => BitOperations.RotateRight(x * 100, 20); // x*100 copies x to rax
}

it's emitted only for Test3 (eliminates a mov -2 bytes)

CarolEidt · 2020-09-21T20:56:31Z

@tannergooding - there have been some changes since you last reviewed, though all, I believe, are in line with your comments. Any objection to merging this?

tannergooding · 2020-09-21T22:03:54Z

@CarolEidt, only thing I noticed is that there isn't an LSRA change that marks the case where we'd use rorx as no longer being RMW, so we might get worse register selection in some cases, right?

Otherwise I think it looks fine and I don't have an issue merging it.

CarolEidt · 2020-09-29T23:35:26Z

there isn't an LSRA change that marks the case where we'd use rorx as no longer being RMW, so we might get worse register selection in some cases, right?

Yes, and that will be the case for any post-register-allocation peephole optimizations we add.

Use rorx instead of rol when possible

562a10c

Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 2, 2020

EgorBo added 2 commits September 3, 2020 11:42

check for BMI2 availability

928c55d

Update codegenxarch.cpp

ab495fc

tannergooding reviewed Sep 3, 2020

View reviewed changes

src/coreclr/src/jit/codegenxarch.cpp Outdated Show resolved Hide resolved

Update codegenxarch.cpp

ddb3382

tannergooding reviewed Sep 3, 2020

View reviewed changes

src/coreclr/src/jit/instrsxarch.h Show resolved Hide resolved

EgorBo added 2 commits September 3, 2020 19:58

Merge branch 'master' of github.com:dotnet/runtime into add-rorx-inst…

462719c

…ruction

formatting

b85c87b

CarolEidt approved these changes Sep 4, 2020

View reviewed changes

tannergooding reviewed Sep 4, 2020

View reviewed changes

src/coreclr/src/jit/codegenxarch.cpp Outdated Show resolved Hide resolved

tannergooding approved these changes Sep 4, 2020

View reviewed changes

EgorBo mentioned this pull request Sep 4, 2020

[RyuJIT] Emit shrx, sarx, shlx where needed #41881

Closed

EgorBo added 3 commits September 4, 2020 21:51

Merge branch 'master' of github.com:dotnet/runtime into add-rorx-inst…

d808b86

…ruction

handle GT_ROR

739007b

clean up

c5933e9

EgorBo added 3 commits September 5, 2020 21:10

Don't emit rorx when targetReg == operandReg

609af9e

Merge branch 'master' of github.com:dotnet/runtime into add-rorx-inst…

1b3e055

…ruction

emit rorx only for long integers

5a36432

wrap with TARGET_64BIT

da9d166

EgorBo added 2 commits September 9, 2020 18:34

clean up

7cbb1ce

clean up

9e9b1cb

CarolEidt merged commit 1d9e50c into dotnet:master Sep 29, 2020

ghost locked as resolved and limited conversation to collaborators Dec 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RyuJIT] Add "rorx" instruction (BMI2) and emit it instead of "rol" when possible #41772

[RyuJIT] Add "rorx" instruction (BMI2) and emit it instead of "rol" when possible #41772

EgorBo commented Sep 2, 2020 •

edited

Loading

tannergooding Sep 3, 2020

EgorBo Sep 3, 2020 •

edited

Loading

CarolEidt left a comment

tannergooding commented Sep 4, 2020

EgorBo commented Sep 4, 2020 •

edited

Loading

tannergooding left a comment

am11 commented Sep 4, 2020

EgorBo commented Sep 5, 2020

am11 commented Sep 5, 2020

EgorBo commented Sep 5, 2020 •

edited

Loading

EgorBo commented Sep 5, 2020 •

edited

Loading

tannergooding commented Sep 5, 2020

EgorBo commented Sep 9, 2020 •

edited

Loading

CarolEidt commented Sep 21, 2020

tannergooding commented Sep 21, 2020

CarolEidt commented Sep 29, 2020

[RyuJIT] Add "rorx" instruction (BMI2) and emit it instead of "rol" when possible #41772

[RyuJIT] Add "rorx" instruction (BMI2) and emit it instead of "rol" when possible #41772

Conversation

EgorBo commented Sep 2, 2020 • edited Loading

Old codegen:

New codegen:

tannergooding Sep 3, 2020

Choose a reason for hiding this comment

EgorBo Sep 3, 2020 • edited Loading

Choose a reason for hiding this comment

CarolEidt left a comment

Choose a reason for hiding this comment

tannergooding commented Sep 4, 2020

EgorBo commented Sep 4, 2020 • edited Loading

tannergooding left a comment

Choose a reason for hiding this comment

am11 commented Sep 4, 2020

EgorBo commented Sep 5, 2020

am11 commented Sep 5, 2020

EgorBo commented Sep 5, 2020 • edited Loading

EgorBo commented Sep 5, 2020 • edited Loading

tannergooding commented Sep 5, 2020

EgorBo commented Sep 9, 2020 • edited Loading

CarolEidt commented Sep 21, 2020

tannergooding commented Sep 21, 2020

CarolEidt commented Sep 29, 2020

EgorBo commented Sep 2, 2020 •

edited

Loading

EgorBo Sep 3, 2020 •

edited

Loading

EgorBo commented Sep 4, 2020 •

edited

Loading

EgorBo commented Sep 5, 2020 •

edited

Loading

EgorBo commented Sep 5, 2020 •

edited

Loading

EgorBo commented Sep 9, 2020 •

edited

Loading