This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Vectorize Quaternion #25510

Closed
wants to merge 1 commit into from

Conversation

benaadams
Member

return ans;
Vector4 q = Unsafe.As<Quaternion, Vector4>(ref value);
q = Vector4.Normalize(q);
return Unsafe.As<Vector4, Quaternion>(ref q);
Member Author

@benaadams benaadams Nov 26, 2017

Not sure whether Unsafe.As<Vector4, Quaternion>(ref q) will prevent q from being kept in a register (e.g. because its address is taken).

e.g. should it be?

public static Quaternion Normalize(Quaternion value)
{
    Vector4 q = Unsafe.As<Quaternion, Vector4>(ref value);
    Vector4 result = Vector4.Normalize(q);
    return Unsafe.As<Vector4, Quaternion>(ref result);
}

@benaadams
Member Author

Getting same failures as Linux release locally on Windows; though only in release mode

System.Numerics.Tests.QuaternionTests.QuaternionSubtractTest [FAIL]
        Assert.Equal() Failure
        Expected: {X:-4 Y:4 Z:4 W:-4}
        Actual:   {X:0 Y:0 Z:9.113475E+31 W:2.129974E-43}
        Stack Trace:
           C:\GitHub\corefx\src\System.Numerics.Vectors\tests\QuaternionTests.cs(660,0): at System.Numerics.Tests.QuaternionTests.QuaternionSubtractTest()

@benaadams
Member Author

benaadams commented Nov 27, 2017

Some pretty weird results either using Unsafe.As or Unsafe.ReadUnaligned (in release only)

Quaternion.operator + did not return the expected value: 
  expected {X:6 Y:8 Z:10 W:12} actual {X:2 Y:0 Z:0 W:0}
Expected: True
Actual:   False
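For anyone trying to reproduce this outside the test suite, a minimal sketch of a check follows. It compares a reinterpreting add against a plain field-by-field add. `QuaternionAddCheck`, `AddViaVector4`, `AddFieldwise` and `PathsAgree` are hypothetical names for illustration, not code from this PR, and the mismatch above was only reported for release builds on the JIT of the time, so a current runtime would be expected to agree.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;

public static class QuaternionAddCheck
{
    // Hypothetical helper: reinterprets both operands as Vector4 and adds them as vectors.
    public static Quaternion AddViaVector4(Quaternion left, Quaternion right)
    {
        Vector4 l = Unsafe.As<Quaternion, Vector4>(ref left);
        Vector4 r = Unsafe.As<Quaternion, Vector4>(ref right);
        Vector4 sum = l + r;
        return Unsafe.As<Vector4, Quaternion>(ref sum);
    }

    // Reference path: plain field-by-field addition, no reinterpretation.
    public static Quaternion AddFieldwise(Quaternion left, Quaternion right)
    {
        return new Quaternion(left.X + right.X, left.Y + right.Y, left.Z + right.Z, left.W + right.W);
    }

    // Returns true when both paths produce identical components.
    public static bool PathsAgree(Quaternion left, Quaternion right)
    {
        return AddViaVector4(left, right) == AddFieldwise(left, right);
    }
}
```

Calling `QuaternionAddCheck.PathsAgree(new Quaternion(1, 2, 3, 4), new Quaternion(5, 6, 7, 8))` in both debug and release builds is one way to see whether the two paths diverge; the inputs are chosen so that the sum matches the expected value in the log above.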

@benaadams benaadams force-pushed the quaternions branch 2 times, most recently from 292f1fe to f86df6c (November 27, 2017 04:18)
@@ -383,7 +383,7 @@ public void QuaternionCreateFromYawPitchRollTest2()

 Quaternion expected = yaw * pitch * roll;
 Quaternion actual = Quaternion.CreateFromYawPitchRoll(yawRad, pitchRad, rollRad);
-Assert.True(MathHelper.Equal(expected, actual), String.Format("Yaw:{0} Pitch:{1} Roll:{2}", yawAngle, pitchAngle, rollAngle));
+Assert.True(MathHelper.Equal(expected, actual), $"Quaternion.QuaternionCreateFromYawPitchRollTest2 Yaw:{yawAngle} Pitch:{pitchAngle} Roll:{rollAngle} did not return the expected value: expected {expected} actual2 {actual}");
Member

Any reason this says actual2 {actual}? Is that just a typo?

Member Author

typo

Member Author

Fixed

ans.W = value.W;

return ans;
Vector4 q = -ToVector4(value);
Member

I did a quick Benchmark on this change:

        [Benchmark]
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static void Conjugate()
        {
            Quaternion start = new Quaternion(8.5f, 9.4f, 1.2f, 1f);

            Quaternion c1 = Quaternion.Conjugate(start);
            Quaternion c2 = Quaternion.Conjugate(c1);
            Quaternion c3 = Quaternion.Conjugate(c2);
            Quaternion c4 = Quaternion.Conjugate(c3);
        }

And the results show a degradation of this method on my machine:

BenchmarkDotNet=v0.10.10.20171127-develop, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.19)
Processor=Intel Core i7-6700 CPU 3.40GHz (Skylake), ProcessorCount=8
Frequency=3328122 Hz, Resolution=300.4698 ns, Timer=TSC
.NET Core SDK=2.2.0-preview1-007522
  [Host]     : .NET Core 2.1.0-preview1-25907-02 (Framework 4.6.25901.06), 64bit RyuJIT

With your changes:

    Method |     Mean |     Error |    StdDev |
---------- |---------:|----------:|----------:|
 Conjugate | 57.38 ns | 0.9437 ns | 0.8827 ns |

Without the changes:

    Method |     Mean |     Error |    StdDev |
---------- |---------:|----------:|----------:|
 Conjugate | 5.740 ns | 0.0549 ns | 0.0397 ns |

Member

I also see a degradation for Normalize. Same machine as above.

        [Benchmark]
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static void Normalize()
        {
            Quaternion start = new Quaternion(8.5f, 9.4f, 1.2f, 1f);

            Quaternion c1 = Quaternion.Normalize(start);
            Quaternion c2 = Quaternion.Normalize(c1);
            Quaternion c3 = Quaternion.Normalize(c2);
            Quaternion c4 = Quaternion.Normalize(c3);
        }

With your Normalize change:

    Method |       Mean |     Error |    StdDev |
---------- |-----------:|----------:|----------:|
 Normalize | 109.890 ns | 1.6731 ns | 1.5650 ns |

Without your Normalize change:

    Method |      Mean |     Error |    StdDev |
---------- |----------:|----------:|----------:|
 Normalize | 60.797 ns | 0.3803 ns | 0.3557 ns |

Member Author

It was better with the Unsafe casting; but produced the wrong results in release 😢

           Method |      Mean |     Error |    StdDev | Scaled | ScaledSD |
----------------- |----------:|----------:|----------:|-------:|---------:|
  ConjugateUnsafe |  8.564 ns | 0.0385 ns | 0.0322 ns |   0.31 |     0.00 |
 ConjugateCurrent | 27.613 ns | 0.1639 ns | 0.1533 ns |   1.00 |     0.00 |
  ConjugateChange | 64.814 ns | 0.2741 ns | 0.2564 ns |   2.35 |     0.02 |

Will have to dig into why it's producing wrong results.
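As a point of comparison for the Conjugate numbers, here is a rough sketch of one way to express the conjugate on the reinterpreted Vector4. It multiplies by a sign mask rather than negating and then restoring W, so it is not the code from this PR; `QuaternionConjugateSketch`, `ConjugateViaVector4` and `s_conjugateSigns` are assumed names, and no claim is made about how it performs.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;

public static class QuaternionConjugateSketch
{
    // Sign mask (an assumption of this sketch): negate X, Y, Z and keep W unchanged.
    private static readonly Vector4 s_conjugateSigns = new Vector4(-1f, -1f, -1f, 1f);

    // Hypothetical helper: conjugate computed as one element-wise vector multiply.
    public static Quaternion ConjugateViaVector4(Quaternion value)
    {
        Vector4 q = Unsafe.As<Quaternion, Vector4>(ref value);
        Vector4 result = q * s_conjugateSigns;
        return Unsafe.As<Vector4, Quaternion>(ref result);
    }
}
```

For finite, non-zero components this should match `Quaternion.Conjugate`, which makes it easy to compare the two in a benchmark or a release-mode correctness test.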

Member

Are there issues if you make this a union, rather than using Unsafe.As?

Member Author

I'm fairly sure (can't find the code atm) that the Jit won't consider a struct with overlapping fields for a register

Member Author

Will try to get a repro and file an issue in coreclr; then revert this back to the Unsafe version

@benaadams
Member Author

Not sure about the FromVector4 and ToVector4 conversions in last version; will check asm

@CarolEidt

I'm fairly sure (can't find the code atm) that the Jit won't consider a struct with overlapping fields for a register

This is captured in the lvOverlappingFields flag on the LclVarDsc.

@CarolEidt

In lvaCanPromoteStructType(), it checks for overlapping fields here: https://github.com/dotnet/coreclr/blob/4e625b8cecd63dd6f0acaf82e28731f28ab9901d/src/jit/lclvars.cpp#L1501 and then disqualifies it from register promotion.
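For context, the "union" alternative asked about earlier could look roughly like the sketch below: an explicit-layout struct with both views at offset 0. `QuaternionVector4Union` and `UnionSketch` are hypothetical names and nothing like this is part of the PR. Per the comments above, a struct with overlapping fields is disqualified from promotion, so this shape would be expected to stay in memory rather than in registers.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical union type: both fields start at offset 0 and therefore overlap.
[StructLayout(LayoutKind.Explicit)]
public struct QuaternionVector4Union
{
    [FieldOffset(0)] public Quaternion AsQuaternion;
    [FieldOffset(0)] public Vector4 AsVector4;
}

public static class UnionSketch
{
    // Writes through one view and reads back through the other.
    public static Vector4 ToVector4(Quaternion value)
    {
        QuaternionVector4Union u = default(QuaternionVector4Union);
        u.AsQuaternion = value;
        return u.AsVector4;
    }
}
```

This relies on Quaternion and Vector4 both being four consecutive floats in X, Y, Z, W order, which is the same assumption the Unsafe.As version makes.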

@jkotas
Member

jkotas commented Nov 28, 2017

struct with overlapping fields for a register

BTW: Introducing overlapping fields also affects how the struct is passed in interop on Unix x64.

@karelz
Member

karelz commented Jan 3, 2018

This PR has been sitting here for 1 month. Any plans to push it forward @benaadams?

@benaadams
Member Author

Any plans to push it forward

Reverted the change that avoided Unsafe, as it made things slower; however, it then hits the issue https://github.com/dotnet/coreclr/issues/15237

@karelz
Member

karelz commented Jan 18, 2018

@benaadams the dependency seems to be resolved now.

@benaadams
Member Author

As this was an issue in the Jit and Quaternion is also OOB; do I #if the changes for netcoreapp2.1?

@eerhardt
Member

do I #if the changes for netcoreapp2.1?

I'd assume you'd have to, or else the tests won't pass on desktop, right?

Just an FYI: we try not to use #if, but instead split the differences into separate files, e.g. Quaternion.netcoreapp.cs.

@benaadams benaadams force-pushed the quaternions branch 2 times, most recently from d414e86 to 6c3a6d7 (January 23, 2018 12:08)
@benaadams
Member Author

Updated

@benaadams
Member Author

Have it wrong somehow?
Verifying closure of Microsoft.Private.CoreFx.NETCoreApp reference assemblies

04:24:40   mscorlib -> D:\j\workspace\windows-TGrou---74aa877a\bin\AnyOS.AnyCPU.Debug\mscorlib\netcoreapp\mscorlib.dll
04:24:44   System -> D:\j\workspace\windows-TGrou---74aa877a\bin\AnyOS.AnyCPU.Debug\System\netcoreapp\System.dll
04:24:46   System.Data -> D:\j\workspace\windows-TGrou---74aa877a\bin\AnyOS.AnyCPU.Debug\System.Data\netcoreapp\System.Data.dll
04:24:56   Microsoft.NETCore.Platforms -> D:\j\workspace\windows-TGrou---74aa877a\bin/packages/Debug/specs/Microsoft.NETCore.Platforms.nuspec
04:24:56   Microsoft.NETCore.Targets -> D:\j\workspace\windows-TGrou---74aa877a\bin/packages/Debug/specs/Microsoft.NETCore.Targets.nuspec
04:24:58   Microsoft.Private.CoreFx.NETCoreApp -> D:\j\workspace\windows-TGrou---74aa877a\bin/packages/Debug/specs/Microsoft.Private.CoreFx.NETCoreApp.nuspec
04:24:58   Verifying closure of Microsoft.Private.CoreFx.NETCoreApp reference assemblies
04:24:58   Verifying no duplicate types in Microsoft.Private.CoreFx.NETCoreApp reference assemblies
04:25:16   Microsoft.Private.CoreFx.NETCoreApp -> D:\j\workspace\windows-TGrou---74aa877a\bin/packages/Debug/specs/runtime.win-x64.Microsoft.Private.CoreFx.NETCoreApp.nuspec
04:25:16   Verifying closure of runtime.win-x64.Microsoft.Private.CoreFx.NETCoreApp runtime assemblies
04:25:16 D:\j\workspace\windows-TGrou---74aa877a\pkg\frameworkPackage.targets(124,5): error : Assembly 'System.Numerics.Vectors' is missing dependency 'System.Runtime.CompilerServices.Unsafe' [D:\j\workspace\windows-TGrou---74aa877a\pkg\Microsoft.Private.CoreFx.NETCoreApp\Microsoft.Private.CoreFx.NETCoreApp.pkgproj]
04:25:16 
04:25:16 Build FAILED.
04:25:16 
04:25:16 D:\j\workspace\windows-TGrou---74aa877a\pkg\frameworkPackage.targets(124,5): error : Assembly 'System.Numerics.Vectors' is missing dependency 'System.Runtime.CompilerServices.Unsafe' [D:\j\workspace\windows-TGrou---74aa877a\pkg\Microsoft.Private.CoreFx.NETCoreApp\Microsoft.Private.CoreFx.NETCoreApp.pkgproj]

@jkotas
Member

jkotas commented Jan 23, 2018

System.Numerics.Vectors is inbox. System.Runtime.CompilerServices.Unsafe is out of box. Inbox cannot depend on out of box.

<ItemGroup Condition="'$(TargetGroup)' == 'netcoreapp'">
<Compile Include="System\Numerics\Quaternion.netcoreapp.cs" />
<Compile Include="$(CommonPath)\CoreLib\Internal\Runtime\CompilerServices\Unsafe.cs">
<Link>Common\CoreLib\Internal\Runtime\CompilerServices\Unsafe.cs</Link>
Member

You need to reference the internal Unsafe in CoreLib. Local copy is not going to work - it won't be recognized by the JIT.

Member Author

Like this?

<ItemGroup Condition="'$(TargetGroup)' == 'netcoreapp'">
 <Compile Include="System\Numerics\Quaternion.netcoreapp.cs" />
 <ReferenceFromRuntime Include="System.Private.CoreLib" />
</ItemGroup>
<ItemGroup Condition="'$(IsPartialFacadeAssembly)' != 'true' AND '$(TargetGroup)' != 'netcoreapp'">
 <Compile Include="System\Numerics\Quaternion.cs" />
</ItemGroup>

Member

Yes, something like this.

@benaadams
Member Author

NETFX System.Drawing.Common.Tests

\src\System.Drawing.Common\tests\System.Drawing.Common.Tests.csproj]
 warning MSB3073: The command "D:\j\workspace\windows-TGrou---2a8f9c29\bin/tests/System.Drawing.Common.Tests
  /netfx-Windows_NT-Release-x86//RunTests.cmd D:\j\workspace\windows-TGrou---2a8f9c29\bin/testhost/netfx-Windows_NT-Release-x86/"
  exited with code 1. [D:\j\workspace\windows-TGrou---2a8f9c29
  \src\System.Drawing.Common\tests\System.Drawing.Common.Tests.csproj]
 error : One or more tests failed while running tests from 'System.Drawing.Common.Tests' please check 
 D:\j\workspace\windows-TGrou---2a8f9c29\bin/tests/System.Drawing.Common.Tests/netfx-Windows_NT-Release-x86/testResults.xml for details! 
 [D:\j\workspace\windows-TGrou---2a8f9c29\src\System.Drawing.Common\tests\System.Drawing.Common.Tests.csproj]
: error : (No message specified) [D:\j\workspace\windows-TGrou---2a8f9c29\src\tests.builds]

test NETFX x86 Release Build

<Link>System\MathF.netstandard.cs</Link>
</Compile>
</ItemGroup>
<!-- Optimize Quaternion as Vector4 for netcoreapp -->
<!-- Jit issue for other runtimes https://github.com/dotnet/coreclr/issues/15237 -->
<ItemGroup Condition="'$(IsPartialFacadeAssembly)' != 'true' AND $(TargetGroup.StartsWith('netcoreapp2'))">
Member

@jkotas jkotas Jan 23, 2018

CoreLib reference can be used for live netcoreapp only.

Member Author

Always gives me this when using netcoreapp

C:\GitHub\corefx\buildvertical.targets(168,5): error : 
Could not find a configuration for ProjectReference
'C:\GitHub\corefx\\external\runtime\runtime.depproj' from configurations
 netcoreapp-Windows_NT;
 netcoreapp-Unix;
 netcoreapp2.0-Windows_NT;
 netcoreapp2.0-Unix;
 uap;
 uapaot; 
 mono 
when building 'System.Numerics.Vectors' for configuration 
 netcoreapp
[C:\GitHub\corefx\src\System.Numerics.Vectors\src\System.Numerics.Vectors.csproj]

@benaadams
Member Author

Should be intrinsic?

[Intrinsic]
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static Vector4 operator /(Vector4 value1, float value2)

@benaadams
Member Author

Unless moving some of Vector to coreclr broke the intrinsics; or it's a bad implementation of it?

@benaadams
Member Author

Doesn't look like operator /(Vector4 value1, float value2) is an intrinsic; changing to

public static Quaternion Normalize(Quaternion value)
{
    Vector4 q = Unsafe.As<Quaternion, Vector4>(ref value);

    float length = q.Length();
    Vector4 v = q / new Vector4(length);
    return Unsafe.As<Vector4, Quaternion>(ref v);
}

evaporates the division inlines

Inlines into 06000003 Program:Normalize_New():struct
  [1 IL=0020 TR=000011 06000006] [profitable inline] QuaternionStruct:.ctor(float,float,float,float):this
  [2 IL=0025 TR=000018 06000010] [aggressive inline attribute] QuaternionStruct:Normalize(struct):struct
    [3 IL=0002 TR=000092 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
    [4 IL=0015 TR=000103 0600010A] [aggressive inline attribute] Vector4:Length():float:this
    [5 IL=0036 TR=000134 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
  [6 IL=0030 TR=000024 06000010] [aggressive inline attribute] QuaternionStruct:Normalize(struct):struct
    [7 IL=0002 TR=000196 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
    [8 IL=0015 TR=000207 0600010A] [aggressive inline attribute] Vector4:Length():float:this
    [9 IL=0036 TR=000238 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
  [10 IL=0035 TR=000035 06000010] [aggressive inline attribute] QuaternionStruct:Normalize(struct):struct
    [11 IL=0002 TR=000300 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
    [12 IL=0015 TR=000311 0600010A] [aggressive inline attribute] Vector4:Length():float:this
    [13 IL=0036 TR=000342 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
  [14 IL=0040 TR=000046 06000010] [aggressive inline attribute] QuaternionStruct:Normalize(struct):struct
    [15 IL=0002 TR=000404 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
    [16 IL=0015 TR=000415 0600010A] [aggressive inline attribute] Vector4:Length():float:this
    [17 IL=0036 TR=000446 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
Budget: initialTime=198, finalTime=1188, initialBudget=1980, currentBudget=3004
Budget: increased by 1024 because of force inlines
Budget: initialSize=1180, finalSize=1331
; Assembly listing for method Program:Normalize_New():struct
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 RetBuf       [V00,T01] (  4,  4   )   byref  ->  rcx        
;* V01 loc0         [V01    ] (  0,  0   )  struct (16) zero-ref   
;  V02 tmp1         [V02,T06] (  2,  4   )  struct (16) [rsp+0xA8]   do-not-enreg[SB]
;  V03 tmp2         [V03,T07] (  2,  4   )  struct (16) [rsp+0x98]   do-not-enreg[SB]
;  V04 tmp3         [V04,T08] (  2,  4   )  struct (16) [rsp+0x88]   do-not-enreg[SB]
;  V05 tmp4         [V05    ] (  2,  4   )  struct (16) [rsp+0x78]   do-not-enreg[XSVB] addr-exposed ld-addr-op
;  V06 tmp5         [V06,T02] (  4,  4   )  simd16  ->  mm0         ld-addr-op
;* V07 tmp6         [V07    ] (  0,  0   )   float  ->  zero-ref   
;  V08 tmp7         [V08,T09] (  2,  4   )  simd16  ->  mm1        
;* V09 tmp8         [V09    ] (  0,  0   )  simd16  ->  zero-ref   
;  V10 tmp9         [V10,T16] (  2,  2   )  simd16  ->  [rsp+0x60]   do-not-enreg[SB] ld-addr-op
;  V11 tmp10        [V11,T20] (  2,  2   )   float  ->  mm1        
;  V12 tmp11        [V12,T21] (  2,  2   )   float  ->  mm1        
;  V13 tmp12        [V13,T10] (  2,  4   )  struct (16) [rsp+0x50]   do-not-enreg[SVB] ld-addr-op
;  V14 tmp13        [V14,T03] (  4,  4   )  simd16  ->  mm0         ld-addr-op
;* V15 tmp14        [V15    ] (  0,  0   )   float  ->  zero-ref   
;  V16 tmp15        [V16,T11] (  2,  4   )  simd16  ->  mm1        
;* V17 tmp16        [V17    ] (  0,  0   )  simd16  ->  zero-ref   
;  V18 tmp17        [V18,T17] (  2,  2   )  simd16  ->  [rsp+0x40]   do-not-enreg[SB] ld-addr-op
;  V19 tmp18        [V19,T22] (  2,  2   )   float  ->  mm1        
;  V20 tmp19        [V20,T23] (  2,  2   )   float  ->  mm1        
;  V21 tmp20        [V21,T12] (  2,  4   )  struct (16) [rsp+0x30]   do-not-enreg[SVB] ld-addr-op
;  V22 tmp21        [V22,T04] (  4,  4   )  simd16  ->  mm0         ld-addr-op
;* V23 tmp22        [V23    ] (  0,  0   )   float  ->  zero-ref   
;  V24 tmp23        [V24,T13] (  2,  4   )  simd16  ->  mm1        
;* V25 tmp24        [V25    ] (  0,  0   )  simd16  ->  zero-ref   
;  V26 tmp25        [V26,T18] (  2,  2   )  simd16  ->  [rsp+0x20]   do-not-enreg[SB] ld-addr-op
;  V27 tmp26        [V27,T24] (  2,  2   )   float  ->  mm1        
;  V28 tmp27        [V28,T25] (  2,  2   )   float  ->  mm1        
;  V29 tmp28        [V29,T14] (  2,  4   )  struct (16) [rsp+0x10]   do-not-enreg[SVB] ld-addr-op
;  V30 tmp29        [V30,T05] (  4,  4   )  simd16  ->  mm0         ld-addr-op
;* V31 tmp30        [V31    ] (  0,  0   )   float  ->  zero-ref   
;  V32 tmp31        [V32,T15] (  2,  4   )  simd16  ->  mm1        
;* V33 tmp32        [V33    ] (  0,  0   )  simd16  ->  zero-ref   
;  V34 tmp33        [V34,T19] (  2,  2   )  simd16  ->  [rsp+0x00]   do-not-enreg[SB] ld-addr-op
;  V35 tmp34        [V35,T26] (  2,  2   )   float  ->  mm1        
;  V36 tmp35        [V36,T27] (  2,  2   )   float  ->  mm1        
;  V37 tmp36        [V37,T28] (  2,  2   )   float  ->  mm0         V01.X(offs=0x00) P-INDEP
;  V38 tmp37        [V38,T29] (  2,  2   )   float  ->  mm1         V01.Y(offs=0x04) P-INDEP
;  V39 tmp38        [V39,T30] (  2,  2   )   float  ->  mm2         V01.Z(offs=0x08) P-INDEP
;  V40 tmp39        [V40,T31] (  2,  2   )   float  ->  mm3         V01.W(offs=0x0c) P-INDEP
;  V41 tmp40        [V41,T00] (  5, 10   )   byref  ->  rax         stack-byref
;# V42 OutArgs      [V42    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]  
;
; Lcl frame size = 184

G_M39223_IG01:
       4881ECB8000000       sub      rsp, 184
       C5F877               vzeroupper 

G_M39223_IG02:
       C4E17A10055D010000   vmovss   xmm0, dword ptr [reloc @RWD00]
       C4E17A100D58010000   vmovss   xmm1, dword ptr [reloc @RWD04]
       C4E17A101553010000   vmovss   xmm2, dword ptr [reloc @RWD08]
       C4E17A101D4E010000   vmovss   xmm3, dword ptr [reloc @RWD12]
       488D442478           lea      rax, bword ptr [rsp+78H]
       C4E17A1100           vmovss   dword ptr [rax], xmm0
       C4E17A114804         vmovss   dword ptr [rax+4], xmm1
       C4E17A115008         vmovss   dword ptr [rax+8], xmm2
       C4E17A11580C         vmovss   dword ptr [rax+12], xmm3
       C4E17910442478       vmovupd  xmm0, xmmword ptr [rsp+78H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17929442460       vmovapd  xmmword ptr [rsp+60H], xmm0
       C4E17A6F442460       vmovdqu  xmm0, qword ptr [rsp+60H]
       C4E17A7F8424A8000000 vmovdqu  qword ptr [rsp+A8H], xmm0
       C4E17A6F8424A8000000 vmovdqu  xmm0, qword ptr [rsp+A8H]
       C4E17A7F442450       vmovdqu  qword ptr [rsp+50H], xmm0
       C4E17910442450       vmovupd  xmm0, xmmword ptr [rsp+50H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17929442440       vmovapd  xmmword ptr [rsp+40H], xmm0
       C4E17A6F442440       vmovdqu  xmm0, qword ptr [rsp+40H]
       C4E17A7F842498000000 vmovdqu  qword ptr [rsp+98H], xmm0
       C4E17A6F842498000000 vmovdqu  xmm0, qword ptr [rsp+98H]
       C4E17A7F442430       vmovdqu  qword ptr [rsp+30H], xmm0
       C4E17910442430       vmovupd  xmm0, xmmword ptr [rsp+30H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17929442420       vmovapd  xmmword ptr [rsp+20H], xmm0
       C4E17A6F442420       vmovdqu  xmm0, qword ptr [rsp+20H]
       C4E17A7F842488000000 vmovdqu  qword ptr [rsp+88H], xmm0
       C4E17A6F842488000000 vmovdqu  xmm0, qword ptr [rsp+88H]
       C4E17A7F442410       vmovdqu  qword ptr [rsp+10H], xmm0
       C4E17910442410       vmovupd  xmm0, xmmword ptr [rsp+10H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E179290424         vmovapd  xmmword ptr [rsp], xmm0
       C4E17A6F0424         vmovdqu  xmm0, qword ptr [rsp]
       C4E17A7F01           vmovdqu  qword ptr [rcx], xmm0
       488BC1               mov      rax, rcx

G_M39223_IG03:
       4881C4B8000000       add      rsp, 184
       C3                   ret      

; Total bytes of code 357, prolog size 10 for method Program:Normalize_New():struct

And goes a little faster

            Method |     Mean |     Error |    StdDev |
------------------ |---------:|----------:|----------:|
 Normalize_Current | 63.99 ns | 0.1399 ns | 0.1169 ns |
     Normalize_New | 67.61 ns | 0.3144 ns | 0.2941 ns |
 Normalize_Vector4 | 33.75 ns | 0.0877 ns | 0.0685 ns |

Which suggests the Vector4 operator /(Vector4, float) should be changed to use / new Vector4(value2)?

@benaadams
Member Author

Some of the asm is a bit redundant though?

       C4E17A7F8424A8000000 vmovdqu  qword ptr [rsp+A8H], xmm0
       C4E17A6F8424A8000000 vmovdqu  xmm0, qword ptr [rsp+A8H]

@mikedn

mikedn commented Feb 14, 2018

Should be intrinsic?

It doesn't seem to be necessary. The fundamental problem seems to be that the current implementation of Vector4.operator/(Vector4, float) is unfortunate:

float invDiv = 1.0f / value2;
return new Vector4(value1.X * invDiv, value1.Y * invDiv, value1.Z * invDiv, value1.W * invDiv);

This should be

return value1 / new Vector4(value2);

that would give you a broadcast/shuffle + divps.
Or, if you want to preserve the current numeric result, keep the scalar division but do vector multiplication:

return value1 * (1.0f / value2);

that would give you divss + broadcast/shuffle + mulps. But this approach is kind of lame. It's slower on current hardware and it's also less precise. Changing x / y into x * (1 / y) is not a "correct" FP optimization. It's something that people do when they are willing to trade accuracy for performance. In this case you get neither.
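To make the three shapes discussed here easy to compare side by side, a small sketch follows. `Vector4DivideSketch` and its method names are hypothetical; the bodies are the expressions quoted above, and the instruction sequences mentioned in the comments are the ones described in this thread, not something this snippet verifies.

```csharp
using System;
using System.Numerics;

public static class Vector4DivideSketch
{
    // Shape quoted above as the current implementation:
    // one scalar reciprocal followed by four scalar multiplies.
    public static Vector4 DivideScalarReciprocal(Vector4 value1, float value2)
    {
        float invDiv = 1.0f / value2;
        return new Vector4(value1.X * invDiv, value1.Y * invDiv, value1.Z * invDiv, value1.W * invDiv);
    }

    // Suggested shape: broadcast the divisor and do a true vector division.
    public static Vector4 DivideBroadcast(Vector4 value1, float value2)
    {
        return value1 / new Vector4(value2);
    }

    // Alternative that keeps the reciprocal: scalar division, then a vector multiply.
    public static Vector4 MultiplyReciprocal(Vector4 value1, float value2)
    {
        return value1 * (1.0f / value2);
    }
}
```

The two reciprocal variants round twice (once for 1 / y, once for the multiply), so they can differ from the true division in the last bit for some inputs; with a power-of-two divisor such as 2f all three agree exactly, which is convenient for a sanity check.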

Some of the asm is a bit redundant though?

So it seems. Could be an inlining artifact, sometimes it generates copies that aren't removed by subsequent phases. Or it's an unfortunate side effect of using ref.

@benaadams
Member Author

Should be intrinsic?

It doesn't seem to be necessary.

Issue: https://github.com/dotnet/coreclr/issues/16385

Workaround: #27122

@ahsonkhan

@benaadams, what is the status of this PR? Any updates?

@benaadams
Member Author

I've been kidnapped for 2 weeks

@karelz
Member

karelz commented Mar 18, 2018

@benaadams how is this week treating you? 😉

@karelz
Member

karelz commented Mar 18, 2018

BTW: If the change is considered "risky" by area owners, we might need to wait for the master branch to be reopened for post-2.1 work (2-3 weeks).

@benaadams
Member Author

Back on it

@benaadams
Member Author

Windows x86 Release Build failure https://github.com/dotnet/corefx/issues/28453

@benaadams
Member Author

Still not good 😢

            Method |     Mean |     Error |    StdDev |   Median |
------------------ |---------:|----------:|----------:|---------:|
 Normalize_Current | 62.70 ns | 1.2146 ns | 1.1362 ns | 63.63 ns |
     Normalize_New | 67.46 ns | 0.1043 ns | 0.0815 ns | 67.46 ns |
 Normalize_Vector4 | 15.90 ns | 0.3118 ns | 0.2917 ns | 16.05 ns |

public static Quaternion Normalize_Current()
{
    Quaternion start = new Quaternion(8.5f, 9.4f, 1.2f, 1f);

    Quaternion c1 = Quaternion.Normalize(start);
    Quaternion c2 = Quaternion.Normalize(c1);
    Quaternion c3 = Quaternion.Normalize(c2);
    return Quaternion.Normalize(c3);
}

public static QuaternionStruct Normalize_New()
{
    QuaternionStruct start = new QuaternionStruct(8.5f, 9.4f, 1.2f, 1f);

    QuaternionStruct c1 = QuaternionStruct.Normalize(start);
    QuaternionStruct c2 = QuaternionStruct.Normalize(c1);
    QuaternionStruct c3 = QuaternionStruct.Normalize(c2);
    return QuaternionStruct.Normalize(c3);
}

public static Vector4 Normalize_Vector4()
{
    Vector4 start = new Vector4(8.5f, 9.4f, 1.2f, 1f);

    Vector4 c1 = Vector4.Normalize(start);
    Vector4 c2 = Vector4.Normalize(c1);
    Vector4 c3 = Vector4.Normalize(c2);
    return Vector4.Normalize(c3);
}

; Assembly listing for method Program:Normalize_Vector4():struct
; ...
; Lcl frame size = 0

G_M3011_IG01:
       C5F877               vzeroupper 

G_M3011_IG02:
       C4E17A1005CC000000   vmovss   xmm0, dword ptr [reloc @RWD00]
       C4E17A100DC7000000   vmovss   xmm1, dword ptr [reloc @RWD04]
       C4E17A1015C2000000   vmovss   xmm2, dword ptr [reloc @RWD08]
       C4E17A101DBD000000   vmovss   xmm3, dword ptr [reloc @RWD12]
       C4E15857E4           vxorps   xmm4, xmm4
       C4E15A10E3           vmovss   xmm4, xmm4, xmm3
       C4E15973FC04         vpslldq  xmm4, 4
       C4E15A10E2           vmovss   xmm4, xmm4, xmm2
       C4E15973FC04         vpslldq  xmm4, 4
       C4E15A10E1           vmovss   xmm4, xmm4, xmm1
       C4E15973FC04         vpslldq  xmm4, 4
       C4E15A10E0           vmovss   xmm4, xmm4, xmm0
       C4E17828C4           vmovaps  xmm0, xmm4
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E1791101           vmovupd  xmmword ptr [rcx], xmm0
       488BC1               mov      rax, rcx

G_M3011_IG03:
       C3                   ret      

; Total bytes of code 200, prolog size 3 for method Program:Normalize_Vector4():struct

; Assembly listing for method Program:Normalize_New():struct
;
;  V00 RetBuf       [V00,T05] (  4,  4   )   byref  ->  rcx        
;* V01 loc0         [V01    ] (  0,  0   )  struct (16) zero-ref   
;  V02 tmp1         [V02,T06] (  2,  4   )  struct (16) [rsp+0xA8]   do-not-enreg[SB]
;  V03 tmp2         [V03,T07] (  2,  4   )  struct (16) [rsp+0x98]   do-not-enreg[SB]
;  V04 tmp3         [V04,T08] (  2,  4   )  struct (16) [rsp+0x88]   do-not-enreg[SB]
;  V05 tmp4         [V05    ] (  2,  4   )  struct (16) [rsp+0x78]   do-not-enreg[XSVB] addr-exposed ld-addr-op
;  V06 tmp5         [V06,T16] (  2,  2   )  simd16  ->  [rsp+0x60]   do-not-enreg[SB] ld-addr-op
;  V07 tmp6         [V07,T17] (  2,  2   )  simd16  ->  mm0        
;  V08 tmp7         [V08,T01] (  4,  8   )  simd16  ->  mm0         ld-addr-op
;* V09 tmp8         [V09    ] (  0,  0   )   float  ->  zero-ref   
;  V10 tmp9         [V10,T24] (  2,  2   )   float  ->  mm1        
;  V11 tmp10        [V11,T25] (  2,  2   )   float  ->  mm1        
;* V12 tmp11        [V12    ] (  0,  0   )  simd16  ->  zero-ref   
;  V13 tmp12        [V13,T09] (  2,  4   )  simd16  ->  mm1        
;  V14 tmp13        [V14,T10] (  2,  4   )  struct (16) [rsp+0x50]   do-not-enreg[SVB] ld-addr-op
;  V15 tmp14        [V15,T18] (  2,  2   )  simd16  ->  [rsp+0x40]   do-not-enreg[SB] ld-addr-op
;  V16 tmp15        [V16,T19] (  2,  2   )  simd16  ->  mm0        
;  V17 tmp16        [V17,T02] (  4,  8   )  simd16  ->  mm0         ld-addr-op
;* V18 tmp17        [V18    ] (  0,  0   )   float  ->  zero-ref   
;  V19 tmp18        [V19,T26] (  2,  2   )   float  ->  mm1        
;  V20 tmp19        [V20,T27] (  2,  2   )   float  ->  mm1        
;* V21 tmp20        [V21    ] (  0,  0   )  simd16  ->  zero-ref   
;  V22 tmp21        [V22,T11] (  2,  4   )  simd16  ->  mm1        
;  V23 tmp22        [V23,T12] (  2,  4   )  struct (16) [rsp+0x30]   do-not-enreg[SVB] ld-addr-op
;  V24 tmp23        [V24,T20] (  2,  2   )  simd16  ->  [rsp+0x20]   do-not-enreg[SB] ld-addr-op
;  V25 tmp24        [V25,T21] (  2,  2   )  simd16  ->  mm0        
;  V26 tmp25        [V26,T03] (  4,  8   )  simd16  ->  mm0         ld-addr-op
;* V27 tmp26        [V27    ] (  0,  0   )   float  ->  zero-ref   
;  V28 tmp27        [V28,T28] (  2,  2   )   float  ->  mm1        
;  V29 tmp28        [V29,T29] (  2,  2   )   float  ->  mm1        
;* V30 tmp29        [V30    ] (  0,  0   )  simd16  ->  zero-ref   
;  V31 tmp30        [V31,T13] (  2,  4   )  simd16  ->  mm1        
;  V32 tmp31        [V32,T14] (  2,  4   )  struct (16) [rsp+0x10]   do-not-enreg[SVB] ld-addr-op
;  V33 tmp32        [V33,T22] (  2,  2   )  simd16  ->  [rsp+0x00]   do-not-enreg[SB] ld-addr-op
;  V34 tmp33        [V34,T23] (  2,  2   )  simd16  ->  mm0        
;  V35 tmp34        [V35,T04] (  4,  8   )  simd16  ->  mm0         ld-addr-op
;* V36 tmp35        [V36    ] (  0,  0   )   float  ->  zero-ref   
;  V37 tmp36        [V37,T30] (  2,  2   )   float  ->  mm1        
;  V38 tmp37        [V38,T31] (  2,  2   )   float  ->  mm1        
;* V39 tmp38        [V39    ] (  0,  0   )  simd16  ->  zero-ref   
;  V40 tmp39        [V40,T15] (  2,  4   )  simd16  ->  mm1        
;  V41 tmp40        [V41,T32] (  2,  2   )   float  ->  mm0         V01.X(offs=0x00) P-INDEP
;  V42 tmp41        [V42,T33] (  2,  2   )   float  ->  mm1         V01.Y(offs=0x04) P-INDEP
;  V43 tmp42        [V43,T34] (  2,  2   )   float  ->  mm2         V01.Z(offs=0x08) P-INDEP
;  V44 tmp43        [V44,T35] (  2,  2   )   float  ->  mm3         V01.W(offs=0x0c) P-INDEP
;  V45 tmp44        [V45,T00] (  5, 10   )   byref  ->  rax         stack-byref
;# V46 OutArgs      [V46    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]  
;
; Lcl frame size = 184

G_M39231_IG01:
       4881ECB8000000       sub      rsp, 184
       C5F877               vzeroupper 

G_M39231_IG02:
       C4E17A10055D010000   vmovss   xmm0, dword ptr [reloc @RWD00]
       C4E17A100D58010000   vmovss   xmm1, dword ptr [reloc @RWD04]
       C4E17A101553010000   vmovss   xmm2, dword ptr [reloc @RWD08]
       C4E17A101D4E010000   vmovss   xmm3, dword ptr [reloc @RWD12]
       488D442478           lea      rax, bword ptr [rsp+78H]
       C4E17A1100           vmovss   dword ptr [rax], xmm0
       C4E17A114804         vmovss   dword ptr [rax+4], xmm1
       C4E17A115008         vmovss   dword ptr [rax+8], xmm2
       C4E17A11580C         vmovss   dword ptr [rax+12], xmm3
       C4E17910442478       vmovupd  xmm0, xmmword ptr [rsp+78H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17929442460       vmovapd  xmmword ptr [rsp+60H], xmm0
       C4E17A6F442460       vmovdqu  xmm0, qword ptr [rsp+60H]
       C4E17A7F8424A8000000 vmovdqu  qword ptr [rsp+A8H], xmm0
       C4E17A6F8424A8000000 vmovdqu  xmm0, qword ptr [rsp+A8H]
       C4E17A7F442450       vmovdqu  qword ptr [rsp+50H], xmm0
       C4E17910442450       vmovupd  xmm0, xmmword ptr [rsp+50H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17929442440       vmovapd  xmmword ptr [rsp+40H], xmm0
       C4E17A6F442440       vmovdqu  xmm0, qword ptr [rsp+40H]
       C4E17A7F842498000000 vmovdqu  qword ptr [rsp+98H], xmm0
       C4E17A6F842498000000 vmovdqu  xmm0, qword ptr [rsp+98H]
       C4E17A7F442430       vmovdqu  qword ptr [rsp+30H], xmm0
       C4E17910442430       vmovupd  xmm0, xmmword ptr [rsp+30H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17929442420       vmovapd  xmmword ptr [rsp+20H], xmm0
       C4E17A6F442420       vmovdqu  xmm0, qword ptr [rsp+20H]
       C4E17A7F842488000000 vmovdqu  qword ptr [rsp+88H], xmm0
       C4E17A6F842488000000 vmovdqu  xmm0, qword ptr [rsp+88H]
       C4E17A7F442410       vmovdqu  qword ptr [rsp+10H], xmm0
       C4E17910442410       vmovupd  xmm0, xmmword ptr [rsp+10H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E179290424         vmovapd  xmmword ptr [rsp], xmm0
       C4E17A6F0424         vmovdqu  xmm0, qword ptr [rsp]
       C4E17A7F01           vmovdqu  qword ptr [rcx], xmm0
       488BC1               mov      rax, rcx

G_M39231_IG03:
       4881C4B8000000       add      rsp, 184
       C3                   ret      

; Total bytes of code 357, prolog size 10 for method Program:Normalize_New():struct

; Assembly listing for method Program:Normalize_Current():struct
;...
;  V00 RetBuf       [V00,T01] (  4,  4   )   byref  ->  rsi        
;* V01 loc0         [V01    ] (  0,  0   )  struct (16) zero-ref   
;  V02 tmp1         [V02    ] (  2,  4   )  struct (16) [rsp+0x50]   do-not-enreg[XSB] addr-exposed
;  V03 tmp2         [V03    ] (  2,  4   )  struct (16) [rsp+0x40]   do-not-enreg[XSB] addr-exposed
;  V04 tmp3         [V04    ] (  2,  4   )  struct (16) [rsp+0x30]   do-not-enreg[XSB] addr-exposed
;  V05 tmp4         [V05,T06] (  2,  2   )   float  ->  mm0         V01.X(offs=0x00) P-INDEP
;  V06 tmp5         [V06,T07] (  2,  2   )   float  ->  mm1         V01.Y(offs=0x04) P-INDEP
;  V07 tmp6         [V07,T08] (  2,  2   )   float  ->  mm2         V01.Z(offs=0x08) P-INDEP
;  V08 tmp7         [V08,T09] (  2,  2   )   float  ->  mm3         V01.W(offs=0x0c) P-INDEP
;  V09 tmp8         [V09    ] ( 12, 24   )  struct (16) [rsp+0x20]   do-not-enreg[XSB] addr-exposed
;  V10 tmp9         [V10,T00] (  5, 10   )   byref  ->  rdx         stack-byref
;  V11 tmp10        [V11,T03] (  2,  4   )    long  ->  rcx        
;  V12 tmp11        [V12,T04] (  2,  4   )    long  ->  rcx        
;  V13 tmp12        [V13,T05] (  2,  4   )    long  ->  rcx        
;  V14 tmp13        [V14,T02] (  2,  4   )   byref  ->  rcx        
;  V15 OutArgs      [V15    ] (  1,  1   )  lclBlk (32) [rsp+0x00]  
;
; Lcl frame size = 96

G_M20468_IG01:
       56                   push     rsi
       4883EC60             sub      rsp, 96
       C5F877               vzeroupper 
       488BF1               mov      rsi, rcx

G_M20468_IG02:
       C4E17A1005A4000000   vmovss   xmm0, dword ptr [reloc @RWD00]
       C4E17A100D9F000000   vmovss   xmm1, dword ptr [reloc @RWD04]
       C4E17A10159A000000   vmovss   xmm2, dword ptr [reloc @RWD08]
       C4E17A101D95000000   vmovss   xmm3, dword ptr [reloc @RWD12]
       488D4C2450           lea      rcx, bword ptr [rsp+50H]
       488D542420           lea      rdx, bword ptr [rsp+20H]
       C4E17A1102           vmovss   dword ptr [rdx], xmm0
       C4E17A114A04         vmovss   dword ptr [rdx+4], xmm1
       C4E17A115208         vmovss   dword ptr [rdx+8], xmm2
       C4E17A115A0C         vmovss   dword ptr [rdx+12], xmm3
       488D542420           lea      rdx, bword ptr [rsp+20H]
       E8BEFBFFFF           call     Quaternion:Normalize(struct):struct
       488D4C2440           lea      rcx, bword ptr [rsp+40H]
       C4E17A6F442450       vmovdqu  xmm0, qword ptr [rsp+50H]
       C4E17A7F442420       vmovdqu  qword ptr [rsp+20H], xmm0
       488D542420           lea      rdx, bword ptr [rsp+20H]
       E8A1FBFFFF           call     Quaternion:Normalize(struct):struct
       488D4C2430           lea      rcx, bword ptr [rsp+30H]
       C4E17A6F442440       vmovdqu  xmm0, qword ptr [rsp+40H]
       C4E17A7F442420       vmovdqu  qword ptr [rsp+20H], xmm0
       488D542420           lea      rdx, bword ptr [rsp+20H]
       E884FBFFFF           call     Quaternion:Normalize(struct):struct
       488BCE               mov      rcx, rsi
       C4E17A6F442430       vmovdqu  xmm0, qword ptr [rsp+30H]
       C4E17A7F442420       vmovdqu  qword ptr [rsp+20H], xmm0
       488D542420           lea      rdx, bword ptr [rsp+20H]
       E869FBFFFF           call     Quaternion:Normalize(struct):struct
       488BC6               mov      rax, rsi

G_M20468_IG03:
       4883C460             add      rsp, 96
       5E                   pop      rsi
       C3                   ret      

; Total bytes of code 184, prolog size 8 for method Program:Normalize_Current():struct

@benaadams
Member Author

Going to give up for now 😞

@benaadams benaadams closed this Mar 25, 2018
@benaadams
Member Author

benaadams commented Mar 25, 2018

I think the difference between the Vector4 version and the Quaternion cast to Vector4 is mainly these blocks, where the value round-trips through the stack between Normalize steps:

       C4E17929442420       vmovapd  xmmword ptr [rsp+20H], xmm0
       C4E17A6F442420       vmovdqu  xmm0, qword ptr [rsp+20H]
       C4E17A7F842488000000 vmovdqu  qword ptr [rsp+88H], xmm0
       C4E17A6F842488000000 vmovdqu  xmm0, qword ptr [rsp+88H]
       C4E17A7F442410       vmovdqu  qword ptr [rsp+10H], xmm0
       C4E17910442410       vmovupd  xmm0, xmmword ptr [rsp+10H]

@benaadams benaadams reopened this Mar 25, 2018
@eerhardt
Member

@benaadams - did you intend to reopen this PR? I see you gave up for now and closed it, but then reopened it the same day.

I just want to verify whether this PR should be open or closed.

@benaadams
Member Author

Closed it, then opened an issue in coreclr, and reopened in hope :)

I'd like Quaternion to be vectorized, but I also don't want to make it worse along the way...

@eerhardt
Member

Note that the coreclr issue was moved to Future, which means it is unlikely to be fixed in .NET Core 2.1.

@benaadams
Member Author

Added a separate PR for the test changes from this PR: #28582

Perhaps something to revisit with CPU intrinsics rather than Vector4
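
For illustration only, here is a rough sketch of what a CPU-intrinsics version of Normalize might look like. It assumes the `System.Runtime.Intrinsics` APIs (available from .NET Core 3.0, so not in the runtime this PR targets). `QuaternionIntrinsicsSketch` is a hypothetical name, not code from this PR or from System.Numerics, and whether the JIT would keep everything in registers here is untested.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for illustration; not part of this PR or of System.Numerics.
public static class QuaternionIntrinsicsSketch
{
    public static Quaternion Normalize(Quaternion value)
    {
        if (!Sse41.IsSupported)
        {
            // No SSE4.1 (needed for dpps): use the existing implementation.
            return Quaternion.Normalize(value);
        }

        // Build the vector from the fields instead of reinterpreting the struct
        // with Unsafe.As. Assumption: this may or may not change the codegen.
        Vector128<float> q = Vector128.Create(value.X, value.Y, value.Z, value.W);

        // Control byte 0xFF: multiply all four lanes and write the sum to all four lanes.
        Vector128<float> lengthSquared = Sse41.DotProduct(q, q, 0xFF);

        // Divide each component by the length.
        Vector128<float> result = Sse.Divide(q, Sse.Sqrt(lengthSquared));

        return new Quaternion(
            result.GetElement(0),
            result.GetElement(1),
            result.GetElement(2),
            result.GetElement(3));
    }
}
```

The result can differ from `Quaternion.Normalize` in the last bits, because this sketch divides by the length where the existing code multiplies by its reciprocal, so any comparison should use a small tolerance rather than exact equality.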
