Vectorize Quaternion #25510

benaadams · 2017-11-26T19:47:30Z

Contributes to https://github.com/dotnet/corefx/issues/7751

PTAL @mellinoe @eerhardt @tannergooding @CarolEidt

benaadams · 2017-11-26T19:49:43Z

src/System.Numerics.Vectors/src/System/Numerics/Quaternion.cs

-            return ans;
+            Vector4 q = Unsafe.As<Quaternion, Vector4>(ref value);
+            q = Vector4.Normalize(q);
+            return Unsafe.As<Vector4, Quaternion>(ref q);


Not sure whether Unsafe.As<Vector4, Quaternion>(ref q) will prevent q from being kept in a register (e.g. because its address is taken)

i.e. should it be this instead?

public static Quaternion Normalize(Quaternion value)
{
    Vector4 q = Unsafe.As<Quaternion, Vector4>(ref value);
    Vector4 result = Vector4.Normalize(q);
    return Unsafe.As<Vector4, Quaternion>(ref result);
}

benaadams · 2017-11-27T00:43:24Z

I get the same failures as the Linux release build locally on Windows, though only in release mode

System.Numerics.Tests.QuaternionTests.QuaternionSubtractTest [FAIL]
        Assert.Equal() Failure
        Expected: {X:-4 Y:4 Z:4 W:-4}
        Actual:   {X:0 Y:0 Z:9.113475E+31 W:2.129974E-43}
        Stack Trace:
           C:\GitHub\corefx\src\System.Numerics.Vectors\tests\QuaternionTests.cs(660,0): at System.Numerics.Tests.QuaternionTests.QuaternionSubtractTest()

benaadams · 2017-11-27T04:06:48Z

Some pretty weird results with either Unsafe.As or Unsafe.ReadUnaligned (in release only)

Quaternion.operator + did not return the expected value: 
  expected {X:6 Y:8 Z:10 W:12} actual {X:2 Y:0 Z:0 W:0}
Expected: True
Actual:   False
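
For reference, the reinterpretation pattern under test can be sketched as below. This is a minimal illustration rather than the PR's actual code: `QuaternionAddSketch`, `AddViaVector4` and `AddComponentWise` are hypothetical names, and it assumes a runtime where `System.Runtime.CompilerServices.Unsafe` is available without an extra package. Comparing the two results in both debug and release builds is one way to catch a release-only discrepancy like the one reported above.

```csharp
using System.Numerics;
using System.Runtime.CompilerServices;

// Hypothetical helper type, for illustration only.
public static class QuaternionAddSketch
{
    // Reinterprets each Quaternion as a Vector4 (both are four consecutive floats)
    // so that the addition can use the Vector4 operator.
    public static Quaternion AddViaVector4(Quaternion a, Quaternion b)
    {
        Vector4 sum = Unsafe.As<Quaternion, Vector4>(ref a)
                    + Unsafe.As<Quaternion, Vector4>(ref b);
        return Unsafe.As<Vector4, Quaternion>(ref sum);
    }

    // Scalar reference implementation to compare against.
    public static Quaternion AddComponentWise(Quaternion a, Quaternion b)
    {
        return new Quaternion(a.X + b.X, a.Y + b.Y, a.Z + b.Z, a.W + b.W);
    }
}
```

Both methods should agree for every input; a mismatch that appears only with optimizations enabled would point at code generation rather than at the arithmetic.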

eerhardt · 2017-11-27T18:32:55Z

src/System.Numerics.Vectors/tests/QuaternionTests.cs

@@ -383,7 +383,7 @@ public void QuaternionCreateFromYawPitchRollTest2()

                        Quaternion expected = yaw * pitch * roll;
                        Quaternion actual = Quaternion.CreateFromYawPitchRoll(yawRad, pitchRad, rollRad);
-                        Assert.True(MathHelper.Equal(expected, actual), String.Format("Yaw:{0} Pitch:{1} Roll:{2}", yawAngle, pitchAngle, rollAngle));
+                        Assert.True(MathHelper.Equal(expected, actual), $"Quaternion.QuaternionCreateFromYawPitchRollTest2 Yaw:{yawAngle} Pitch:{pitchAngle} Roll:{rollAngle} did not return the expected value: expected {expected} actual2 {actual}");


Any reason this says actual2 {actual}? Is that just a typo?

eerhardt · 2017-11-27T19:26:08Z

src/System.Numerics.Vectors/src/System/Numerics/Quaternion.cs

-            ans.W = value.W;
-
-            return ans;
+            Vector4 q = -ToVector4(value);


I did a quick Benchmark on this change:

[Benchmark]
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void Conjugate()
{
    Quaternion start = new Quaternion(8.5f, 9.4f, 1.2f, 1f);
    Quaternion c1 = Quaternion.Conjugate(start);
    Quaternion c2 = Quaternion.Conjugate(c1);
    Quaternion c3 = Quaternion.Conjugate(c2);
    Quaternion c4 = Quaternion.Conjugate(c3);
}

And the results show a degradation of this method on my machine:

BenchmarkDotNet=v0.10.10.20171127-develop, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.19)
Processor=Intel Core i7-6700 CPU 3.40GHz (Skylake), ProcessorCount=8
Frequency=3328122 Hz, Resolution=300.4698 ns, Timer=TSC
.NET Core SDK=2.2.0-preview1-007522
  [Host] : .NET Core 2.1.0-preview1-25907-02 (Framework 4.6.25901.06), 64bit RyuJIT

With your changes:

    Method |     Mean |     Error |    StdDev |
---------- |---------:|----------:|----------:|
 Conjugate | 57.38 ns | 0.9437 ns | 0.8827 ns |

Without the changes:

    Method |     Mean |     Error |    StdDev |
---------- |---------:|----------:|----------:|
 Conjugate | 5.740 ns | 0.0549 ns | 0.0397 ns |

I also see a degradation for Normalize. Same machine as above.

[Benchmark]
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void Normalize()
{
    Quaternion start = new Quaternion(8.5f, 9.4f, 1.2f, 1f);
    Quaternion c1 = Quaternion.Normalize(start);
    Quaternion c2 = Quaternion.Normalize(c1);
    Quaternion c3 = Quaternion.Normalize(c2);
    Quaternion c4 = Quaternion.Normalize(c3);
}

With your Normalize change:

    Method |       Mean |     Error |    StdDev |
---------- |-----------:|----------:|----------:|
 Normalize | 109.890 ns | 1.6731 ns | 1.5650 ns |

Without your Normalize change:

    Method |      Mean |     Error |    StdDev |
---------- |----------:|----------:|----------:|
 Normalize | 60.797 ns | 0.3803 ns | 0.3557 ns |

It was better with the Unsafe casting; but produced the wrong results in release 😢

           Method |      Mean |     Error |    StdDev | Scaled | ScaledSD |
----------------- |----------:|----------:|----------:|-------:|---------:|
  ConjugateUnsafe |  8.564 ns | 0.0385 ns | 0.0322 ns |   0.31 |     0.00 |
 ConjugateCurrent | 27.613 ns | 0.1639 ns | 0.1533 ns |   1.00 |     0.00 |
  ConjugateChange | 64.814 ns | 0.2741 ns | 0.2564 ns |   2.35 |     0.02 |

Will have to dig into why it's producing wrong results.

Are there issues if you make this a union, rather than using Unsafe.As?

I'm fairly sure (can't find code atm) that the Jit won't consider struct with overlapping fields for a register

Will try to get a repro and file it in coreclr; then revert this back to the unsafe version

Raised issue https://github.com/dotnet/coreclr/issues/15237

benaadams · 2017-11-27T19:26:37Z

Not sure about the FromVector4 and ToVector4 conversions in last version; will check asm

CarolEidt · 2017-11-27T21:23:11Z

I'm fairly sure (can't find code atm) that the Jit won't consider struct with overlapping fields for a register

This is captured in the lvOverlappingFields flag on the LclVarDsc.

CarolEidt · 2017-11-27T21:27:18Z

In lvaCanPromoteStructType(), it checks for overlapping fields here: https://github.com/dotnet/coreclr/blob/4e625b8cecd63dd6f0acaf82e28731f28ab9901d/src/jit/lclvars.cpp#L1501 and then disqualifies it from register promotion.
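
To make the "union" alternative concrete, here is a minimal sketch of what an overlapping-field struct could look like. It is only an illustration of the shape being discussed: `QuaternionVector4Union` and `UnionSketch` are hypothetical names that appear nowhere in the PR, and, per the comments above, a struct like this would be disqualified from register promotion, so it is not offered as a fix.

```csharp
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical union type: both fields start at offset 0 and therefore overlap.
[StructLayout(LayoutKind.Explicit)]
public struct QuaternionVector4Union
{
    [FieldOffset(0)] public Quaternion AsQuaternion;
    [FieldOffset(0)] public Vector4 AsVector4;
}

public static class UnionSketch
{
    // Normalizes a quaternion by viewing its four floats as a Vector4.
    public static Quaternion Normalize(Quaternion value)
    {
        QuaternionVector4Union u = default;
        u.AsQuaternion = value;                       // write through one view
        u.AsVector4 = Vector4.Normalize(u.AsVector4); // operate through the other
        return u.AsQuaternion;                        // read back as a Quaternion
    }
}
```

The trade-off is the one named in the thread: the code needs no `Unsafe` calls, but the overlapping fields are exactly what the JIT check cited above looks for.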

jkotas · 2017-11-28T02:11:57Z

struct with overlapping fields for a register

BTW: Introducing overlapping fields also affects how the struct is passed in interop on Unix x64.

karelz · 2018-01-03T19:14:27Z

This PR has been sitting here for a month. Any plans to push it forward, @benaadams?

benaadams · 2018-01-03T19:40:57Z

Any plans to push it forward

I stripped out the change that avoided Unsafe, as it made things slower; however, the Unsafe version then hits the issue https://github.com/dotnet/coreclr/issues/15237

karelz · 2018-01-18T01:59:11Z

@benaadams the dependency seems to be resolved now.

benaadams · 2018-01-18T11:19:20Z

As this was an issue in the Jit and Quaternion is also OOB; do I #if the changes for netcoreapp2.1?

eerhardt · 2018-01-18T15:41:57Z

do I #if the changes for netcoreapp2.1?

I'd assume you'd have to, or else the tests won't pass on desktop, right?

Just an FYI - we try not to use #if, but instead split the differences into separate files, e.g. Quaternion.netcoreapp.cs.

benaadams · 2018-01-23T12:09:25Z

Updated

benaadams · 2018-01-23T16:54:39Z

Have it wrong somehow?

04:24:40   mscorlib -> D:\j\workspace\windows-TGrou---74aa877a\bin\AnyOS.AnyCPU.Debug\mscorlib\netcoreapp\mscorlib.dll
04:24:44   System -> D:\j\workspace\windows-TGrou---74aa877a\bin\AnyOS.AnyCPU.Debug\System\netcoreapp\System.dll
04:24:46   System.Data -> D:\j\workspace\windows-TGrou---74aa877a\bin\AnyOS.AnyCPU.Debug\System.Data\netcoreapp\System.Data.dll
04:24:56   Microsoft.NETCore.Platforms -> D:\j\workspace\windows-TGrou---74aa877a\bin/packages/Debug/specs/Microsoft.NETCore.Platforms.nuspec
04:24:56   Microsoft.NETCore.Targets -> D:\j\workspace\windows-TGrou---74aa877a\bin/packages/Debug/specs/Microsoft.NETCore.Targets.nuspec
04:24:58   Microsoft.Private.CoreFx.NETCoreApp -> D:\j\workspace\windows-TGrou---74aa877a\bin/packages/Debug/specs/Microsoft.Private.CoreFx.NETCoreApp.nuspec
04:24:58   Verifying closure of Microsoft.Private.CoreFx.NETCoreApp reference assemblies
04:24:58   Verifying no duplicate types in Microsoft.Private.CoreFx.NETCoreApp reference assemblies
04:25:16   Microsoft.Private.CoreFx.NETCoreApp -> D:\j\workspace\windows-TGrou---74aa877a\bin/packages/Debug/specs/runtime.win-x64.Microsoft.Private.CoreFx.NETCoreApp.nuspec
04:25:16   Verifying closure of runtime.win-x64.Microsoft.Private.CoreFx.NETCoreApp runtime assemblies
04:25:16 D:\j\workspace\windows-TGrou---74aa877a\pkg\frameworkPackage.targets(124,5): error : Assembly 'System.Numerics.Vectors' is missing dependency 'System.Runtime.CompilerServices.Unsafe' [D:\j\workspace\windows-TGrou---74aa877a\pkg\Microsoft.Private.CoreFx.NETCoreApp\Microsoft.Private.CoreFx.NETCoreApp.pkgproj]
04:25:16 
04:25:16 Build FAILED.
04:25:16 
04:25:16 D:\j\workspace\windows-TGrou---74aa877a\pkg\frameworkPackage.targets(124,5): error : Assembly 'System.Numerics.Vectors' is missing dependency 'System.Runtime.CompilerServices.Unsafe' [D:\j\workspace\windows-TGrou---74aa877a\pkg\Microsoft.Private.CoreFx.NETCoreApp\Microsoft.Private.CoreFx.NETCoreApp.pkgproj]

jkotas · 2018-01-23T16:57:31Z

System.Numerics.Vectors is inbox. System.Runtime.CompilerServices.Unsafe is out of box. Inbox cannot depend on out of box.

jkotas · 2018-01-23T18:09:32Z

src/System.Numerics.Vectors/src/System.Numerics.Vectors.csproj

+  <ItemGroup Condition="'$(TargetGroup)' == 'netcoreapp'">
+    <Compile Include="System\Numerics\Quaternion.netcoreapp.cs" />
+    <Compile Include="$(CommonPath)\CoreLib\Internal\Runtime\CompilerServices\Unsafe.cs">
+        <Link>Common\CoreLib\Internal\Runtime\CompilerServices\Unsafe.cs</Link>


You need to reference the internal Unsafe in CoreLib. Local copy is not going to work - it won't be recognized by the JIT.

Like this?

<ItemGroup Condition="'$(TargetGroup)' == 'netcoreapp'">
  <Compile Include="System\Numerics\Quaternion.netcoreapp.cs" />
  <ReferenceFromRuntime Include="System.Private.CoreLib" />
</ItemGroup>
<ItemGroup Condition="'$(IsPartialFacadeAssembly)' != 'true' AND '$(TargetGroup)' != 'netcoreapp'">
  <Compile Include="System\Numerics\Quaternion.cs" />
</ItemGroup>

Yes, something like this.

benaadams · 2018-01-23T21:21:01Z

NETFX System.Drawing.Common.Tests

\src\System.Drawing.Common\tests\System.Drawing.Common.Tests.csproj]
 warning MSB3073: The command "D:\j\workspace\windows-TGrou---2a8f9c29\bin/tests/System.Drawing.Common.Tests
  /netfx-Windows_NT-Release-x86//RunTests.cmd D:\j\workspace\windows-TGrou---2a8f9c29\bin/testhost/netfx-Windows_NT-Release-x86/"
  exited with code 1. [D:\j\workspace\windows-TGrou---2a8f9c29
  \src\System.Drawing.Common\tests\System.Drawing.Common.Tests.csproj]
 error : One or more tests failed while running tests from 'System.Drawing.Common.Tests' please check 
 D:\j\workspace\windows-TGrou---2a8f9c29\bin/tests/System.Drawing.Common.Tests/netfx-Windows_NT-Release-x86/testResults.xml for details! 
 [D:\j\workspace\windows-TGrou---2a8f9c29\src\System.Drawing.Common\tests\System.Drawing.Common.Tests.csproj]
: error : (No message specified) [D:\j\workspace\windows-TGrou---2a8f9c29\src\tests.builds]

test NETFX x86 Release Build

jkotas · 2018-01-23T21:25:13Z

src/System.Numerics.Vectors/src/System.Numerics.Vectors.csproj

      <Link>System\MathF.netstandard.cs</Link>
    </Compile>
  </ItemGroup>
+  <!-- Optimize Quaternion as Vector4 for netcoreapp -->
+  <!-- Jit issue for other runtimes https://github.com/dotnet/coreclr/issues/15237 -->
+  <ItemGroup Condition="'$(IsPartialFacadeAssembly)' != 'true' AND $(TargetGroup.StartsWith('netcoreapp2'))">


CoreLib reference can be used for live netcoreapp only.

Always gives me this when using netcoreapp

C:\GitHub\corefx\buildvertical.targets(168,5): error : Could not find a configuration for ProjectReference 'C:\GitHub\corefx\\external\runtime\runtime.depproj' from configurations netcoreapp-Windows_NT; netcoreapp-Unix; netcoreapp2.0-Windows_NT; netcoreapp2.0-Unix; uap; uapaot; mono when building 'System.Numerics.Vectors' for configuration netcoreapp [C:\GitHub\corefx\src\System.Numerics.Vectors\src\System.Numerics.Vectors.csproj]

benaadams · 2018-02-14T12:52:51Z

Should be intrinsic?

[Intrinsic]
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static Vector4 operator /(Vector4 value1, float value2)

benaadams · 2018-02-14T12:58:02Z

Unless moving some of Vector to coreclr broke the intrinsics, or it's a bad implementation of it?

benaadams · 2018-02-14T13:11:01Z

Doesn't look like operator /(Vector4 value1, float value2) is an intrinsic; changing to

public static Quaternion Normalize(Quaternion value)
{
    Vector4 q = Unsafe.As<Quaternion, Vector4>(ref value);

    float length = q.Length();
    Vector4 v = q / new Vector4(length);
    return Unsafe.As<Vector4, Quaternion>(ref v);
}

makes the division inlines evaporate:

Inlines into 06000003 Program:Normalize_New():struct
  [1 IL=0020 TR=000011 06000006] [profitable inline] QuaternionStruct:.ctor(float,float,float,float):this
  [2 IL=0025 TR=000018 06000010] [aggressive inline attribute] QuaternionStruct:Normalize(struct):struct
    [3 IL=0002 TR=000092 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
    [4 IL=0015 TR=000103 0600010A] [aggressive inline attribute] Vector4:Length():float:this
    [5 IL=0036 TR=000134 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
  [6 IL=0030 TR=000024 06000010] [aggressive inline attribute] QuaternionStruct:Normalize(struct):struct
    [7 IL=0002 TR=000196 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
    [8 IL=0015 TR=000207 0600010A] [aggressive inline attribute] Vector4:Length():float:this
    [9 IL=0036 TR=000238 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
  [10 IL=0035 TR=000035 06000010] [aggressive inline attribute] QuaternionStruct:Normalize(struct):struct
    [11 IL=0002 TR=000300 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
    [12 IL=0015 TR=000311 0600010A] [aggressive inline attribute] Vector4:Length():float:this
    [13 IL=0036 TR=000342 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
  [14 IL=0040 TR=000046 06000010] [aggressive inline attribute] QuaternionStruct:Normalize(struct):struct
    [15 IL=0002 TR=000404 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
    [16 IL=0015 TR=000415 0600010A] [aggressive inline attribute] Vector4:Length():float:this
    [17 IL=0036 TR=000446 06000016] [aggressive inline attribute] Unsafe:As(byref):byref
Budget: initialTime=198, finalTime=1188, initialBudget=1980, currentBudget=3004
Budget: increased by 1024 because of force inlines
Budget: initialSize=1180, finalSize=1331
; Assembly listing for method Program:Normalize_New():struct
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 RetBuf       [V00,T01] (  4,  4   )   byref  ->  rcx        
;* V01 loc0         [V01    ] (  0,  0   )  struct (16) zero-ref   
;  V02 tmp1         [V02,T06] (  2,  4   )  struct (16) [rsp+0xA8]   do-not-enreg[SB]
;  V03 tmp2         [V03,T07] (  2,  4   )  struct (16) [rsp+0x98]   do-not-enreg[SB]
;  V04 tmp3         [V04,T08] (  2,  4   )  struct (16) [rsp+0x88]   do-not-enreg[SB]
;  V05 tmp4         [V05    ] (  2,  4   )  struct (16) [rsp+0x78]   do-not-enreg[XSVB] addr-exposed ld-addr-op
;  V06 tmp5         [V06,T02] (  4,  4   )  simd16  ->  mm0         ld-addr-op
;* V07 tmp6         [V07    ] (  0,  0   )   float  ->  zero-ref   
;  V08 tmp7         [V08,T09] (  2,  4   )  simd16  ->  mm1        
;* V09 tmp8         [V09    ] (  0,  0   )  simd16  ->  zero-ref   
;  V10 tmp9         [V10,T16] (  2,  2   )  simd16  ->  [rsp+0x60]   do-not-enreg[SB] ld-addr-op
;  V11 tmp10        [V11,T20] (  2,  2   )   float  ->  mm1        
;  V12 tmp11        [V12,T21] (  2,  2   )   float  ->  mm1        
;  V13 tmp12        [V13,T10] (  2,  4   )  struct (16) [rsp+0x50]   do-not-enreg[SVB] ld-addr-op
;  V14 tmp13        [V14,T03] (  4,  4   )  simd16  ->  mm0         ld-addr-op
;* V15 tmp14        [V15    ] (  0,  0   )   float  ->  zero-ref   
;  V16 tmp15        [V16,T11] (  2,  4   )  simd16  ->  mm1        
;* V17 tmp16        [V17    ] (  0,  0   )  simd16  ->  zero-ref   
;  V18 tmp17        [V18,T17] (  2,  2   )  simd16  ->  [rsp+0x40]   do-not-enreg[SB] ld-addr-op
;  V19 tmp18        [V19,T22] (  2,  2   )   float  ->  mm1        
;  V20 tmp19        [V20,T23] (  2,  2   )   float  ->  mm1        
;  V21 tmp20        [V21,T12] (  2,  4   )  struct (16) [rsp+0x30]   do-not-enreg[SVB] ld-addr-op
;  V22 tmp21        [V22,T04] (  4,  4   )  simd16  ->  mm0         ld-addr-op
;* V23 tmp22        [V23    ] (  0,  0   )   float  ->  zero-ref   
;  V24 tmp23        [V24,T13] (  2,  4   )  simd16  ->  mm1        
;* V25 tmp24        [V25    ] (  0,  0   )  simd16  ->  zero-ref   
;  V26 tmp25        [V26,T18] (  2,  2   )  simd16  ->  [rsp+0x20]   do-not-enreg[SB] ld-addr-op
;  V27 tmp26        [V27,T24] (  2,  2   )   float  ->  mm1        
;  V28 tmp27        [V28,T25] (  2,  2   )   float  ->  mm1        
;  V29 tmp28        [V29,T14] (  2,  4   )  struct (16) [rsp+0x10]   do-not-enreg[SVB] ld-addr-op
;  V30 tmp29        [V30,T05] (  4,  4   )  simd16  ->  mm0         ld-addr-op
;* V31 tmp30        [V31    ] (  0,  0   )   float  ->  zero-ref   
;  V32 tmp31        [V32,T15] (  2,  4   )  simd16  ->  mm1        
;* V33 tmp32        [V33    ] (  0,  0   )  simd16  ->  zero-ref   
;  V34 tmp33        [V34,T19] (  2,  2   )  simd16  ->  [rsp+0x00]   do-not-enreg[SB] ld-addr-op
;  V35 tmp34        [V35,T26] (  2,  2   )   float  ->  mm1        
;  V36 tmp35        [V36,T27] (  2,  2   )   float  ->  mm1        
;  V37 tmp36        [V37,T28] (  2,  2   )   float  ->  mm0         V01.X(offs=0x00) P-INDEP
;  V38 tmp37        [V38,T29] (  2,  2   )   float  ->  mm1         V01.Y(offs=0x04) P-INDEP
;  V39 tmp38        [V39,T30] (  2,  2   )   float  ->  mm2         V01.Z(offs=0x08) P-INDEP
;  V40 tmp39        [V40,T31] (  2,  2   )   float  ->  mm3         V01.W(offs=0x0c) P-INDEP
;  V41 tmp40        [V41,T00] (  5, 10   )   byref  ->  rax         stack-byref
;# V42 OutArgs      [V42    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]  
;
; Lcl frame size = 184

G_M39223_IG01:
       4881ECB8000000       sub      rsp, 184
       C5F877               vzeroupper 

G_M39223_IG02:
       C4E17A10055D010000   vmovss   xmm0, dword ptr [reloc @RWD00]
       C4E17A100D58010000   vmovss   xmm1, dword ptr [reloc @RWD04]
       C4E17A101553010000   vmovss   xmm2, dword ptr [reloc @RWD08]
       C4E17A101D4E010000   vmovss   xmm3, dword ptr [reloc @RWD12]
       488D442478           lea      rax, bword ptr [rsp+78H]
       C4E17A1100           vmovss   dword ptr [rax], xmm0
       C4E17A114804         vmovss   dword ptr [rax+4], xmm1
       C4E17A115008         vmovss   dword ptr [rax+8], xmm2
       C4E17A11580C         vmovss   dword ptr [rax+12], xmm3
       C4E17910442478       vmovupd  xmm0, xmmword ptr [rsp+78H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17929442460       vmovapd  xmmword ptr [rsp+60H], xmm0
       C4E17A6F442460       vmovdqu  xmm0, qword ptr [rsp+60H]
       C4E17A7F8424A8000000 vmovdqu  qword ptr [rsp+A8H], xmm0
       C4E17A6F8424A8000000 vmovdqu  xmm0, qword ptr [rsp+A8H]
       C4E17A7F442450       vmovdqu  qword ptr [rsp+50H], xmm0
       C4E17910442450       vmovupd  xmm0, xmmword ptr [rsp+50H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17929442440       vmovapd  xmmword ptr [rsp+40H], xmm0
       C4E17A6F442440       vmovdqu  xmm0, qword ptr [rsp+40H]
       C4E17A7F842498000000 vmovdqu  qword ptr [rsp+98H], xmm0
       C4E17A6F842498000000 vmovdqu  xmm0, qword ptr [rsp+98H]
       C4E17A7F442430       vmovdqu  qword ptr [rsp+30H], xmm0
       C4E17910442430       vmovupd  xmm0, xmmword ptr [rsp+30H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17929442420       vmovapd  xmmword ptr [rsp+20H], xmm0
       C4E17A6F442420       vmovdqu  xmm0, qword ptr [rsp+20H]
       C4E17A7F842488000000 vmovdqu  qword ptr [rsp+88H], xmm0
       C4E17A6F842488000000 vmovdqu  xmm0, qword ptr [rsp+88H]
       C4E17A7F442410       vmovdqu  qword ptr [rsp+10H], xmm0
       C4E17910442410       vmovupd  xmm0, xmmword ptr [rsp+10H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E179290424         vmovapd  xmmword ptr [rsp], xmm0
       C4E17A6F0424         vmovdqu  xmm0, qword ptr [rsp]
       C4E17A7F01           vmovdqu  qword ptr [rcx], xmm0
       488BC1               mov      rax, rcx

G_M39223_IG03:
       4881C4B8000000       add      rsp, 184
       C3                   ret      

; Total bytes of code 357, prolog size 10 for method Program:Normalize_New():struct

And goes a little faster

            Method |     Mean |     Error |    StdDev |
------------------ |---------:|----------:|----------:|
 Normalize_Current | 63.99 ns | 0.1399 ns | 0.1169 ns |
     Normalize_New | 67.61 ns | 0.3144 ns | 0.2941 ns |
 Normalize_Vector4 | 33.75 ns | 0.0877 ns | 0.0685 ns |

Which suggests Vector4 operator should be changed to use / new Vector4?

benaadams · 2018-02-14T13:13:27Z

Some of the asm is a bit redundant though?

       C4E17A7F8424A8000000 vmovdqu  qword ptr [rsp+A8H], xmm0
       C4E17A6F8424A8000000 vmovdqu  xmm0, qword ptr [rsp+A8H]

mikedn · 2018-02-14T13:51:21Z

Should be intrinsic?

It doesn't seem to be necessary. The fundamental problem seems to be that the current implementation of Vector4.operator/(Vector4, float) is unfortunate:

float invDiv = 1.0f / value2;
return new Vector4(value1.X * invDiv, value1.Y * invDiv, value1.Z * invDiv, value1.W * invDiv);

This should be

return value1 / new Vector4(value2);

that would give you a broadcast/shuffle + divps.
Or, if you want to preserve the current numeric result, keep the scalar division but do vector multiplication:

return value1 * (1.0f / value2);

that would give you divss + broadcast/shuffle + mulps. But this approach is kind of lame. It's slower on current hardware and it's also less precise. Changing x / y into x * (1 / y) is not a "correct" FP optimization. It's something that people do when they are willing to trade accuracy for performance. In this case you get neither.
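
The three shapes discussed here can be put side by side as follows. This is a sketch for comparison only: `Vector4DivisionSketch` and its method names are hypothetical, and the first method simply mirrors the implementation quoted above rather than reproducing the library source.

```csharp
using System.Numerics;

// Hypothetical helper type, for illustration only.
public static class Vector4DivisionSketch
{
    // Shape quoted above: scalar reciprocal, then four scalar multiplies.
    public static Vector4 DivideScalarReciprocal(Vector4 value1, float value2)
    {
        float invDiv = 1.0f / value2;
        return new Vector4(value1.X * invDiv, value1.Y * invDiv,
                           value1.Z * invDiv, value1.W * invDiv);
    }

    // Suggested shape: broadcast the divisor and do one vector division.
    public static Vector4 DivideBroadcast(Vector4 value1, float value2)
    {
        return value1 / new Vector4(value2);
    }

    // Alternative shape: scalar reciprocal, then one vector multiply.
    public static Vector4 MultiplyByReciprocal(Vector4 value1, float value2)
    {
        return value1 * (1.0f / value2);
    }
}
```

For divisors that are powers of two all three agree exactly; for other divisors the reciprocal-based variants may differ from true division in the last bits, which is the accuracy trade-off described above.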

Some of the asm is a bit redundant though?

So it seems. Could be an inlining artifact, sometimes it generates copies that aren't removed by subsequent phases. Or it's an unfortunate side effect of using ref.

benaadams · 2018-02-14T14:34:36Z

Should be intrinsic?

It doesn't seem to be necessary.

Issue: https://github.com/dotnet/coreclr/issues/16385

Workaround: #27122

ahsonkhan · 2018-03-10T02:17:26Z

@benaadams, what is the status of this PR? Any updates?

benaadams · 2018-03-10T07:20:00Z

I've been kidnapped for 2 weeks

karelz · 2018-03-18T19:14:02Z

@benaadams how is this week treating you? 😉

karelz · 2018-03-18T19:21:53Z

BTW: If the change is considered "risky" by area owners, we might need to wait for the master branch to be reopened for post-2.1 work (2-3 weeks).

benaadams · 2018-03-25T02:58:48Z

Back on it

benaadams · 2018-03-25T05:02:14Z

Windows x86 Release Build failure https://github.com/dotnet/corefx/issues/28453

benaadams · 2018-03-25T05:34:26Z

Still not good 😢

            Method |     Mean |     Error |    StdDev |   Median |
------------------ |---------:|----------:|----------:|---------:|
 Normalize_Current | 62.70 ns | 1.2146 ns | 1.1362 ns | 63.63 ns |
     Normalize_New | 67.46 ns | 0.1043 ns | 0.0815 ns | 67.46 ns |
 Normalize_Vector4 | 15.90 ns | 0.3118 ns | 0.2917 ns | 16.05 ns |

public static Quaternion Normalize_Current()
{
    Quaternion start = new Quaternion(8.5f, 9.4f, 1.2f, 1f);

    Quaternion c1 = Quaternion.Normalize(start);
    Quaternion c2 = Quaternion.Normalize(c1);
    Quaternion c3 = Quaternion.Normalize(c2);
    return Quaternion.Normalize(c3);
}

public static QuaternionStruct Normalize_New()
{
    QuaternionStruct start = new QuaternionStruct(8.5f, 9.4f, 1.2f, 1f);

    QuaternionStruct c1 = QuaternionStruct.Normalize(start);
    QuaternionStruct c2 = QuaternionStruct.Normalize(c1);
    QuaternionStruct c3 = QuaternionStruct.Normalize(c2);
    return QuaternionStruct.Normalize(c3);
}

public static Vector4 Normalize_Vector4()
{
    Vector4 start = new Vector4(8.5f, 9.4f, 1.2f, 1f);

    Vector4 c1 = Vector4.Normalize(start);
    Vector4 c2 = Vector4.Normalize(c1);
    Vector4 c3 = Vector4.Normalize(c2);
    return Vector4.Normalize(c3);
}

; Assembly listing for method Program:Normalize_Vector4():struct
; ...
; Lcl frame size = 0

G_M3011_IG01:
       C5F877               vzeroupper 

G_M3011_IG02:
       C4E17A1005CC000000   vmovss   xmm0, dword ptr [reloc @RWD00]
       C4E17A100DC7000000   vmovss   xmm1, dword ptr [reloc @RWD04]
       C4E17A1015C2000000   vmovss   xmm2, dword ptr [reloc @RWD08]
       C4E17A101DBD000000   vmovss   xmm3, dword ptr [reloc @RWD12]
       C4E15857E4           vxorps   xmm4, xmm4
       C4E15A10E3           vmovss   xmm4, xmm4, xmm3
       C4E15973FC04         vpslldq  xmm4, 4
       C4E15A10E2           vmovss   xmm4, xmm4, xmm2
       C4E15973FC04         vpslldq  xmm4, 4
       C4E15A10E1           vmovss   xmm4, xmm4, xmm1
       C4E15973FC04         vpslldq  xmm4, 4
       C4E15A10E0           vmovss   xmm4, xmm4, xmm0
       C4E17828C4           vmovaps  xmm0, xmm4
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E1791101           vmovupd  xmmword ptr [rcx], xmm0
       488BC1               mov      rax, rcx

G_M3011_IG03:
       C3                   ret      

; Total bytes of code 200, prolog size 3 for method Program:Normalize_Vector4():struct

; Assembly listing for method Program:Normalize_New():struct
;
;  V00 RetBuf       [V00,T05] (  4,  4   )   byref  ->  rcx        
;* V01 loc0         [V01    ] (  0,  0   )  struct (16) zero-ref   
;  V02 tmp1         [V02,T06] (  2,  4   )  struct (16) [rsp+0xA8]   do-not-enreg[SB]
;  V03 tmp2         [V03,T07] (  2,  4   )  struct (16) [rsp+0x98]   do-not-enreg[SB]
;  V04 tmp3         [V04,T08] (  2,  4   )  struct (16) [rsp+0x88]   do-not-enreg[SB]
;  V05 tmp4         [V05    ] (  2,  4   )  struct (16) [rsp+0x78]   do-not-enreg[XSVB] addr-exposed ld-addr-op
;  V06 tmp5         [V06,T16] (  2,  2   )  simd16  ->  [rsp+0x60]   do-not-enreg[SB] ld-addr-op
;  V07 tmp6         [V07,T17] (  2,  2   )  simd16  ->  mm0        
;  V08 tmp7         [V08,T01] (  4,  8   )  simd16  ->  mm0         ld-addr-op
;* V09 tmp8         [V09    ] (  0,  0   )   float  ->  zero-ref   
;  V10 tmp9         [V10,T24] (  2,  2   )   float  ->  mm1        
;  V11 tmp10        [V11,T25] (  2,  2   )   float  ->  mm1        
;* V12 tmp11        [V12    ] (  0,  0   )  simd16  ->  zero-ref   
;  V13 tmp12        [V13,T09] (  2,  4   )  simd16  ->  mm1        
;  V14 tmp13        [V14,T10] (  2,  4   )  struct (16) [rsp+0x50]   do-not-enreg[SVB] ld-addr-op
;  V15 tmp14        [V15,T18] (  2,  2   )  simd16  ->  [rsp+0x40]   do-not-enreg[SB] ld-addr-op
;  V16 tmp15        [V16,T19] (  2,  2   )  simd16  ->  mm0        
;  V17 tmp16        [V17,T02] (  4,  8   )  simd16  ->  mm0         ld-addr-op
;* V18 tmp17        [V18    ] (  0,  0   )   float  ->  zero-ref   
;  V19 tmp18        [V19,T26] (  2,  2   )   float  ->  mm1        
;  V20 tmp19        [V20,T27] (  2,  2   )   float  ->  mm1        
;* V21 tmp20        [V21    ] (  0,  0   )  simd16  ->  zero-ref   
;  V22 tmp21        [V22,T11] (  2,  4   )  simd16  ->  mm1        
;  V23 tmp22        [V23,T12] (  2,  4   )  struct (16) [rsp+0x30]   do-not-enreg[SVB] ld-addr-op
;  V24 tmp23        [V24,T20] (  2,  2   )  simd16  ->  [rsp+0x20]   do-not-enreg[SB] ld-addr-op
;  V25 tmp24        [V25,T21] (  2,  2   )  simd16  ->  mm0        
;  V26 tmp25        [V26,T03] (  4,  8   )  simd16  ->  mm0         ld-addr-op
;* V27 tmp26        [V27    ] (  0,  0   )   float  ->  zero-ref   
;  V28 tmp27        [V28,T28] (  2,  2   )   float  ->  mm1        
;  V29 tmp28        [V29,T29] (  2,  2   )   float  ->  mm1        
;* V30 tmp29        [V30    ] (  0,  0   )  simd16  ->  zero-ref   
;  V31 tmp30        [V31,T13] (  2,  4   )  simd16  ->  mm1        
;  V32 tmp31        [V32,T14] (  2,  4   )  struct (16) [rsp+0x10]   do-not-enreg[SVB] ld-addr-op
;  V33 tmp32        [V33,T22] (  2,  2   )  simd16  ->  [rsp+0x00]   do-not-enreg[SB] ld-addr-op
;  V34 tmp33        [V34,T23] (  2,  2   )  simd16  ->  mm0        
;  V35 tmp34        [V35,T04] (  4,  8   )  simd16  ->  mm0         ld-addr-op
;* V36 tmp35        [V36    ] (  0,  0   )   float  ->  zero-ref   
;  V37 tmp36        [V37,T30] (  2,  2   )   float  ->  mm1        
;  V38 tmp37        [V38,T31] (  2,  2   )   float  ->  mm1        
;* V39 tmp38        [V39    ] (  0,  0   )  simd16  ->  zero-ref   
;  V40 tmp39        [V40,T15] (  2,  4   )  simd16  ->  mm1        
;  V41 tmp40        [V41,T32] (  2,  2   )   float  ->  mm0         V01.X(offs=0x00) P-INDEP
;  V42 tmp41        [V42,T33] (  2,  2   )   float  ->  mm1         V01.Y(offs=0x04) P-INDEP
;  V43 tmp42        [V43,T34] (  2,  2   )   float  ->  mm2         V01.Z(offs=0x08) P-INDEP
;  V44 tmp43        [V44,T35] (  2,  2   )   float  ->  mm3         V01.W(offs=0x0c) P-INDEP
;  V45 tmp44        [V45,T00] (  5, 10   )   byref  ->  rax         stack-byref
;# V46 OutArgs      [V46    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]  
;
; Lcl frame size = 184

G_M39231_IG01:
       4881ECB8000000       sub      rsp, 184
       C5F877               vzeroupper 

G_M39231_IG02:
       C4E17A10055D010000   vmovss   xmm0, dword ptr [reloc @RWD00]
       C4E17A100D58010000   vmovss   xmm1, dword ptr [reloc @RWD04]
       C4E17A101553010000   vmovss   xmm2, dword ptr [reloc @RWD08]
       C4E17A101D4E010000   vmovss   xmm3, dword ptr [reloc @RWD12]
       488D442478           lea      rax, bword ptr [rsp+78H]
       C4E17A1100           vmovss   dword ptr [rax], xmm0
       C4E17A114804         vmovss   dword ptr [rax+4], xmm1
       C4E17A115008         vmovss   dword ptr [rax+8], xmm2
       C4E17A11580C         vmovss   dword ptr [rax+12], xmm3
       C4E17910442478       vmovupd  xmm0, xmmword ptr [rsp+78H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17929442460       vmovapd  xmmword ptr [rsp+60H], xmm0
       C4E17A6F442460       vmovdqu  xmm0, qword ptr [rsp+60H]
       C4E17A7F8424A8000000 vmovdqu  qword ptr [rsp+A8H], xmm0
       C4E17A6F8424A8000000 vmovdqu  xmm0, qword ptr [rsp+A8H]
       C4E17A7F442450       vmovdqu  qword ptr [rsp+50H], xmm0
       C4E17910442450       vmovupd  xmm0, xmmword ptr [rsp+50H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17929442440       vmovapd  xmmword ptr [rsp+40H], xmm0
       C4E17A6F442440       vmovdqu  xmm0, qword ptr [rsp+40H]
       C4E17A7F842498000000 vmovdqu  qword ptr [rsp+98H], xmm0
       C4E17A6F842498000000 vmovdqu  xmm0, qword ptr [rsp+98H]
       C4E17A7F442430       vmovdqu  qword ptr [rsp+30H], xmm0
       C4E17910442430       vmovupd  xmm0, xmmword ptr [rsp+30H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E17929442420       vmovapd  xmmword ptr [rsp+20H], xmm0
       C4E17A6F442420       vmovdqu  xmm0, qword ptr [rsp+20H]
       C4E17A7F842488000000 vmovdqu  qword ptr [rsp+88H], xmm0
       C4E17A6F842488000000 vmovdqu  xmm0, qword ptr [rsp+88H]
       C4E17A7F442410       vmovdqu  qword ptr [rsp+10H], xmm0
       C4E17910442410       vmovupd  xmm0, xmmword ptr [rsp+10H]
       C4E17828C8           vmovaps  xmm1, xmm0
       C4E37140C8F1         vdpps    xmm1, xmm0, 241
       C4E17251C9           vsqrtss  xmm1, xmm1
       C4E27918C9           vbroadcastss xmm1, xmm1
       C4E1785EC1           vdivps   xmm0, xmm1
       C4E179290424         vmovapd  xmmword ptr [rsp], xmm0
       C4E17A6F0424         vmovdqu  xmm0, qword ptr [rsp]
       C4E17A7F01           vmovdqu  qword ptr [rcx], xmm0
       488BC1               mov      rax, rcx

G_M39231_IG03:
       4881C4B8000000       add      rsp, 184
       C3                   ret      

; Total bytes of code 357, prolog size 10 for method Program:Normalize_New():struct

; Assembly listing for method Program:Normalize_Current():struct
;...
;  V00 RetBuf       [V00,T01] (  4,  4   )   byref  ->  rsi        
;* V01 loc0         [V01    ] (  0,  0   )  struct (16) zero-ref   
;  V02 tmp1         [V02    ] (  2,  4   )  struct (16) [rsp+0x50]   do-not-enreg[XSB] addr-exposed
;  V03 tmp2         [V03    ] (  2,  4   )  struct (16) [rsp+0x40]   do-not-enreg[XSB] addr-exposed
;  V04 tmp3         [V04    ] (  2,  4   )  struct (16) [rsp+0x30]   do-not-enreg[XSB] addr-exposed
;  V05 tmp4         [V05,T06] (  2,  2   )   float  ->  mm0         V01.X(offs=0x00) P-INDEP
;  V06 tmp5         [V06,T07] (  2,  2   )   float  ->  mm1         V01.Y(offs=0x04) P-INDEP
;  V07 tmp6         [V07,T08] (  2,  2   )   float  ->  mm2         V01.Z(offs=0x08) P-INDEP
;  V08 tmp7         [V08,T09] (  2,  2   )   float  ->  mm3         V01.W(offs=0x0c) P-INDEP
;  V09 tmp8         [V09    ] ( 12, 24   )  struct (16) [rsp+0x20]   do-not-enreg[XSB] addr-exposed
;  V10 tmp9         [V10,T00] (  5, 10   )   byref  ->  rdx         stack-byref
;  V11 tmp10        [V11,T03] (  2,  4   )    long  ->  rcx        
;  V12 tmp11        [V12,T04] (  2,  4   )    long  ->  rcx        
;  V13 tmp12        [V13,T05] (  2,  4   )    long  ->  rcx        
;  V14 tmp13        [V14,T02] (  2,  4   )   byref  ->  rcx        
;  V15 OutArgs      [V15    ] (  1,  1   )  lclBlk (32) [rsp+0x00]  
;
; Lcl frame size = 96

G_M20468_IG01:
       56                   push     rsi
       4883EC60             sub      rsp, 96
       C5F877               vzeroupper 
       488BF1               mov      rsi, rcx

G_M20468_IG02:
       C4E17A1005A4000000   vmovss   xmm0, dword ptr [reloc @RWD00]
       C4E17A100D9F000000   vmovss   xmm1, dword ptr [reloc @RWD04]
       C4E17A10159A000000   vmovss   xmm2, dword ptr [reloc @RWD08]
       C4E17A101D95000000   vmovss   xmm3, dword ptr [reloc @RWD12]
       488D4C2450           lea      rcx, bword ptr [rsp+50H]
       488D542420           lea      rdx, bword ptr [rsp+20H]
       C4E17A1102           vmovss   dword ptr [rdx], xmm0
       C4E17A114A04         vmovss   dword ptr [rdx+4], xmm1
       C4E17A115208         vmovss   dword ptr [rdx+8], xmm2
       C4E17A115A0C         vmovss   dword ptr [rdx+12], xmm3
       488D542420           lea      rdx, bword ptr [rsp+20H]
       E8BEFBFFFF           call     Quaternion:Normalize(struct):struct
       488D4C2440           lea      rcx, bword ptr [rsp+40H]
       C4E17A6F442450       vmovdqu  xmm0, qword ptr [rsp+50H]
       C4E17A7F442420       vmovdqu  qword ptr [rsp+20H], xmm0
       488D542420           lea      rdx, bword ptr [rsp+20H]
       E8A1FBFFFF           call     Quaternion:Normalize(struct):struct
       488D4C2430           lea      rcx, bword ptr [rsp+30H]
       C4E17A6F442440       vmovdqu  xmm0, qword ptr [rsp+40H]
       C4E17A7F442420       vmovdqu  qword ptr [rsp+20H], xmm0
       488D542420           lea      rdx, bword ptr [rsp+20H]
       E884FBFFFF           call     Quaternion:Normalize(struct):struct
       488BCE               mov      rcx, rsi
       C4E17A6F442430       vmovdqu  xmm0, qword ptr [rsp+30H]
       C4E17A7F442420       vmovdqu  qword ptr [rsp+20H], xmm0
       488D542420           lea      rdx, bword ptr [rsp+20H]
       E869FBFFFF           call     Quaternion:Normalize(struct):struct
       488BC6               mov      rax, rsi

G_M20468_IG03:
       4883C460             add      rsp, 96
       5E                   pop      rsi
       C3                   ret      

; Total bytes of code 184, prolog size 8 for method Program:Normalize_Current():struct

benaadams · 2018-03-25T05:35:11Z

Going to give up for now 😞

benaadams · 2018-03-25T05:38:07Z

Difference between Vector4 and the Quaternion cast to Vector4 is mainly these blocks I think:

       C4E17929442420       vmovapd  xmmword ptr [rsp+20H], xmm0
       C4E17A6F442420       vmovdqu  xmm0, qword ptr [rsp+20H]
       C4E17A7F842488000000 vmovdqu  qword ptr [rsp+88H], xmm0
       C4E17A6F842488000000 vmovdqu  xmm0, qword ptr [rsp+88H]
       C4E17A7F442410       vmovdqu  qword ptr [rsp+10H], xmm0
       C4E17910442410       vmovupd  xmm0, xmmword ptr [rsp+10H]
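
For context, the source pattern behind these blocks is roughly the following. This is a minimal sketch based on the Normalize variant proposed earlier in this thread; the class name `QuaternionSketch` and the method name `NormalizeViaVector4` are hypothetical, and the exact code in the PR may differ.

```csharp
using System.Numerics;
using System.Runtime.CompilerServices;

// Hypothetical container class for illustration; not a type from the PR.
public static class QuaternionSketch
{
    public static Quaternion NormalizeViaVector4(Quaternion value)
    {
        // Reinterpret the 16-byte Quaternion as a Vector4 (same X, Y, Z, W order).
        Vector4 q = Unsafe.As<Quaternion, Vector4>(ref value);

        // Vector4.Normalize is the part the JIT can vectorize.
        Vector4 result = Vector4.Normalize(q);

        // Reinterpret the result back as a Quaternion.
        return Unsafe.As<Vector4, Quaternion>(ref result);
    }
}
```

Each `Unsafe.As` call takes a `ref` to a local or parameter, so the JIT may treat that variable as address-exposed and keep it on the stack. That would be consistent with the store and reload pairs shown above, though the listing alone does not prove it.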

benaadams · 2018-03-25T05:57:42Z

Raised issue https://github.com/dotnet/coreclr/issues/17207

eerhardt · 2018-03-28T15:55:39Z

@benaadams - did you intend to reopen this PR? I see you gave up for now and closed it. But then re-opened it the same day.

I just want to verify whether this PR should be open or closed.

benaadams · 2018-03-28T20:33:07Z

Closed it; then opened issue in coreclr; and reopened in hope :)

I'd like Quaternion to be vectorized, but I also don't want to make it worse along the way...

eerhardt · 2018-03-28T20:34:54Z

Note that the coreclr issue was moved to Future, which means it is unlikely to be fixed in .NET Core 2.1.

benaadams · 2018-03-29T03:19:58Z

Added a separate PR for the test changes from this PR: #28582

Perhaps something to revisit with CPU intrinsics rather than Vector4
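
As a rough idea of what that could look like, here is a sketch that assumes the `System.Runtime.Intrinsics` API that shipped later in .NET Core 3.0. Nothing in this block comes from the PR: the class name `QuaternionIntrinsicsSketch` is hypothetical, and whether this generates better code than the `Vector4` cast would need to be measured.

```csharp
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for illustration; not part of the PR or of System.Numerics.
public static class QuaternionIntrinsicsSketch
{
    public static Quaternion Normalize(Quaternion value)
    {
        if (!Sse41.IsSupported)
        {
            // Fall back to the existing scalar implementation.
            return Quaternion.Normalize(value);
        }

        // Build the vector from the fields instead of reinterpreting memory,
        // so no local has its address taken.
        Vector128<float> v = Vector128.Create(value.X, value.Y, value.Z, value.W);

        // Control byte 0xFF: multiply all four lanes and broadcast the sum to every lane.
        Vector128<float> lengthSquared = Sse41.DotProduct(v, v, 0xFF);

        // Divide each component by the length.
        Vector128<float> n = Sse.Divide(v, Sse.Sqrt(lengthSquared));

        return new Quaternion(n.GetElement(0), n.GetElement(1), n.GetElement(2), n.GetElement(3));
    }
}
```

The sketch deliberately avoids `Unsafe.As` so that the address-exposed locals seen in the listings above do not come into play; the trade-off is that the JIT has to handle the per-field construction and extraction well.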

benaadams force-pushed the quaternions branch from 61c01ba to 5d63a4a Compare November 26, 2017 22:51

benaadams force-pushed the quaternions branch 2 times, most recently from 292f1fe to f86df6c Compare November 27, 2017 04:18

eerhardt reviewed Nov 27, 2017

karelz added the area-System.Numerics label Nov 27, 2017

karelz assigned benaadams, ViktorHofer and eerhardt Nov 27, 2017

eerhardt reviewed Nov 27, 2017

benaadams force-pushed the quaternions branch from 9bccfcf to a3b70b9 Compare January 3, 2018 19:39

benaadams force-pushed the quaternions branch 2 times, most recently from d414e86 to 6c3a6d7 Compare January 23, 2018 12:08

jkotas reviewed Jan 23, 2018

Optimize Quaternion as Vector4 for netcoreapp

a48fb54

benaadams force-pushed the quaternions branch from cbad486 to a48fb54 Compare March 25, 2018 02:57

benaadams closed this Mar 25, 2018

benaadams reopened this Mar 25, 2018

benaadams mentioned this pull request Mar 29, 2018

Improve Quaternion test failure messages #28582

Merged

benaadams closed this Mar 29, 2018

karelz added this to the 2.1.0 milestone Mar 30, 2018
