GH-752: Speed up matrix-vector operations #781

CatchemAL · 2017-08-17T06:59:14Z

A note on licensing: These changes are all submitted under the most general licensing agreement that you accept (I think its the MIT license) so feel free to use this code without limitation.

A note on precision: No method has been implemented such that the precision has been degraded by early casting. It is possible to speed up a number of methods if we allow that to happen, but I will submit that code in a separate commit. However, it is the case that some of the numbers will change ever so slightly as a result of floating point arithmetic; the order in which numbers are being summed has changed in some instances and, in general, (a + b) + c does not equal (a + c) + b when floating point numbers are involved.

A minor break: We mentioned on GH-752 that I changed the behaviour of DotWithTranspsosed(...). I had to make a minor change to the KernelPCA in order to get it working.

A note on the build: At the time of writing, this PR is ahead of development and can be automatically merged. The build is currently failing because this file cannot be found: DebuggerVisualizers/resources/disk-black.png". However, the build immediately before I integrated the DebuggerVisualizers passed all tests.

Here is a breakdown of the changes I have made.

Cross-Product

Signature: double[] Cross(this double[] a, double[] b, double[] result)
Unit test: Accord.Tests.Math.MatrixTest.CrossProductTest()
Numerical impact: None
Comments: Addressing issue GH-755. Unit test confirms correctness of cross-product and ensures that result vector is assigned correctly.

Matrix-Matrix

Signature: double[][] Dot(this double[,] a, double[][] b, double[][] result)
Unit test: Accord.Tests.Math.MatrixTest.MultiplyTwoMatrices3()
Performance impact:

Size	Trials	Accord (ms)	Proposed (ms)	Multiplier
8	8388608	14062	6577	x2.1
16	1048576	12768	5324	x2.4
32	131072	11840	4677	x2.5
64	16384	11809	4988	x2.4
128	2048	11462	4762	x2.4
256	256	11455	4637	x2.5
512	32	11365	4632	x2.5
1024	4	12315	5024	x2.5

Signature: double[,] TransposeAndDot(this double[,] a, double[,] b, double[,] result)
Unit test 1: Accord.Tests.Math.MatrixTest.TransposeAndMultiplyTest()
Unit test 2: Accord.Tests.Math.MatrixTest.TransposeAndDotBatchTest()
Performance impact:

Size	Trials	Accord (ms)	Proposed (ms)	Multiplier
8	8388608	14733	7469	x2
16	1048576	13332	5736	x2.3
32	131072	12424	4742	x2.6
64	16384	13087	4861	x2.7
128	2048	13490	4544	x3
256	256	17989	4473	x4
512	32	18799	4698	x4
1024	4	51634	4714	x11

Signature: double[][] DotWithTransposed(this double[][] a, double[][] b, double[][] result)
Unit test 1: Accord.Tests.Math.MatrixTest.DotWithTransposeTest_Jagged()
Unit test 2: Accord.Tests.Math.MatrixTest.DotWithTransposeBatchTest_Jagged()
Performance impact:

Size	Trials	Accord (ms)	Proposed (ms)	Multiplier
8	8388608	9222	4388	x2.1
16	1048576	6203	3868	x1.6
32	131072	5795	3815	x1.5
64	16384	5175	4200	x1.2
128	2048	4850	4297	x1.1
256	256	4615	4109	x1.1
512	32	4803	4347	x1.1
1024	4	5050	4485	x1.1

Signature: double[][] DotWithTransposed(this double[][] a, double[,] b, double[][] result)
Unit test 1: Accord.Tests.Math.MatrixTest.DotWithTransposeTest_Jagged1()
Unit test 2: Accord.Tests.Math.MatrixTest.DotWithTransposeBatchTest_Jagged1()
Performance impact:

Size	Trials	Accord (ms)	Proposed (ms)	Multiplier
8	8388608	11248	4348	x2.6
16	1048576	7600	4134	x1.8
32	131072	6239	4006	x1.6
64	16384	5821	3954	x1.5
128	2048	5242	4077	x1.3
256	256	5066	4121	x1.2
512	32	4973	4117	x1.2
1024	4	5308	4310	x1.2

Signature: double[][] DotWithTransposed(this double[,] a, double[][] b, double[][] result)
Unit test 1: Accord.Tests.Math.MatrixTest.DotWithTransposeTest_Jagged2()
Unit test 2: Accord.Tests.Math.MatrixTest.DotWithTransposeBatchTest_Jagged2()
Performance impact:

Size	Trials	Accord (ms)	Proposed (ms)	Multiplier
8	8388608	15405	4317	x3.6
16	1048576	12709	3877	x3.3
32	131072	11675	4320	x2.7
64	16384	11839	4186	x2.8
128	2048	11389	4162	x2.7
256	256	11339	4293	x2.6
512	32	11458	4247	x2.7
1024	4	11700	4600	x2.5

Matrix-Vector

Signature: double[] Dot(this double[,] matrix, double[] columnVector, double[] result)
Unit test 1: Accord.Tests.Math.MatrixTest.MultiplyMatrixVectorTest()
Unit test 2: Accord.Tests.Math.MatrixTest.MultiplyMatrixVectorBatchTest()
Performance impact:

Size	Trials	Accord (ms)	Proposed (ms)	Multiplier
8	8388608	1437	821	x1.8
16	2097152	1393	630	x2.2
32	524288	1399	560	x2.5
64	131072	1436	564	x2.5
128	32768	1427	554	x2.6
256	8192	1418	543	x2.6
512	2048	1405	573	x2.5
1024	512	1396	575	x2.4

Vector-Matrix-Vector

Signature: double DotAndDot(this double[] rowVector, double[,] matrix, double[] columnVector)
Unit test: Accord.Tests.Math.MatrixTest.DotAndDotTest()
Performance impact:

Size	Trials	Accord (ms)	Proposed (ms)	Multiplier
8	8388608	1917	679	x2.8
16	2097152	1663	491	x3.4
32	524288	1570	461	x3.4
64	131072	1605	439	x3.7
128	32768	1774	436	x4.1
256	8192	2388	442	x5.4
512	2048	2806	467	x6
1024	512	8393	497	x16.9

Outer

Signature: double[,] Outer(this double[] a, double[] b, double[,] result)
Unit test 1: Accord.Tests.Math.MatrixTest.OuterProductTest()
Unit test 2: Accord.Tests.Math.MatrixTest.OuterProductTestDifferentOverloads()
Performance impact:

Size	Trials	Accord (ms)	Proposed (ms)	Multiplier
8	8388608	1227	621	x2
16	2097152	1414	606	x2.3
32	524288	1302	571	x2.3
64	131072	1237	510	x2.4
128	32768	1251	430	x2.9
256	8192	1188	386	x3.1
512	2048	1231	407	x3
1024	512	1179	388	x3

Kronecker (Vectors)

Signature: double[] Kronecker(this double[] a, double[] b, double[] result)
Unit test 1: Accord.Tests.Math.MatrixTest.KroneckerVectorTest()
Unit test 2: Accord.Tests.Math.MatrixTest.KroneckerVectorBatchTest()
Performance impact:

Size	Trials	Accord (ms)	Proposed (ms)	Multiplier
8	8388608	1179	649	x1.8
16	2097152	995	580	x1.7
32	524288	1082	593	x1.8
64	131072	995	600	x1.7
128	32768	1000	697	x1.4
256	8192	1047	737	x1.4
512	2048	1051	735	x1.4
1024	512	1006	753	x1.3

Kronecker (Matrices)

Signature1: double[][] Kronecker(this double[,] a, double[][] b, double[][] result)
Signature2: double[][] Kronecker(this double[][] a, double[,] b, double[][] result)
Signature3: double[][] Kronecker(this double[][] a, double[][] b, double[][] result)
Unit test 1: Accord.Tests.Math.MatrixTest.KroneckerTest()
Unit test 2: Accord.Tests.Math.MatrixTest.KroneckerBatchTest()
Comments: These methods threw NotImplementedException and have now been implemented and tested.
Performance impact: N/A

This closes GH-752

TODO - Investigate performance impact of loop unrolling on floats and decimal types - Handle Stub correctly - thorough testing - Clear error handling on dimension mismatch

…Renamed variables to be consistent with Accord convention. TODO - Investigate performance impact of loop unrolling on floats and decimal types - Thorough testing - Clear error handling on dimension mismatch

MAJOR ISSUE TO ADDRESS - Moving to a cache friendly implementation where we move along rows (instead of down columns) introduces complications in the case that the resultant vector is of a less precise type than the input types. Cast before sum results in greater loss of precision than sum followed by cast. Will need to be think about this... TODO - Investigate performance impact of loop unrolling on floats and decimal types - Thorough testing - Clear error handling on dimension mismatch

TODO - Investigate performance impact of loop unrolling on floats and decimal types - Thorough testing - Clear error handling on dimension mismatch

…tests in a similar fashion to the tests in commit: 4f78e79 Given we are loop unrolling, it is important to check odd and even dimensions separately.

TODO - Add unit tests

…owing a NotImplementedException. Unit tests added as well testing correctness in simple case plus a batch tests versus the current implementation.

…twice as fast now.

…ixes GH-755.

… .NET guidelines.

…alculation. It is common to cache a column in a row-major language like C# for GEMM operations but it is not needed for vector-matrix operations as the cached column is never reused. It is simply assigned and then used. In terms of performance, this is more than twice as fast for N<=128 but the speed improvement gets smaller for large N as the relative cost of allocation decreases.

…T can optimise away the bounds check. https://stackoverflow.com/a/5575316/2375829

… Speedup is quite substantial. Almost twice as fast for small matrices and more than 5 times as fast for large matrices. Important Note: This commit does *** NOT *** result in any loss of precision versus the current implementation. However, the order in which values are summed has changed to respect C# row-major structure. Therefore, the result will sometimes differ very slightly at the double-precision level because for floating point numbers: a + b + c does not always equal a + c + b.

# Conflicts: # Sources/Accord.Math/Matrix/Matrix.Product.tt # Unit Tests/Accord.Tests.Math/Matrix/MatrixTest.cs

…ted for floats sadly.

…ed() method

cesarsouza · 2017-08-17T17:43:14Z

Hi @AlexJCross,

Incredible PR, thanks a lot for all the improvements - I am sure they will be useful for many people!

Regards,
Cesar

CatchemAL added 26 commits August 4, 2017 21:49

GH-752: Beginning work on Matrix-Vector Dot method.

1401233

TODO - Investigate performance impact of loop unrolling on floats and decimal types - Handle Stub correctly - thorough testing - Clear error handling on dimension mismatch

GH-752: Processing remainder correctly for Matrix-Vector Dot method. …

88c1ba1

…Renamed variables to be consistent with Accord convention. TODO - Investigate performance impact of loop unrolling on floats and decimal types - Thorough testing - Clear error handling on dimension mismatch

GH-752: Fixing remainder for Matrix-Vector Dot method.

5ca1662

GH-752: Beginning work on Vector-Matrix-Vector Dot method.

f675324

TODO - Investigate performance impact of loop unrolling on floats and decimal types - Thorough testing - Clear error handling on dimension mismatch

GH-752: Adding unit tests to the matrix-vector and vector-matrix Dot …

6e6e55b

…tests in a similar fashion to the tests in commit: 4f78e79 Given we are loop unrolling, it is important to check odd and even dimensions separately.

GH-752: Optimising the Outer product

dcae426

TODO - Add unit tests

GH-752: Adding methods for Kronecker product which were currently thr…

a80d4f9

…owing a NotImplementedException. Unit tests added as well testing correctness in simple case plus a batch tests versus the current implementation.

GH-752: Optimising the newly added Kronecker calculations. More than …

83f190f

…twice as fast now.

GH-755: Adding a failing test to illustrate the problem.

7ec314f

Correctly assigning the result in the cross-product calculation. This f…

d6a8054

…ixes GH-755.

Very minor. Changing the order of access modifiers in accordance with…

ce15fc6

… .NET guidelines.

GH-755: Also adding missing this extension

5fe46aa

GH-752: Adding unit test for the new Outer product method

73c1663

GH-752: Simplifying outer product code. Vector pins unnecessary as JI…

78d298f

…T can optimise away the bounds check. https://stackoverflow.com/a/5575316/2375829

GH-752: Optimising Kronecker product for vectors + unit tests.

a247cb5

GH-752: DotWithTransposed optimisations plus unit tests

302bb8e

GH-752: Optimising code for DotWithTranspose(...) and adding unit tests.

c1cc32a

Merge branch 'development' into GH-752

bff5271

# Conflicts: # Sources/Accord.Math/Matrix/Matrix.Product.tt # Unit Tests/Accord.Tests.Math/Matrix/MatrixTest.cs

GH-752: Updating tests and tidying up code.

e5cd602

GH-752: Temporarily removing Vector-Matrix code as precision is impac…

be1e543

…ted for floats sadly.

GH-752: Work complete. Refreshing T4 template.

2018dd1

GH-752: Fixing Kernel PCA after changing behaviour of DotWithTranspos…

ecd96c4

…ed() method

Merge branch 'development' into GH-752

0e42e95

cesarsouza merged commit 94c5ef0 into accord-net:development Aug 17, 2017

CatchemAL deleted the GH-752 branch August 29, 2017 12:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GH-752: Speed up matrix-vector operations #781

GH-752: Speed up matrix-vector operations #781

CatchemAL commented Aug 17, 2017

cesarsouza commented Aug 17, 2017

GH-752: Speed up matrix-vector operations #781

GH-752: Speed up matrix-vector operations #781

Conversation

CatchemAL commented Aug 17, 2017

Cross-Product

Matrix-Matrix

Matrix-Vector

Vector-Matrix-Vector

Outer

Kronecker (Vectors)

Kronecker (Matrices)

cesarsouza commented Aug 17, 2017