This repository has been archived by the owner on Nov 19, 2020. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 2k
GH-752: Speed up matrix-vector operations #781
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
TODO - Investigate performance impact of loop unrolling on floats and decimal types - Handle Stub correctly - thorough testing - Clear error handling on dimension mismatch
…Renamed variables to be consistent with Accord convention. TODO - Investigate performance impact of loop unrolling on floats and decimal types - Thorough testing - Clear error handling on dimension mismatch
MAJOR ISSUE TO ADDRESS - Moving to a cache friendly implementation where we move along rows (instead of down columns) introduces complications in the case that the resultant vector is of a less precise type than the input types. Cast before sum results in greater loss of precision than sum followed by cast. Will need to be think about this... TODO - Investigate performance impact of loop unrolling on floats and decimal types - Thorough testing - Clear error handling on dimension mismatch
TODO - Investigate performance impact of loop unrolling on floats and decimal types - Thorough testing - Clear error handling on dimension mismatch
…tests in a similar fashion to the tests in commit: 4f78e79 Given we are loop unrolling, it is important to check odd and even dimensions separately.
TODO - Add unit tests
…owing a NotImplementedException. Unit tests added as well testing correctness in simple case plus a batch tests versus the current implementation.
…twice as fast now.
… .NET guidelines.
…alculation. It is common to cache a column in a row-major language like C# for GEMM operations but it is not needed for vector-matrix operations as the cached column is never reused. It is simply assigned and then used. In terms of performance, this is more than twice as fast for N<=128 but the speed improvement gets smaller for large N as the relative cost of allocation decreases.
…T can optimise away the bounds check. https://stackoverflow.com/a/5575316/2375829
… Speedup is quite substantial. Almost twice as fast for small matrices and more than 5 times as fast for large matrices. Important Note: This commit does *** NOT *** result in any loss of precision versus the current implementation. However, the order in which values are summed has changed to respect C# row-major structure. Therefore, the result will sometimes differ very slightly at the double-precision level because for floating point numbers: a + b + c does not always equal a + c + b.
# Conflicts: # Sources/Accord.Math/Matrix/Matrix.Product.tt # Unit Tests/Accord.Tests.Math/Matrix/MatrixTest.cs
…ted for floats sadly.
Hi @AlexJCross, Incredible PR, thanks a lot for all the improvements - I am sure they will be useful for many people! Regards, |
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hi @cesarsouza
A note on licensing: These changes are all submitted under the most general licensing agreement that you accept (I think its the MIT license) so feel free to use this code without limitation.
A note on precision: No method has been implemented such that the precision has been degraded by early casting. It is possible to speed up a number of methods if we allow that to happen, but I will submit that code in a separate commit. However, it is the case that some of the numbers will change ever so slightly as a result of floating point arithmetic; the order in which numbers are being summed has changed in some instances and, in general, (a + b) + c does not equal (a + c) + b when floating point numbers are involved.
A minor break: We mentioned on GH-752 that I changed the behaviour of
DotWithTranspsosed(...)
. I had to make a minor change to the KernelPCA in order to get it working.A note on the build: At the time of writing, this PR is ahead of development and can be automatically merged. The build is currently failing because this file cannot be found: DebuggerVisualizers/resources/disk-black.png". However, the build immediately before I integrated the DebuggerVisualizers passed all tests.
Here is a breakdown of the changes I have made.
Cross-Product
Signature:
double[] Cross(this double[] a, double[] b, double[] result)
Unit test:
Accord.Tests.Math.MatrixTest.CrossProductTest()
Numerical impact: None
Comments: Addressing issue GH-755. Unit test confirms correctness of cross-product and ensures that
result
vector is assigned correctly.Matrix-Matrix
Signature:
double[][] Dot(this double[,] a, double[][] b, double[][] result)
Unit test:
Accord.Tests.Math.MatrixTest.MultiplyTwoMatrices3()
Performance impact:
Signature:
double[,] TransposeAndDot(this double[,] a, double[,] b, double[,] result)
Unit test 1:
Accord.Tests.Math.MatrixTest.TransposeAndMultiplyTest()
Unit test 2:
Accord.Tests.Math.MatrixTest.TransposeAndDotBatchTest()
Performance impact:
Signature:
double[][] DotWithTransposed(this double[][] a, double[][] b, double[][] result)
Unit test 1:
Accord.Tests.Math.MatrixTest.DotWithTransposeTest_Jagged()
Unit test 2:
Accord.Tests.Math.MatrixTest.DotWithTransposeBatchTest_Jagged()
Performance impact:
Signature:
double[][] DotWithTransposed(this double[][] a, double[,] b, double[][] result)
Unit test 1:
Accord.Tests.Math.MatrixTest.DotWithTransposeTest_Jagged1()
Unit test 2:
Accord.Tests.Math.MatrixTest.DotWithTransposeBatchTest_Jagged1()
Performance impact:
Signature:
double[][] DotWithTransposed(this double[,] a, double[][] b, double[][] result)
Unit test 1:
Accord.Tests.Math.MatrixTest.DotWithTransposeTest_Jagged2()
Unit test 2:
Accord.Tests.Math.MatrixTest.DotWithTransposeBatchTest_Jagged2()
Performance impact:
Matrix-Vector
Signature:
double[] Dot(this double[,] matrix, double[] columnVector, double[] result)
Unit test 1:
Accord.Tests.Math.MatrixTest.MultiplyMatrixVectorTest()
Unit test 2:
Accord.Tests.Math.MatrixTest.MultiplyMatrixVectorBatchTest()
Performance impact:
Vector-Matrix-Vector
Signature:
double DotAndDot(this double[] rowVector, double[,] matrix, double[] columnVector)
Unit test:
Accord.Tests.Math.MatrixTest.DotAndDotTest()
Performance impact:
Outer
Signature:
double[,] Outer(this double[] a, double[] b, double[,] result)
Unit test 1:
Accord.Tests.Math.MatrixTest.OuterProductTest()
Unit test 2:
Accord.Tests.Math.MatrixTest.OuterProductTestDifferentOverloads()
Performance impact:
Kronecker (Vectors)
Signature:
double[] Kronecker(this double[] a, double[] b, double[] result)
Unit test 1:
Accord.Tests.Math.MatrixTest.KroneckerVectorTest()
Unit test 2:
Accord.Tests.Math.MatrixTest.KroneckerVectorBatchTest()
Performance impact:
Kronecker (Matrices)
Signature1:
double[][] Kronecker(this double[,] a, double[][] b, double[][] result)
Signature2:
double[][] Kronecker(this double[][] a, double[,] b, double[][] result)
Signature3:
double[][] Kronecker(this double[][] a, double[][] b, double[][] result)
Unit test 1:
Accord.Tests.Math.MatrixTest.KroneckerTest()
Unit test 2:
Accord.Tests.Math.MatrixTest.KroneckerBatchTest()
Comments: These methods threw
NotImplementedException
and have now been implemented and tested.Performance impact: N/A
This closes GH-752