Speed up matrix-vector operations #752
Hi @cesarsouza,

I have updated the templates for three of the methods. Now that I've been going through the existing implementations, I've noticed an inconsistency in the way results are cast between numeric types. For instance, this test passes...

[Test]
public void MultiplyMatrixVectorWithFloats()
{
const int N = 2;
float[,] a = Matrix.Diagonal(N, N, 2.5f);
float[] b = { 2, 2 };
int[] result = new int[N];
a.Dot(b, result);
int[] expected = { 5, 5 };
Assert.IsTrue(result.IsEqual(expected)); // *** Passes (5, 5) ***
}

...but this test fails (which is a very surprising inconsistency).

[Test]
public void MultiplyMatrixVectorWithDecimals()
{
const int N = 2;
decimal[,] a = Matrix.Diagonal(N, N, 2.5m);
decimal[] b = { 2, 2 };
int[] result = new int[N];
a.Dot(b, result);
int[] expected = { 5, 5 };
Assert.IsTrue(result.IsEqual(expected)); // *** Fails (4, 4) ***
}

This is happening because in one method the values are only cast to int after the full product has been computed, while in the other they are cast to int before the multiplication, so 2.5 is truncated to 2 and the result becomes 4 instead of 5. For now, I have tried to leave all the optimisations 100% consistent with the framework, but it is simply not possible to do this for every data type. If I could offer my two cents, I think Accord could possibly reduce the number of type combinations it generates. I'd be really interested to get your thoughts on this. A short standalone sketch of the two casting orders is included below.

Regards,
Alex
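The sketch referred to above - purely illustrative and not Accord's generated code - shows how the point at which the cast to int happens changes the answer:

// Illustrative only (not Accord's generated code): the two tests above
// disagree simply because of when the cast to int is performed.
decimal[,] a = { { 2.5m, 0m }, { 0m, 2.5m } };
decimal[] b = { 2m, 2m };
int n = 2;

int[] castLate = new int[n];   // cast once, after the sum is complete
int[] castEarly = new int[n];  // cast each operand before multiplying

for (int i = 0; i < n; i++)
{
    decimal sum = 0m;
    int sumEarly = 0;
    for (int j = 0; j < n; j++)
    {
        sum += a[i, j] * b[j];                 // exact decimal arithmetic
        sumEarly += (int)a[i, j] * (int)b[j];  // (int)2.5m == 2, so 2 * 2
    }
    castLate[i] = (int)sum;   // 5
    castEarly[i] = sumEarly;  // 4
}
// castLate  -> { 5, 5 }  (matches the float overload)
// castEarly -> { 4, 4 }  (what the decimal overload currently produces)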
Hi Alex, Thanks a lot for such amazing work! Please let me comment on a few of the issues you have raised:
Actually, people choose float over double to trade precision for memory, not for speed. The speed of doubles and floats is often the same, if not faster for doubles than for floats (see this for a quick explanation). Anyway, Intel CPUs (I guess this is a general rule for x86, but I can't remember right now) always perform floating-point operations using 80 bits for both doubles and floats, so AFAIK no conversion will take place until the values actually have to be written back to main memory, which should happen only at the end of the computation anyway.
Regarding those other issues, I have to say that the current priority in the framework is to have operations that are as accurate and fast as possible for doubles and floats. The implementations for the other types (such as decimals) can, for the moment, be regarded more as convenience methods (which may sometimes not be as optimized, or as precise). If you have managed to optimize the execution paths for doubles and floats but found issues trying to make the methods as correct as possible for decimals, I would say: please submit a pull request anyway, and at least document that the current behavior might not be as accurate for some of the other data types.

I also think that the issue of converting beforehand to decimal, int, uint, etc., can be solved with a bit more careful programming in the T4 templates, as you have mentioned. However, this is something that could be resolved later (and registered as a separate issue here in the issue tracker).

Again, thanks a lot for the excellent effort you have put into developing these optimizations!

Regards,
Cesar
Hi @cesarsouza, Thanks very much for getting back to me on this.
Wow, I never knew that 👍. OK, I will continue as planned. When I'm ready to submit, I'll make it very clear if any of the methods are impacted by casting.

Regards,
Alex
I am very close to completing this and will submit a PR soon. For now, I need to finish documenting and testing the changes. This comment is WIP - I will update it once finished, so please ignore it for now.

Here is a breakdown of the changes I have made.

A note on precision: no method has been implemented in a way that degrades precision through early casting. It is possible to speed up a number of methods if we allow that to happen, but I will submit that code in a separate commit. However, some of the numbers will change ever so slightly as a result of floating-point arithmetic: the order in which numbers are summed has changed in some instances and, in general, (a + b) + c does not necessarily equal (a + c) + b when floating-point numbers are involved (see the sketch after the list below).

Cross-Product
Signature:

Matrix-Matrix
Signature:
Signature:
Signature:
Signature:
Signature:
Matrix-Vector
Signature:

Vector-Matrix-Vector
Signature:

Outer
Signature:

Kronecker (Vectors)
Signature:

Kronecker (Matrices)
Signature 1:
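The floating-point ordering effect mentioned in the precision note above can be seen with a tiny standalone example (illustrative only, unrelated to Accord's code):

using System;

// Floating-point addition is not associative, so summing the same numbers
// in a different order can give a slightly different result.
class FloatOrderDemo
{
    static void Main()
    {
        float a = 1e8f, b = -1e8f, c = 1e-3f;

        float left  = (a + b) + c;  // (1e8 - 1e8) + 0.001  == 0.001
        float right = (a + c) + b;  // 1e8 + 0.001 rounds back to 1e8, so this is 0

        Console.WriteLine(left);           // 0.001
        Console.WriteLine(right);          // 0
        Console.WriteLine(left == right);  // False
    }
}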
Hi @cesarsouza, Can I please ask what the expected output of the following code should be?

var EightBySeven = Matrix.Random(8, 7);
var NineBySeven = Matrix.Random(9, 7);
var EightByOne = Matrix.Zeros(8, 1);
EightBySeven.DotWithTransposed(NineBySeven, EightByOne); // Dimension mismatch?

Currently it works. Is this a bug or a feature? If it's a bug, then there are a few more bugs in the code. If it's a feature, I will go through my changes and make sure I haven't changed this behaviour. I believe this is the only thing left for me to do :)

Thanks very much,
Alex
Hi @AlexJCross, This might have been an undocumented feature, at least some time ago :-) But in reality, the method should have been throwing an exception in this case, so please feel free to change it so that it really throws a DimensionMismatchException. If this change causes some unit tests in the .Statistics namespace to start failing, please submit your PR nonetheless and I will fix them later when I have more time (unless you would like to give fixing them a try as well!).

Again, many thanks for the excellent work - this PR is going to look amazing!

Regards,
Cesar
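For reference, a minimal sketch of the kind of guard being discussed here - not Accord's actual implementation, and assuming DimensionMismatchException exposes an ArgumentException-style constructor that takes the offending parameter name:

using Accord; // assumed namespace of DimensionMismatchException

static class DimensionGuard
{
    // For a.DotWithTransposed(b, result), i.e. a * transpose(b), the result
    // buffer must be (rows of a) x (rows of b), so the 8 x 1 buffer in the
    // example above would be rejected.
    public static void CheckDotWithTransposed(double[,] a, double[,] b, double[,] result)
    {
        // a is (m x k) and b is (n x k), so a * transpose(b) is (m x n)
        if (a.GetLength(1) != b.GetLength(1))
            throw new DimensionMismatchException("b");      // inner dimensions differ

        if (result.GetLength(0) != a.GetLength(0) ||
            result.GetLength(1) != b.GetLength(0))
            throw new DimensionMismatchException("result"); // result has the wrong shape
    }
}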
GH-752: Speed up matrix-vector operations
A number of linear algebra operations in the library would benefit from having their arrays pinned and working directly with pointers. I am creating this issue so it can be referenced in git commits. I will pick this up and send a PR once complete.
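To make the idea concrete, here is a minimal sketch of a matrix-vector product over pinned arrays - purely illustrative, with dimension checks omitted, and not the code that will appear in the PR:

// Illustrative sketch only (not the PR code): a matrix-vector product
// over pinned arrays using pointer arithmetic. Requires the project to
// be compiled with /unsafe.
public static unsafe void DotPinned(double[,] matrix, double[] vector, double[] result)
{
    int rows = matrix.GetLength(0);
    int cols = matrix.GetLength(1);

    fixed (double* pMatrix = matrix)
    fixed (double* pVector = vector)
    fixed (double* pResult = result)
    {
        double* row = pMatrix;
        for (int i = 0; i < rows; i++)
        {
            double sum = 0;
            for (int j = 0; j < cols; j++)
                sum += row[j] * pVector[j];  // no bounds checks in the hot loop
            pResult[i] = sum;
            row += cols;                     // advance to the start of the next row
        }
    }
}

Pinning removes the per-element bounds checks and the multidimensional-array indexing overhead, which is where most of the speedup in the comparisons below is expected to come from.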
Vector-Matrix
This shows a comparison of Accord's vector.Dot(matrix, result) versus a proposed change pinning the arrays and using pointer arithmetic. Size is the size of the matrix (so 32 means a 32 x 32 square matrix) and Trials is the number of simulations run. The timings are shown next, and then the speed multiplier; the bottom row shows it is 17 times faster to pin the arrays than to use Accord's current method.

Matrix-Vector
This shows a comparison of Accord's matrix.Dot(vector, result) versus a proposed change pinning the arrays and using pointer arithmetic.

Thanks,
Alex