Implemented an in-place method for transforming DataPoints objects #419

YoshuaNava · 2020-11-19T18:39:51Z

I noticed that some computations relying on libpointmatcher were losing time transforming point clouds between sensor and map frames.

I looked into the libpointmatcher Transformations' code, and, following the same pattern as the DataPointsFilter, I implemented an inPlace processing method for computing transformations.

This should reduce the number of copies by two: one for copying the input in the compute() function, and the other one for copying the return value (note it could also be moved).
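To illustrate where the two copies come from, here is a minimal sketch in plain C++ (no Eigen). The `Cloud`, `Point`, and `Transform` types and both function bodies are hypothetical stand-ins written for this example; they are not libpointmatcher's actual classes, which store points in Eigen matrices.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only.
using Point = std::array<double, 4>;                     // homogeneous point (x, y, z, 1)
using Transform = std::array<std::array<double, 4>, 4>;  // row-major 4x4 matrix

struct Cloud {
    std::vector<Point> features;
};

// In-place variant: overwrites each point with T * p.
// Only a single-point temporary is needed, never a copy of the cloud.
void inPlaceCompute(const Transform& T, Cloud& cloud) {
    for (Point& p : cloud.features) {
        Point out{};  // zero-initialized temporary for one point
        for (std::size_t r = 0; r < 4; ++r)
            for (std::size_t c = 0; c < 4; ++c)
                out[r] += T[r][c] * p[c];
        p = out;
    }
}

// Copy-based variant: duplicates the whole input before transforming it.
// The returned object may be moved or elided, but the input copy remains.
Cloud compute(const Cloud& input, const Transform& T) {
    Cloud transformed = input;  // full copy of the point data
    inPlaceCompute(T, transformed);
    return transformed;
}
```

A caller that no longer needs the untransformed cloud can call `inPlaceCompute(T, cloud)` directly and skip the duplication entirely.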

YoshuaNava · 2020-11-19T19:06:57Z

I compared multiple implementations:

In place multiply:

cloud.features = parameters * cloud.features;

Apply on the left:

cloud.features.applyOnTheLeft(parameters);

Transpose multiply:

cloud.features.transpose() *= parameters.transpose();

In terms of timing, all of them run in under a millisecond for all the point clouds I tried (depth camera, LIDAR, SLAM).

Transpose seems to be the slowest method, while in place multiply and apply on the left tend to be the fastest, although there doesn't seem to be a single winner between those two. (NOTE: after testing again I found that my testing setup made inefficient use of Eigen operations; transpose was actually the fastest when properly implemented.)

For the final implementation I think I'll go with applyOnTheLeft, because the signature of the function is very idiomatic, and the performance improves.

When it comes to the old vs. new methods, the old method is 1.5-10x slower than the new ones, just because of the copy.

pomerlef · 2020-11-19T21:03:17Z

Since you are digging into this, you can optimize the inverse of a transformation matrix using:

In Python, skipping the inverse led to some improvements:

With inverse
10.8 µs ± 42.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Without inverse
9.42 µs ± 40 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Improvement of 12.27%
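The formula referred to above did not survive in this thread. As an assumption, the sketch below uses the standard closed-form inverse of a rigid transformation: for T = [R t; 0 1], the inverse is [R^T, -R^T t; 0 1], which avoids a general 4x4 matrix inversion. The `Transform` alias and the `invertRigid` name are hypothetical and are not part of libpointmatcher.

```cpp
#include <array>
#include <cstddef>

using Transform = std::array<std::array<double, 4>, 4>;  // row-major 4x4 matrix

// Assumes T is rigid: the upper-left 3x3 block is a rotation matrix
// and the last row is (0, 0, 0, 1). The result is wrong for other matrices.
Transform invertRigid(const Transform& T) {
    Transform inv{};  // zero-initialized
    for (std::size_t r = 0; r < 3; ++r) {
        for (std::size_t c = 0; c < 3; ++c) {
            inv[r][c] = T[c][r];             // rotation block becomes R^T
            inv[r][3] -= T[c][r] * T[c][3];  // accumulates -(R^T t) for row r
        }
    }
    inv[3][3] = 1.0;
    return inv;
}
```

The shortcut is only valid for rigid transformations; a similarity transform with a non-unit scale would need the scale handled separately.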

YoshuaNava · 2020-11-20T06:00:39Z

@pomerlef Thank you for the input! Do you have any specific place in mind for this change?

After running a quick search I found the following occurrences:

There are not many, and only the ones in ICP.cpp seem related to transformations?

pomerlef · 2020-11-20T13:59:40Z

I didn't check in the code what would be the impact (apparently minor). I just came across this formulation for rigid transformation inverse recently. It was mostly to log it somewhere.

YoshuaNava · 2020-12-01T18:04:28Z

@pomerlef I extended the unit tests to directly validate the result of transforming DataPoints objects. I also noticed that the similarity transform was not registered and did not have its own constructor, so I added both.

I quickly ran a benchmark of the unit tests on Intel VTune, with the HW sampler, obtaining the following results:

Before:

After:

There is a ~10% improvement in running time and a small bump in microarchitecture usage (possibly because we make fewer copies, so less time is spent moving data in memory).

pomerlef · 2020-12-01T19:24:04Z

Looking good! An improvement of 10 % is quite good. Thanks for this PR @YoshuaNava.

YoshuaNava · 2020-12-01T19:42:51Z

@pomerlef Thank you.

Where I saw the most benefit from the change is when running localization and mapping with parameters for very dense map construction. There, the share of time spent transforming point clouds went down from 7% to around 4%.

For live operation, with point clouds and maps downsampled to a density of 5 to ~10 cm, the overall improvement was from around 5% to 4%.

Of all the time spent in the transformation functions, 30% goes to transforming the features: https://github.com/YoshuaNava/libpointmatcher/blob/feature/transformations_inplace_compute/pointmatcher/TransformationsImpl.cpp#L191
and the rest goes to transforming the descriptors: https://github.com/YoshuaNava/libpointmatcher/blob/feature/transformations_inplace_compute/pointmatcher/TransformationsImpl.cpp#L202

I tried to come up with further optimizations, but couldn't find something faster than Eigen block operations.

For future reference, this is the call stack of the inplacecompute function of rigidtransformation:

YoshuaNava · 2020-12-02T00:18:03Z

Update: I did find a way to optimize the descriptor assignment further. See snippet here: https://godbolt.org/z/9W547n

Another benchmark for the different methods evaluated before introducing this PR: https://godbolt.org/z/r88h6G

I'll submit another PR tomorrow.

YoshuaNava added 2 commits November 19, 2020 19:35

Implemented an in-place method for transforming DataPoints objects

87c7dc4

Fix typo

2f6065a

Optimized descriptor rotations

7c35308

pomerlef and others added 2 commits December 1, 2020 11:37

Merge branch 'master' into feature/transformations_inplace_compute

a85d15e

Implemented unit tests, added sim transformation to registry, set new…

3e65d06

… in place functions to work

pomerlef reviewed Dec 1, 2020

pomerlef merged commit fba9de4 into norlab-ulaval:master Dec 1, 2020

YoshuaNava mentioned this pull request Dec 2, 2020

Optimized transformations by transposing and then multiplying in place #424

Open
