-
Notifications
You must be signed in to change notification settings - Fork 551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented an in-place method for transforming DataPoints objects #419
Implemented an in-place method for transforming DataPoints objects #419
Conversation
I compared multiple implementations:
In terms of timing, all of them are in sub-millisecond, for all the point clouds I tried (depth camera, LIDAR, SLAM). Transpose seems to be the slowest method, while in place multiply and apply on the left tend to be the fastests, although there doesn't seem to be a single winner there. (NOTE: after testing again I have found that my testing setup made inefficient use of Eigen operations, transpose was actually the fastest when properly implemented) For the final implementation I think I'll go with applyOnTheLeft, because the signature of the function is very idiomatic, and the performance improves. When it comes to the old vs. new methods, the old method is 1.5-10x slower than the new ones, just because of the copy. |
@pomerlef Thank you for the input! Do you have any specific place in mind for this change? After running a quick search I found the following occurrences: They're not so many, but only the ones on ICP.cpp seem related to transformations? |
I didn't check in the code what would be the impact (apparently minor). I just came across this formulation for rigid transformation inverse recently. It was mostly to log it somewhere. |
@pomerlef I extended the unit tests, directly validating the result of transforming DataPoints objects. I also noticed that the similarity transform was not registered, neither had its own constructor, so I added it. I quickly ran a benchmark of the unit tests on Intel VTune, with the HW sampler, obtaining the following results: Before: After: There is a ~10% improvement in running time and a small bump in microarchitecture usage (could be because we do less copies, so less time moving data in memory) |
Looking good! An improvement of 10 % is quite good. Thanks for this PR @YoshuaNava. |
@pomerlef Thank you. Where I saw the benefit of the change the most is when running localization and mapping with parameters for very dense map construction. There the time spent transforming point clouds went down from 7 to around 4%. For live operation, with point clouds and maps downsampled to have 5 to ~10cm density, the overall improvement was from around 5 to 4%. From all the time spent by the transformation functions, 30% is transforming the features https://github.com/YoshuaNava/libpointmatcher/blob/feature/transformations_inplace_compute/pointmatcher/TransformationsImpl.cpp#L191 I tried to come up with further optimizations, but couldn't find something faster than Eigen block operations. For future reference, this is the call stack of the inplacecompute function of rigidtransformation: |
Update: I did find a way to optimize the descriptor assignment further. See snippet here: https://godbolt.org/z/9W547n Another benchmark for the different methods evaluated before introducing this PR: https://godbolt.org/z/r88h6G I'll submit another PR tomorrow. |
I noticed that some computations relying on libpointmatcher were losing time transforming point clouds between sensor and map frames.
I looked into the libpointmatcher Transformations' code, and, following the same pattern as the DataPointsFilter, I implemented an inPlace processing method for computing transformations.
This should reduce the number of copies by two: one for copying the input in the compute() function, and the other one for copying the return value (note it could also be moved).