Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing pcl::MovingLeastSquaresOMP declaration without /openmp #2324

Merged
merged 7 commits into from
Jun 15, 2018
Merged

Missing pcl::MovingLeastSquaresOMP declaration without /openmp #2324

merged 7 commits into from
Jun 15, 2018

Conversation

ThorstenHarter
Copy link
Contributor

The declaration of the pcl::MovingLeastSquaresOMP class is encapsulated with #ifdef _OPENMP.
This will result in compile errors, if we try to build our application without the /openmp flag for certain targets.
The expectation would be that the algorithm falls back to single threaded execution in this case, as the algorithms in the 'feature' module for example do.

I therefore propose to remove the #ifdef _OPENMP around the class declaration and add it around the #pragma omp in the implementation instead.
The instantiations in the cpp have been adapted to use #ifdef PCL_ONLY_CORE_POINT_TYPES, as seen in other classes.

@taketwo
Copy link
Member

taketwo commented May 30, 2018

Is there any good reason to keep these two versions separated? With your changes MovingLeastSquaresOMP when compiled without OpenMP is exactly the same as MovingLeastSquares, right?

@ThorstenHarter
Copy link
Contributor Author

Hi Sergey, that is a valid question, there are however some differences.

The OMP version of the performProcessing-function uses temporary vectors to store the projected points, normals and corresponding input indices, which need then to be merged at the end.
We would have to add additional #ifdef's at these locations, which might be bad for the readability of the code.
On the other hand, having two separate classes is also bad for maintenance.
I have seen it before, that a bug has been fixed in the standard version of an algorithm, but not in the OMP version: #2278
At least both are in the same file for MovingLeastSquares.

I guess this is a design decision.
As far as I understand, PCL users should deliberately choose between the single threaded and the OpenMP version of an algorithm when they write their application code (although you could argue, that they do so by setting the /fopenmp flag at compile time).

From the normal estimation tutorial: http://pointclouds.org/documentation/tutorials/normal_estimation.php

For the speed-savvy users, PCL provides an additional implementation of surface normal estimation which uses multi-core/multi-threaded paradigms using OpenMP to speed the computation. The name of the class is pcl::NormalEstimationOMP, and its API is 100% compatible to the single-threaded pcl::NormalEstimation, which makes it suitable as a drop-in replacement.

Keep also in mind that removing pcl::MovingLeastSquaresOMP would break the build for some people.

@taketwo
Copy link
Member

taketwo commented Jun 4, 2018

Keep also in mind that removing pcl::MovingLeastSquaresOMP would break the build for some people.

Since the API is compatible, we can simply typedef it, should not be an issue.

@SergioRAgostinho What's your opinion? I think I slightly prefer maintainability over readability :)

@SergioRAgostinho
Copy link
Member

@SergioRAgostinho What's your opinion? I think I slightly prefer maintainability over readability :)

👍 Same mindset here, especially for reasons like this.

I have seen it before, that a bug has been fixed in the standard version of an algorithm, but not in the OMP version: #2278

@ThorstenHarter
Copy link
Contributor Author

Okay, I'll check if the two classes could be merged.

Since the API is compatible, we can simply typedef it, should not be an issue.

@taketwo Could you please elaborate how this would look like? Afaik it's not possible to typedef template classes.

@taketwo
Copy link
Member

taketwo commented Jun 5, 2018

You are correct, we'd need C++11 support to do that (which is not yet enabled by default in PCL).

Looking at the code in "mls.h", I think what we can do is just leave MovingLeastSquaresOMP as a deriving class from MovingLeastSquares, but otherwise empty (i.e. no custom performProcessing()). Doing things this way (rather than typedef/using) will allow to add a deprecation warning to this class.

Thorsten Harter added 2 commits June 6, 2018 08:59
Remove #ifdef _OPENMP around the class declaration, add it around #pragma omp instead
Mark MovingLeastSquaresOMP as deprecated
Copy link
Member

@SergioRAgostinho SergioRAgostinho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for our lack of formatting tools.

* \param threads the maximum number of hardware threads to use (0 sets the value to 1)
*/
inline void
setNumberOfThreads(unsigned int threads = 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space between method and open parenthesis.

/** \brief Constructor for parallelized Moving Least Squares
* \param threads the maximum number of hardware threads to use (0 sets the value to 1)
*/
MovingLeastSquaresOMP(unsigned int threads = 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing

std::vector<PointIndices> corresponding_input_indices (threads);
typename PointCloudOut::CloudVectorType projected_points(threads);
typename NormalCloud::CloudVectorType projected_points_normals(threads);
std::vector<PointIndices> corresponding_input_indices(threads);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always have a space before parenthesis

this->computeMLSPointNormal (index, nn_indices, projected_points[tn], projected_points_normals[tn], corresponding_input_indices[tn], this->mls_results_[mls_result_index]);

#ifdef _OPENMP
computeMLSPointNormal(index, nn_indices, projected_points[tn], projected_points_normals[tn], corresponding_input_indices[tn], mls_results_[mls_result_index]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space

this->copyMissingFields (input_->points[(*indices_)[cp]], projected_points[tn][pp]);
}
}
copyMissingFields(input_->points[(*indices_)[cp]], projected_points[tn][pp]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space

}
copyMissingFields(input_->points[(*indices_)[cp]], projected_points[tn][pp]);
#else
computeMLSPointNormal(index, nn_indices, projected_points, projected_points_normals, *corresponding_input_indices_, mls_results_[mls_result_index]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space

computeMLSPointNormal(index, nn_indices, projected_points, projected_points_normals, *corresponding_input_indices_, mls_results_[mls_result_index]);

// Append projected points to output
output.insert(output.end (), projected_points.begin (), projected_points.end ());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space

// Append projected points to output
output.insert(output.end (), projected_points.begin (), projected_points.end ());
if (compute_normals_)
normals_->insert(normals_->end (), projected_points_normals.begin (), projected_points_normals.end ());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space

@ThorstenHarter
Copy link
Contributor Author

@taketwo I have merged the two implementations of performProcessing() and build and tested them with and without /fopenmp.
The empty class MovingLeastSquaresOMP is kept for backwards compatibility.

@SergioRAgostinho I have added the required spaces (and some more).
My Visual Studio removes them every time you copy and paste, didn't realize that.
You could think about providing a configuration file for a code beautifier like uncrustify for PCL.
It has an option sp_before_sparen, but of course it is some effort to choose all options according to you preferences.

@ThorstenHarter
Copy link
Contributor Author

@SergioRAgostinho I have attached an uncrustify config file which handles the spacing, if you want to have a look at it (had to add .txt for GitHub): Uncrustify_PCL.cfg.txt

Usage: uncrustify --no-backup -c Uncrustify_PCL.cfg [file path]

It might need some tweaking, as I don't know all the rules for indentation etc.
With the UniversalIndentGUI you can see the impact of each parameter in the configuration in real time, that's very handy.
It's also easy to integrate it as external tool in Visual Studio and Eclipse.
This will hopefully save us time for future commits. :-)

@SergioRAgostinho
Copy link
Member

👍 Thanks for sharing. I'll see if I can build up from that.

@taketwo
Copy link
Member

taketwo commented Jun 8, 2018

Thanks Thorsten. I will give it a try with and without OpenMP on my Linux laptop this weekend. We will also need to add a deprecation warning, I'll give you more details on that later.

Regarding formatting, we actually have an uncrustify config, it's linked from our Style Guide page. I don't know how good it is though, never used.

@taketwo taketwo added this to the pcl-1.9.0 milestone Jun 8, 2018
@taketwo
Copy link
Member

taketwo commented Jun 11, 2018

I was looking through the code and this caught my attention:

threads the maximum number of hardware threads to use (0 sets the value to 1)

This did not make much sense to me, so I checked what's going on in the rest of OMP classes in PCL. Turns out, zero has has a different meaning in all other classes (e.g. in FPFHEstimationOMP:

nr_threads the number of hardware threads to use (0 sets the value back to automatic)

I'm not experienced with OpenMP, so I tried to find whether setting num_threads(0) in the parallel block has a special meaning. I did not find anything confirming this. So I wonder what actually happens in all the other classes? Zero is the default value everywhere, so what is the effective default behavior of our OMP classes? Do you have any idea?

@ThorstenHarter
Copy link
Contributor Author

Regarding formatting, we actually have an uncrustify config, it's linked from our Style Guide page.

When I try to access your link or http://dev.pointclouds.org I get a 404 error.
Is the server temporarily offline or has this been moved?

About OpenMP:
Description of the parallel clause: The num_threads clause has the same functionality as the omp_set_num_threads function.
Description of omp_set_num_threads: The value of the parameter num_threads must be a positive integer.

When you call ::omp_set_num_threads(0); and then ::omp_get_num_threads();, it will return 1.
This means that per default all OMP algorithms do single-threaded processing, unless you set a thread count > 1.
So both comments are correct, but the decription in mls.h is more precise in my opinion.
If you want it to be consistent however, I can change it to "back to automatic".

@taketwo
Copy link
Member

taketwo commented Jun 11, 2018

Is the server temporarily offline or has this been moved?

This link is dead indeed, but worked three days ago. Here is the file: Uncrustify_PCL.cfg

OMP algorithms do single-threaded processing, unless you set a thread count > 1. So both comments are correct, but the decription in mls.h is more precise in my opinion.

Thanks for clarification. For me "0 sets the value back to automatic" does not sound correct then. Or at least it is confusing, because I tend to interpret "automatic" value as a value that lets OMP select the number of threads automatically. In other words, with this description I'd expect that setting the number of threads to zero would use as many threads as OMP thinks is appropriate on my machine.

As far as I understand, such automatic thread number selection can actually be achieved by not setting num_threads explicitly. What is your opinion of this approach, does it make sense?

#ifdef _OPENMP
if (threads_) ::omp_set_num_threads (threads_);
pragma omp parallel for schedule (dynamic,1000)
#endif
for (int cp = 0; cp < static_cast<int> (indices_->size ()); ++cp)
  // ...

@ThorstenHarter
Copy link
Contributor Author

Here is the file: Uncrustify_PCL.cfg

This is the config file I created last week ;-)

Number of threads:

If we look only at this loop, we will get the desired result, I tested this and OpenMP decided somehow that 7 would be the magic number.
Since ::omp_set_num_threads() sets a global variable, we may however influence other parts of the code (there are 140 occurrences of #pragma omp parallel), so I would be cautious to change this.

In the case of MovingLeastSquares::performProcessing() the automatic approach won't work, because we have to know the exact number of threads in advance, when creating the vectors for holding the temporary results.

You could also define that 0 translates to number of processors:
const unsigned int threads = threads_ == 0 ? ::omp_get_num_procs () : threads_;
But again, I wouldn't change the default behavior without need, most software developers don't like surprises...

@taketwo
Copy link
Member

taketwo commented Jun 11, 2018

This is the config file I created last week ;-)

Ooops, I got mixed with these configs. In fact, the dev.pointclouds.org server was shut down many years ago, AFAIK it was used to host SVN and bug tracker in pre-GitHub times. I suppose we can assume that the file is lost then.

Thanks for the clarification again, all makes sense. So given your explanations, there is no point in having this special zero value then? Default number of threads should be one.

@ThorstenHarter
Copy link
Contributor Author

So given your explanations, there is no point in having this special zero value then? Default number of threads should be one.

Yes, having a default of 1 would be more obvious.
It might also make sense to change the comments in the other header files to "(0 sets the value to 1)" as in mls.h, to prevent misunderstandings.

Copy link
Member

@taketwo taketwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, here are a few things we need to wrap up this pull request.

We definitely need a deprecation warning, however there is a problem that PCL pre-compiles MovingLeastSquaresOMP, which leads to deprecation warnings during PCL build itself. Given that the classes are the same now, I think it should be okay to completely remove instantiation of the OMP version, right?

It might also make sense to change the comments in the other header files

We can do this as a separate PR, let's keep this one focused on MLS.

// (Maximum) number of threads
unsigned int threads = threads_ == 0 ? 1 : threads_;

const unsigned int threads = threads_ == 0 ? 1 : threads_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the added "const"?

I try to "Use const whenever it makes sense.": https://google.github.io/styleguide/cppguide.html#Use_of_const

In this case it's important that the variable does never change, as it defines the size of the vectors for the temporary data.

Copy link
Member

@taketwo taketwo Jun 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, const is alright. I meant why not using threads_ directly since OpenMP does not differentiate between 0 and 1. But after a second look I can answer this question myself, this is needed to ensure vectors are properly sized.

Yet on the other hand, wouldn't it be cleaner if we don't allow threads_ to be zero?

computeMLSPointNormal (index, nn_indices, projected_points, projected_points_normals, *corresponding_input_indices_, mls_results_[mls_result_index]);

// Append projected points to output
output.insert(output.end (), projected_points.begin (), projected_points.end ());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missing space.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were more missing spaces.
I have updated the uncrustify config so that the indentation of the access qualifiers matches the PCL style guide and formatted the mls-files in commit 1032788.

Uncrustify_PCL.cfg.txt

@@ -238,7 +238,11 @@ namespace pcl
* Reference paper: "Computing and Rendering Point Set Surfaces" by Marc Alexa, Johannes Behr,
* Daniel Cohen-Or, Shachar Fleishman, David Levin and Claudio T. Silva
* www.sci.utah.edu/~shachar/Publications/crpss.pdf
* \author Zoltan Csaba Marton, Radu B. Rusu, Alexandru E. Ichim, Suat Gedikli
* \note There is a parallelized version of the processing step, using the OpenMP standard.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will produce three separate notes, is this intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this a single \note.

* \param threads the maximum number of hardware threads to use (0 sets the value to 1)
*/
inline void
setNumberOfThreads (unsigned int threads = 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say let's have 1 here as well.

Does it make sense to warn user if this function is called with threads > 1 in non-OpenMP build?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the default number of threads to 1.

It wouldn't hurt to print a warning, but I don't think it's really necessary.

template <typename PointInT, typename PointOutT>
class MovingLeastSquaresOMP: public MovingLeastSquares<PointInT, PointOutT>
class MovingLeastSquaresOMP : public MovingLeastSquares<PointInT, PointOutT>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a deprecation macro here, something like

class PCL_DEPRECATED_CLASS(MovingLeastSquaresOMP, "MovingLeastSquaresOMP and MovingLeastSquares have been merged, please use the latter") : public MovingLeastSquares<PointInT, PointOutT>

You'll need to include pcl/pcl_macros.h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried this macro before, but the problem is that it generates a warning as soon as you include the header file, even if you aren't using the class.
We could move the class declaration to a new header mls_omp.h, but then we cannot include it anywhere.

Which leads me to your next question:

We definitely need a deprecation warning, however there is a problem that PCL pre-compiles MovingLeastSquaresOMP, which leads to deprecation warnings during PCL build itself. Given that the classes are the same now, I think it should be okay to completely remove instantiation of the OMP version, right?

Wouldn't that mean that developer's who use MovingLeastSquaresOMP and link the precompiled PCL libraries will get a linker error?
Maybe it would be better then to remove the class altogether.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem is that it generates a warning as soon as you include the header file

Which compiler is that? I added this deprecation macro, disabled OMP preinstantiation, and compiled pcl_mls_smoothing target with GCC 5.4, no warnings. Then if I change to using OMP class inside "mls_smoothing.cpp" code, I get a warning as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, GCC behaves as expected and warns only if the class is actually used, but Visual Studio 2015 shows the behavior I described.

Here's the full output:

PCL/surface/include/pcl/surface/mls.h(764): warning C4996: 'pcl::MovingLeastSquaresOMP<PointInT,PointOutT>': MovingLeastSquaresOMP and MovingLeastSquares have been merged, please use the latter
PCL/surface/include/pcl/surface/mls.h(768): note: see reference to class template instantiation 'pcl::MovingLeastSquaresOMP<PointInT,PointOutT>' being compiled
PCL\3pp\boost\boost/bind/placeholders.hpp(54): note: see reference to class template instantiation 'boost::arg<9>' being compiled
PCL\3pp\boost\boost/bind/placeholders.hpp(53): note: see reference to class template instantiation 'boost::arg<8>' being compiled
PCL\3pp\boost\boost/bind/placeholders.hpp(52): note: see reference to class template instantiation 'boost::arg<7>' being compiled
PCL\3pp\boost\boost/bind/placeholders.hpp(51): note: see reference to class template instantiation 'boost::arg<6>' being compiled
PCL\3pp\boost\boost/bind/placeholders.hpp(50): note: see reference to class template instantiation 'boost::arg<5>' being compiled
PCL\3pp\boost\boost/bind/placeholders.hpp(49): note: see reference to class template instantiation 'boost::arg<4>' being compiled
PCL\3pp\boost\boost/bind/placeholders.hpp(48): note: see reference to class template instantiation 'boost::arg<3>' being compiled
PCL\3pp\boost\boost/bind/placeholders.hpp(47): note: see reference to class template instantiation 'boost::arg<2>' being compiled
PCL\3pp\boost\boost/bind/placeholders.hpp(46): note: see reference to class template instantiation 'boost::arg<1>' being compiled

For some reason, the class is still being instantiated, even after removing the macro in mls.cpp and changing my code to use the base class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, this pops up even when building the library itself, i.e. libpcl_surface target?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it does.

/** \brief Constructor for parallelized Moving Least Squares
* \param threads the maximum number of hardware threads to use (0 sets the value to 1)
*/
MovingLeastSquaresOMP (unsigned int threads = 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also 1 here?

Thorsten Harter added 3 commits June 13, 2018 09:57
@taketwo
Copy link
Member

taketwo commented Jun 13, 2018

Wouldn't that mean that developer's who use MovingLeastSquaresOMP and link the precompiled PCL libraries will get a linker error?

I think this should work. The declaration of MovingLeastSquaresOMP is in the header file, so it is visible to the user's compiler. The declaration says that the class derives from MovingLeastSquares (thus inheriting all functions without change), does not have any additional member fields, and has one different method (constructor), which is defined in-place. Therefore, the compiler does not need access to the "hpp" file.

In fact, this can be verified by checking the symbols inside the precompiled library. When preinstantiation is enabled, running:

nm -C lib/libpcl_surface.so | grep "MovingLeastSquaresOMP"

will show that only ctor/dtor are instantiated. But as I said, those are defined in the header file, so are available to the compiler anyway.

@ThorstenHarter
Copy link
Contributor Author

I think this should work. The declaration of MovingLeastSquaresOMP is in the header file, so it is visible to the user's compiler. The declaration says that the class derives from MovingLeastSquares (thus inheriting all functions without change), does not have any additional member fields, and has one different method (constructor), which is defined in-place. Therefore, the compiler does not need access to the "hpp" file.

Correct, it's not needed anymore.
I have removed it from mls.cpp.

@taketwo
Copy link
Member

taketwo commented Jun 14, 2018

Is this ready to merge from your side?

@ThorstenHarter
Copy link
Contributor Author

I haven't added the PCL_DEPRECATED_CLASS macro, because I cannot find a solution for the Visual Studio build.

Other than that, it's ready.

@taketwo
Copy link
Member

taketwo commented Jun 14, 2018

OK, I think we are not in a hurry to deprecate it. After the next release we are switching to C++14 (which has standardized deprecation attributes), so we can introduce a deprecation warning at that point. @SergioRAgostinho please comment if you have anything, otherwise I'll merge tomorrow.

@taketwo taketwo added the changelog: API break Meta-information for changelog generation label Jun 15, 2018
@taketwo taketwo merged commit 420f513 into PointCloudLibrary:master Jun 15, 2018
@SergioRAgostinho
Copy link
Member

@taketwo Do you recall why we need the break API here? I see ABI breakage with the inclusion of the threads_attribute into MovingLeastSquares. The rest seems to preserve the same public methods.

@taketwo
Copy link
Member

taketwo commented Sep 27, 2018

At that point we did not have a separate "deprecation" tag, so I think I used "breaks API" to somehow mark pending removal of OMP class. But strictly speaking, we did not even deprecate OMP class yet, so no tags are needed.

Note for future: unless "changes: xxx" is absolutely obvious, add a comment summarizing why the tag is applied.

@taketwo taketwo added changelog: ABI break Meta-information for changelog generation and removed changelog: API break Meta-information for changelog generation labels Sep 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog: ABI break Meta-information for changelog generation module: surface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants