Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[filters] Create a SamplingFilter base class #3724

Open
kunaltyagi opened this issue Mar 9, 2020 · 15 comments · May be fixed by #3957
Open

[filters] Create a SamplingFilter base class #3724

kunaltyagi opened this issue Mar 9, 2020 · 15 comments · May be fixed by #3957
Labels
changelog: enhancement Meta-information for changelog generation effort: low Rough estimate of time needed to fix/implement/solve kind: todo Type of issue module: filters

Comments

@kunaltyagi
Copy link
Member

Having a seed is something which is common to all sampling classes. I have a rough feeling it would be good to create a SamplingFilter base class. Something for another PR.

Originally posted by @SergioRAgostinho in https://github.com/_render_node/MDE3OlB1bGxSZXF1ZXN0UmV2aWV3MzcwOTgwNjYy/pull_request_reviews/more_threads

This class should use modern C++11/14 <random> header along with modern PRNG and entropy sources provided by stdlib

@kunaltyagi kunaltyagi added changelog: enhancement Meta-information for changelog generation effort: low Rough estimate of time needed to fix/implement/solve good first issue Skills/areas of expertise needed to tackle the issue kind: todo Type of issue module: filters labels Mar 9, 2020
@SergioRAgostinho
Copy link
Member

I don't believe this to be a good first issue. There's some knowledge of the API that needs to be taken into account.

@kunaltyagi kunaltyagi removed the good first issue Skills/areas of expertise needed to tackle the issue label Mar 9, 2020
@SergioRAgostinho
Copy link
Member

SergioRAgostinho commented Mar 10, 2020

Here's my current prototype. Relevant things to mention:

  • Instead of taking a very opinionated approach towards the choice of random engine, it's left open
  • I'm unsure if I should also expose a mechanism to pass seed sequences
  • I believe I will need to expose a constructor which allows setting the seed, given that I've seen a couple classes with the seed supplied as an argument.
  • PCLBase::initCompute will need to become virtual
  • There's a case to be made to initialize the seed with std::random_device in case no seed is supplied.
  • Rename reng_ to rand_eng_ or random_engine_.
  • There's also a case to be made to rename the class to something different. Currently there's no sampling functionality in here whatsoever, just maintenance of the random state.
namespace pcl
{
  template<typename PointT, typename RandomEngineT = std::default_random_engine>
  class SamplingFilter : public FilterIndices<PointT>
  {
    public:

      using SeedT = typename RandomEngineT::result_type;

    protected:

      SeedT seed_ = SeedT::default_seed;

      RandomEngineT reng_;

    public:


      SamplingFilter (bool extract_removed_indices = false)
        : FilterIndices<PointT> (extract_removed_indices)
      {}

      void setSeed (SeedT seed) { seed_ = seed; }

      SeedT getSeed () const { return seed_; }

    protected:

      virtual bool initCompute () override
      {
        reng_.seed (seed_);
        return PCLBase<PointT>::initCompute ();
      }
  }
}

@kunaltyagi
Copy link
Member Author

kunaltyagi commented Mar 10, 2020

I believe I will need to expose a constructor which allows setting the seed, given that I've seen a couple classes with the seed supplied as an argument.

That's a must.

I'm unsure if I should also expose a mechanism to pass seed sequences

Could you please expand on this?


Does it make sense to create a hierarchy? Do you envision other FilterIndices based classes being used with SamplingFilter?

If so, it might make sense to create a Mixin or CRTP since technically, this class brings in an independent and optional capability which can be mixed and matched with other independent capabilities. (Though both would be a relatively new design pattern for PCL). Some detailed description can be found here

PS: I'm overall not fond of classes for filters since it makes it hard to create new with low boiler plate and apply in a row beautifully. Just reporting my bias against full fledged classes for something that might not need classes.

@SergioRAgostinho
Copy link
Member

I'm unsure if I should also expose a mechanism to pass seed sequences

Could you please expand on this?

Check the ctor and seed method invoked with the 3rd and 2nd signature respectively. It's an "issue" of completeness, possibly with not much use in our domain.

Do you envision other FilterIndices based classes being used with SamplingFilter?

Yes. All derived classes from FilterIndices with some random capabilities. If you open the inheritance diagram, you'll see a bunch of types with very suggestive names.

Does it make sense to create a hierarchy?

This is indeed the question... This is a functionality which is applicable to a lot of other non-filter based classes. The only limitation is that it needs a mechanism which to ensures that it is launched before any "true processing" takes place.

While I find CRTP amusing, I don't believe it has room for the current API as it is. The way to go here, might be actually through multiple inheritance. This class is as an independent base class, so diamond inheritance would not be an issue per se.

@kunaltyagi
Copy link
Member Author

Do you envision other FilterIndices based classes being used with SamplingFilter?

I should have worded it like: "Do you envision classes with essentially the same functionality being used with and without SampingFilter in their hierarchy?"

I don't believe it has room for the current API as it is

Templated inheritance only comes into question if the previous question's answer is yes.

It's an "issue" of completeness, possibly with not much use in our domain

Maybe, if it's not a particularly problematic (shouldn't be). Else it's better to leave it and till someone requests it.

@taketwo
Copy link
Member

taketwo commented Mar 29, 2020

If we follow the idea of creating a base class with seed management functionality, then why limit it to filters? There are numerous algorithms in other modules that require random numbers. So taking this to the extreme, PCLBase should manage the seed. On the other hand, there are even more algorithms that are deterministic, and setSeed() in their API will look confusing. This reasoning points me towards the mixin approach. Yet keeping in mind the "prefer composition over inheritance" mantra, I wonder what's the huge benefit of such a mixin besides from giving each deriving class a pair of setSeed()/getSeed() methods automatically?

@kunaltyagi
Copy link
Member Author

I wonder what's the huge benefit of such a mixin

Not having to create the functions, manage the generator and that's about it. Mixin are supposed to be simple classes (like Boost adding in arithmetic and comparison operations via templates) with a small but used often functionality

@SergioRAgostinho
Copy link
Member

SergioRAgostinho commented Mar 30, 2020

Yet keeping in mind the "prefer composition over inheritance" mantra, I wonder what's the huge benefit of such a mixin besides from giving each deriving class a pair of setSeed()/getSeed() methods automatically?

It's refactoring thing. It will allow us to unify the way we handle seed and engine management in all classes that need it, and hopefully, reduce some code duplication.

This being said. I was mentally prepping up to create a base class just for random capabilities and resort to multiple inheritance. Notice how this is already touching two mantras! You guys have been mentioning composition instead on inheritance and at this point I feel I need to ask you to sketch me a proof of concept so that I can better what your optimal strategies to address this are.

I see a case for composition when there are multiple functionality sets that you might want to enable/ disable based on the class, but for our particular case, I'm finding it hard to grasp the benefits. I'm hoping that with your PoC that will be made clear.

@kunaltyagi
Copy link
Member Author

I was mentally prepping up to create a base class just for random capabilities

Orthogonal inheritance feels better than a chain of inheritance. CRTP might help prevent users from making mistakes such as a pointer to the random-capable base class.

@SergioRAgostinho
Copy link
Member

The first example goes along the lines of what I was suggesting but it is extended with more capabilities (FirstN).

CRTP might help prevent users from making mistakes such as a pointer to the random-capable base class.

I'm not sure I fully get this comment, because AFAIK you can still extract a pointer to any of the base classes of a CRTP object. Can you post a snippet illustrating this mistake that the CRTP approach prevents?

@taketwo
Copy link
Member

taketwo commented Mar 31, 2020

I wonder what's the huge benefit of such a mixin

Not having to create the functions, manage the generator and that's about it.

It's just two functions, come on :) And you still need to call mixin constructor explicitly to propagate the seed. Generator management can be done in a helper object and "composed in" into the class.

But yeah, I'm not against mixins actually, just wanted to hear some good arguments.

@kunaltyagi
Copy link
Member Author

Can you post a snippet illustrating this mistake that the CRTP approach prevents?

If we use a simple multiple inheritance, it allows std::vector<FirstN*> which is dangerous because people might try to manage classes that way and the dtor of FirstN is not virtual because it's not supposed to be, and RTTI is needed to properly use dynamic_cast or typeid.

If we use CRTP, the base class becomes FirstN<ActualClass> in which case, it's a bit better since unique_ptr or any other pointer management knows which dtor to call without resorting to RTTI. RTTI style is still possible by using an empty base FirstNBase which

  • requires explicit opt-in since it is empty base
  • has 0 cost, since it is empty base (EBO)

It's just two functions, come on

That's correct, but I'm assuming we'll might see similar refactoring occurring once we get more eyes on more parts of the project (registration looks tempting...). Getting some discussion on a toy example might help to set a direction for later.

"composed in" into the class

Yeah, that's the complete composition approach. This fits in well when there's not any function implemented for a class (eg: this case, the seed is merely read and written). I was providing the other approaches assuming this was the baseline to compare to. For completeness sake, I'll reproduce a sample of code below:

struct FirstN { std::size_t sample_size; };

struct RandomFilter2 {
    RandomFilter2(std::size_t N): first_n_(N) {}
    auto& getSamplingSize() { return first_n_.sample_size; }
    auto getSamplingSize() const { return first_n_.sample_size; }
  protected:
    FirstN first_n_;
};

just wanted to hear some good arguments.

Not a lot. Type-traits + free-standing functions are better for complicated cases since C++ lacks traits. But the implementation also gets complicated.

Mixin and CRTP fill the niche of non-trivial, but not complicated. A sample impl using type traits and free-standing functions is the std::begin(), std::end(). It felt weird using this for the toy examples and I thought this example might be more relatable instead:

template <class T>
struct has_begin_end;  // this is obv not that trivial, needs public members or template specialization
template <class T>
using IsClassicalIterator = std::enable_if_t<has_begin_end<T>::value, bool>;

struct Container;  // normal implementation, use composition or inheritance or whatever

template<typename T, IsClassicalIterator<T> = true>
auto begin(const T& c) { return c.begin(); }

@taketwo
Copy link
Member

taketwo commented Apr 1, 2020

If we use a simple multiple inheritance, it allows std::vector<FirstN*> which is dangerous because people might try to manage classes that way.

This is true, but I don't buy this as an argument in favor of CRTP. People might do a lot of crazy things in C++ and it's not our job to save them. Granted, std::vector<RandomizedAlgorithm*> is not as crazy as, say, int* p = 0; *p = 1;. Nevertheless, there are no legit use-cases for such a vector, so I don't see why we should protect people from this.

It's just two functions, come on

That's correct, but I'm assuming we'll might see similar refactoring occurring

Well, agree.

So from my side:

  • CRTP is out as too complicated;
  • don't see how type traits are relevant for random seed use-case;
  • composition does not give a consistent uniform interface for free;
  • mixin is the weapon of choice!

@SergioRAgostinho
Copy link
Member

Agreed with all 👍

@stale
Copy link

stale bot commented May 19, 2020

Marking this as stale due to 30 days of inactivity. It will be closed in 7 days if no further activity occurs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog: enhancement Meta-information for changelog generation effort: low Rough estimate of time needed to fix/implement/solve kind: todo Type of issue module: filters
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants