
ShapRFECV adds support for HalvingGridSearchCV #49

Closed
Matgrb opened this issue Dec 7, 2020 · 3 comments · Fixed by #89
Labels
enhancement, good first issue

Comments

Matgrb (Contributor) commented Dec 7, 2020

Problem Description
With the new release of sklearn there is a new HalvingGridSearchCV. We want probatus ShapRFECV to support it.

Solution Outline
To avoid having to bump the sklearn version requirement, we can check whether clf is an instance of BaseSearchCV, which HalvingGridSearchCV extends. If it is, it should be treated the same way as RandomizedSearchCV.
The docs also need to be updated. A minimal sketch of the check is below.
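A minimal sketch of that check, assuming the search object is passed in where ShapRFECV currently expects `clf` (the private `sklearn.model_selection._search` import path and the helper name are illustrative, not probatus code):

```python
from sklearn.model_selection._search import BaseSearchCV  # base class of GridSearchCV,
                                                          # RandomizedSearchCV, HalvingGridSearchCV


def is_hyperparam_search(clf) -> bool:
    """Hypothetical helper: True for any sklearn hyperparameter search object."""
    return isinstance(clf, BaseSearchCV)


# Inside ShapRFECV this would let HalvingGridSearchCV follow the same code path
# as RandomizedSearchCV: fit the search on the current feature set and use the
# resulting best estimator for the SHAP-based elimination step.
```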

Matgrb added the enhancement and help wanted labels on Dec 7, 2020
timvink (Contributor) commented Dec 7, 2020

HalvingGridSearchCV should already be a drop-in replacement for RandomizedSearchCV or GridSearchCV, according to the sklearn docs. At the very least, it would be good to add some tests for this to probatus; a usage sketch is shown below.

That said, it might be easier to implement ShapRFECV using the new SequentialFeatureSelector to make the sklearn compatibility sklearn's problem. But then we would probably have to drop some features, such as step and the hyperparameter optimization at each step, and change the API.
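A rough sketch of the drop-in usage mentioned above, assuming the ShapRFECV API of the time (`clf`, `step`, `cv`, `scoring` arguments and `fit_compute`); exact parameter names may differ between probatus versions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (required to import HalvingGridSearchCV)
from sklearn.model_selection import HalvingGridSearchCV

from probatus.feature_elimination import ShapRFECV

X, y = make_classification(n_samples=500, n_features=15, random_state=0)

param_grid = {"n_estimators": [25, 50, 100], "max_depth": [3, 5, None]}
search = HalvingGridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)

# If BaseSearchCV instances are supported, passing the halving search should
# behave the same as passing a RandomizedSearchCV object.
shap_elimination = ShapRFECV(clf=search, step=0.2, cv=5, scoring="roc_auc", n_jobs=1)
report = shap_elimination.fit_compute(X, y)
print(report.head())
```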

Matgrb (Contributor, Author) commented Dec 7, 2020

Currently HalvingGridSearchCV is unfortunately not supported; this needs to be added.

SequentialFeatureSelector works in a different way than ShapRFECV and has a much higher computational cost.

In ShapRFECV, for each feature set you fit a model, compute feature importance, and remove the least important feature. Feature importance is a proxy for how strongly a feature affects the performance of the model.

SequentialFeatureSelector (backward) starts with the full feature set and checks how removing each feature would affect the performance of the model (using CV). This means that at each iteration you have to remove each feature separately, fit a model, and evaluate it, which is much more computationally intensive; a toy sketch of the difference is below.
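A toy sketch of the per-step cost difference (illustrative code, not probatus internals; plain impurity importances stand in for SHAP values to keep it short):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = RandomForestClassifier(random_state=0)

# Backward SequentialFeatureSelector-style step: to drop ONE feature, every
# candidate removal gets its own cross-validated fit.
remaining = list(range(X.shape[1]))
score_without = {}
for f in remaining:
    candidate = [c for c in remaining if c != f]
    score_without[f] = cross_val_score(clf, X[:, candidate], y, cv=5).mean()
to_drop = max(score_without, key=score_without.get)  # removal that hurts performance the least
# -> len(remaining) cross-validated fits per elimination step.

# ShapRFECV-style step: one fit per step, then drop the least important feature(s)
# according to feature importance (SHAP values in probatus).
clf.fit(X, y)
least_important = int(np.argmin(clf.feature_importances_))
# -> a single (cross-validated) fit per elimination step, regardless of how many features remain.
```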

Matgrb added the good first issue label and removed the help wanted label on Feb 10, 2021
Matgrb (Contributor, Author) commented Feb 25, 2021

This issue can be solved by the solution proposed in #76.
