Allow for group-based cross-validation objects from scikit-learn #181

Closed
PaulZhutovsky opened this issue Mar 1, 2022 · 0 comments · Fixed by #182
Labels
enhancement New feature or request

Problem Description
Probatus feature elimination (e.g. ShapRFECV) currently does not accept cross-validation objects that require a groups argument (e.g. StratifiedGroupKFold).

Desired Outcome
It would be great if this were supported, because groups can be used to prevent data leakage. For example, when multiple samples from the same customer are available, all of that customer's samples should end up either in the training set or in the test set, but not in both.
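To make the leakage concrete, here is a small sketch using only scikit-learn. The data is synthetic and the helper `leaked_groups` is a hypothetical name introduced for illustration; rows are deliberately interleaved so that each customer's samples are spread across the dataset.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

# Synthetic setup (assumption): 10 customers with 4 rows each, interleaved
# so that consecutive rows belong to different customers.
groups = np.tile(np.arange(10), 4)
X = np.zeros((groups.size, 2))


def leaked_groups(cv, **split_kwargs):
    """Hypothetical helper: per fold, the customer ids present in both train and test."""
    return [
        set(groups[train_idx]) & set(groups[test_idx])
        for train_idx, test_idx in cv.split(X, **split_kwargs)
    ]


# A plain KFold ignores customer identity, so customers straddle the split.
plain = leaked_groups(KFold(n_splits=5))

# A group-aware splitter keeps every customer on one side of each split.
grouped = leaked_groups(GroupKFold(n_splits=5), groups=groups)

print("KFold, shared customers per fold:", [len(s) for s in plain])
print("GroupKFold, shared customers per fold:", [len(s) for s in grouped])
```

With the plain splitter every fold shares customers between train and test, while the group-aware splitter shares none.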

Solution Outline
The fix should be quite simple and can follow the implementation of scikit-learn's RFECV: add a groups argument (default: None) to the fit/fit_compute methods of ShapRFECV and pass it through to self.cv.split.
