ENH: Use Repeated Stratified K-Fold as base CV class #573

Open
andersbogsnes opened this issue Jun 26, 2020 · 0 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Is your feature request related to a problem? Please describe.
With a single run of Stratified K-fold, the results can depend on the random seed. The seed controls how the data is split into folds, so an unlucky split can give folds that don't represent the data well, and the resulting scores won't reflect how the model generalizes.

Describe the solution you'd like
By making RepeatedStratifiedKFold the default CV class, we repeat StratifiedKFold n times, with a different randomization of the split in each repetition. This controls for "unlucky" draws when assessing model generalizability. The downside is potentially longer training times: with two repeats, for example, CV time doubles for the same number of folds.
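
To illustrate the trade-off, here is a minimal sketch using scikit-learn directly. The synthetic dataset, the `LogisticRegression` model, and the variable names are assumptions made for illustration only; they are not taken from this project's code. It compares a single `StratifiedKFold` run with `RepeatedStratifiedKFold` using two repeats:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    RepeatedStratifiedKFold,
    StratifiedKFold,
    cross_val_score,
)

# Hypothetical toy data and model, used only to demonstrate the CV classes.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# One stratified 5-fold run: the scores depend on this single shuffle.
single_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# The same 5 folds repeated twice, with a different shuffle per repetition.
repeated_cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=42)

single_scores = cross_val_score(model, X, y, cv=single_cv)
repeated_scores = cross_val_score(model, X, y, cv=repeated_cv)

# Two repeats fit the model twice as many times (10 fits instead of 5).
print(len(single_scores), len(repeated_scores))
print(single_scores.mean(), repeated_scores.mean())
```

The repeated variant averages over more splits, which should make the mean score less sensitive to any one shuffle, at the cost of twice as many model fits.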

Describe alternatives you've considered
We can also do nothing: the user can already pass any CV object they want.

Additional context
We try to implement best practice out of the box. In general we favour precision over training time, though there is a balance to strike.

andersbogsnes added the enhancement and good first issue labels on Jun 26, 2020