ENH: Use Repeated Stratified K-Fold as base CV class #573

andersbogsnes · 2020-06-26T06:32:01Z

Is your feature request related to a problem? Please describe.
When doing Stratified K-fold, our results can depend on the random seed. The seed controls how the data is split into folds, so an unlucky split can give folds that don't represent the data well, and the resulting scores won't generalize.

Describe the solution you'd like
By making RepeatedStratifiedKFold the default CV class, we repeat the StratifiedKFold n times, choosing a new random seed for the split each time. This controls for "unlucky" draws when assessing model generalizability. The downside is potentially longer training times: with two repeats (n_repeats=2 in scikit-learn), CV time doubles for the same number of folds.
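As a rough illustration of the cost trade-off described above, the sketch below compares the number of splits produced by scikit-learn's StratifiedKFold and RepeatedStratifiedKFold. The toy data, the variable names and the choice of 5 folds are assumptions made for the example, not part of this project.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, StratifiedKFold

# Made-up toy data: 20 samples with an imbalanced binary target (15 vs 5).
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

# One shuffled stratified split: the folds depend on a single seed.
single_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# The same 5 folds, repeated twice with a different shuffle per repeat.
repeated_cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=42)

n_single = single_cv.get_n_splits(X, y)
n_repeated = repeated_cv.get_n_splits(X, y)
print(n_single, n_repeated)  # 5 10

# Class counts [n_class_0, n_class_1] in the test part of every split.
fold_class_counts = [
    np.bincount(y[test_idx], minlength=2).tolist()
    for _, test_idx in repeated_cv.split(X, y)
]
print(fold_class_counts)
```

Each repeat still preserves the class ratio in every fold, so the extra cost buys a second, independently shuffled estimate rather than a different kind of split.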

Describe alternatives you've considered
We can also choose to do nothing, since the user can already pass any CV object they want.
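For reference, this is roughly what passing a CV object explicitly looks like. The sketch uses scikit-learn's cross_val_score as a stand-in, since this project's own entry point is not shown in the issue; the synthetic dataset and the LogisticRegression estimator are likewise assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic data, purely illustrative.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# The user builds the CV object and hands it over via the cv argument.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

# One score per split: 5 folds x 2 repeats.
print(len(scores))
print(scores.mean(), scores.std())
```

The spread of the scores across repeats gives a sense of how much a single seed could have shifted the estimate.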

Additional context
We try to implement best practices out of the box. In general we favour precision over training time, though there is a balance.


andersbogsnes added the "enhancement" and "good first issue" labels on Jun 26, 2020
