Is your feature request related to a problem? Please describe.
When doing stratified k-fold cross-validation, we can get non-generalizable results depending on the random seed. The seed controls how the data is split into folds, so a single split can produce folds that don't represent the underlying data well. A short demonstration follows.
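As a minimal sketch (using scikit-learn on synthetic data, not this library's API), the snippet below shows that the same model and data can yield different CV scores purely from the seed used to shuffle the folds:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000)

# Same model, same data; only the fold assignment changes with the seed.
for seed in (0, 1, 2):
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv)
    print(f"seed={seed}: mean accuracy={scores.mean():.3f}")
```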
Describe the solution you'd like
Make RepeatedStratifiedKFold the default CV class. It repeats StratifiedKFold n times, choosing a new randomization for the split each time, which controls for "unlucky" draws when assessing model generalizability. The downside is potentially longer training time: with n_repeats=2 we double CV time for the same number of folds.
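A hedged sketch of the proposed default, again using plain scikit-learn rather than this library's interface: RepeatedStratifiedKFold re-runs the stratified split n_repeats times with different randomization, so the reported score averages over several fold assignments.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5 folds x 2 repeats = 10 fits; the mean smooths out unlucky splits.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(f"mean accuracy over {len(scores)} fits: {scores.mean():.3f}")
```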
Describe alternatives you've considered
We could also do nothing; the user can already pass any CV object they want.
Additional context
We try to implement best practice out of the box. In general we favour precision over training time, though there is a balance to strike.