Train-test shape mismatch #1022

Conversation

maypink
Collaborator

@maypink maypink commented Jan 15, 2023

Fixed the train/test split for hold-out validation: new targets can no longer appear in the test subset.

@maypink maypink linked an issue Jan 15, 2023 that may be closed by this pull request
@codecov

codecov bot commented Jan 15, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.81%. Comparing base (86d7158) to head (2376f2a).
Report is 150 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1022      +/-   ##
==========================================
- Coverage   87.88%   87.81%   -0.07%     
==========================================
  Files         206      206              
  Lines       13797    13801       +4     
==========================================
- Hits        12126    12120       -6     
- Misses       1671     1681      +10     


@maypink
Collaborator Author

maypink commented Jan 16, 2023

So there are three possible ways the data can be split: StratifiedKFold, KFold, and train_test_split for hold-out validation. The first two are fine and split the data correctly, but with hold-out validation there really were splits where new labels appeared in the test subset. The stratify parameter alone solves this, so that train_data.num_classes >= test_data.num_classes.
As for the issue itself: it isn't a problem if train_data.num_classes != test_data.num_classes specifically, is it? The less-than-or-equal condition seems sufficient too.
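
To illustrate the property discussed above, here is a minimal standard-library sketch of a stratified hold-out split. It is not the code from this PR: `stratified_holdout` and its parameters are hypothetical names, and the hand-rolled per-class split only stands in for passing `stratify` to `train_test_split`. The point is that the set of test labels ends up a subset of the train labels.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class separately.

    Hypothetical helper for illustration only. Every class keeps at least
    one sample in train, so test labels are always a subset of train labels.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take roughly test_fraction of the class, but never all of it.
        n_test = min(round(len(indices) * test_fraction), len(indices) - 1)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = ["a"] * 10 + ["b"] * 10 + ["c"] * 5
train_idx, test_idx = stratified_holdout(labels, test_fraction=0.2, seed=42)

train_classes = {labels[i] for i in train_idx}
test_classes = {labels[i] for i in test_idx}

# The condition from the comment above: no label is new to the test subset.
assert test_classes <= train_classes
assert len(train_classes) >= len(test_classes)

# With scikit-learn the equivalent call would be (assuming features X, target y):
# train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
```

An unstratified random split gives no such guarantee: a rare class can land entirely in the test subset, which is the mismatch the linked issue describes.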

@maypink maypink requested a review from gkirgizov January 17, 2023 09:28
@maypink maypink merged commit 03ae732 into master Jan 22, 2023
@maypink maypink deleted the 954-potential-bug-traintest-shape-mismatch-for-multi-class-classification branch January 24, 2023 22:01
Development

Successfully merging this pull request may close these issues.

Potential Bug: train/test shape mismatch for multi-class classification
2 participants