Unless I have made a mistake, there are 43 duplicated samples in the training set and 1 duplicated sample in the test set. In addition, 10 samples in the training set also appear in the test set. I found these by comparing the samples at the byte level.
Here is a list of the duplicates:
Training set duplicates:
[601, 39865]
[831, 24228]
[1826, 23718]
[2024, 53883]
[4974, 6293]
[5520, 49165]
[5790, 11845]
[5822, 33399]
[6139, 37731]
[6280, 41036]
[8485, 31238]
[8841, 28184]
[12571, 56657]
[14096, 32343]
[14710, 22159]
[15587, 28635]
[19308, 20114]
[19668, 21571]
[19760, 39489]
[19888, 24443]
[21072, 32800]
[22852, 28789]
[23052, 57107]
[23413, 33731]
[24785, 46015]
[25297, 40077]
[25629, 49588]
[26314, 49351]
[27045, 40033]
[27421, 31627]
[32113, 38337]
[32300, 33730]
[32303, 56840]
[32888, 41918]
[32922, 54584]
[36634, 39841]
[38261, 41877]
[42756, 53842]
[46667, 57724]
[46782, 54829]
[47929, 54185]
[48480, 59607]
[48955, 51368]
Test set duplicates:
[6334, 8569]
Training set samples overlapping with test set:
Train samples [3763] overlap with test samples [7243]
Train samples [4944] overlap with test samples [7781]
Train samples [6168] overlap with test samples [9227]
Train samples [12404] overlap with test samples [4037]
Train samples [15943] overlap with test samples [6659]
Train samples [22403] overlap with test samples [7762]
Train samples [34617] overlap with test samples [4990]
Train samples [35772] overlap with test samples [7216]
Train samples [48228] overlap with test samples [5867]
Train samples [52205] overlap with test samples [9560]
The code required to generate the above output is as follows (assuming the input images are in the variables train_X and test_X):
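The original snippet did not survive in this copy of the report, so what follows is only a minimal sketch of one way to do a byte-level comparison, not necessarily the code that produced the lists above. It assumes train_X and test_X are NumPy arrays (or sequences of arrays) in which every sample has the same shape and dtype. The helper names group_by_bytes, find_duplicates and find_overlaps are hypothetical and were chosen for illustration.

```python
from collections import defaultdict

import numpy as np


def group_by_bytes(images):
    """Map each distinct raw byte string to the indices of the samples that share it."""
    groups = defaultdict(list)
    for index, image in enumerate(images):
        # tobytes() serialises the pixel data, so equal keys mean byte-identical
        # samples (assuming all samples share one shape and dtype).
        groups[np.asarray(image).tobytes()].append(index)
    return groups


def find_duplicates(images):
    """Return the index groups of samples that occur more than once in one set."""
    return [idx for idx in group_by_bytes(images).values() if len(idx) > 1]


def find_overlaps(train_images, test_images):
    """Return (train_indices, test_indices) pairs for samples present in both sets."""
    train_groups = group_by_bytes(train_images)
    test_groups = group_by_bytes(test_images)
    return [
        (train_idx, test_groups[key])
        for key, train_idx in train_groups.items()
        if key in test_groups
    ]


def report(train_X, test_X):
    """Print the findings in the same layout as the lists in this report."""
    print("Training set duplicates:")
    for indices in find_duplicates(train_X):
        print(indices)
    print("Test set duplicates:")
    for indices in find_duplicates(test_X):
        print(indices)
    print("Training set samples overlapping with test set:")
    for train_idx, test_idx in find_overlaps(train_X, test_X):
        print(f"Train samples {train_idx} overlap with test samples {test_idx}")
```

Calling report(train_X, test_X) prints the three lists. Grouping by the raw bytes in a dictionary takes a single pass over each set, which avoids comparing every pair of samples directly.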