Same order of training samples with NoDuplicatesBatchSampler
#3069
It seems that `NoDuplicatesBatchSampler` produces the same set of batches, in the same order, regardless of the epoch index.

Indeed, in this piece of code, the order of the indices in `remaining_indices` does not depend on the random permutation `torch.randperm(len(self.dataset), generator=self.generator)`, because the permutation is immediately reset to the ordered range by `set`.

Moreover, the seed in line 185 does not change from one epoch to another (the `set_epoch` method does not seem to be used...).
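As a minimal sketch of the first point (not the sampler's actual source; the dataset size of 10 and the seed are arbitrary), wrapping the permutation in `set` discards the shuffled order:

```python
import torch

generator = torch.Generator()
generator.manual_seed(42)  # in the sampler, this seed is the same every epoch

permutation = torch.randperm(10, generator=generator).tolist()
print(permutation)  # a shuffled order, e.g. [2, 9, 4, 0, 6, 8, 5, 1, 7, 3]

# set() throws that order away: on CPython, a set of small non-negative
# integers typically iterates in ascending order, so every epoch ends up
# visiting the indices as 0, 1, 2, ...
remaining_indices = set(permutation)
print(list(remaining_indices))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```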
Comments

Hello!
As for the …

It might not be the most elegant solution, but maybe `remaining_indices` could be an `OrderedDict`. It probably won't be as fast as with the set (because the …).
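A minimal sketch of that suggestion, assuming the keys of an `OrderedDict` stand in for the set (dataset size and seed are arbitrary; the real sampler's duplicate filtering is omitted):

```python
from collections import OrderedDict

import torch

generator = torch.Generator()
generator.manual_seed(42)

# Keys hold the shuffled indices; the values are unused.
remaining_indices = OrderedDict.fromkeys(
    torch.randperm(10, generator=generator).tolist()
)

# Iteration now follows the permutation instead of hash order,
# and removing a consumed index stays O(1) on average:
first = next(iter(remaining_indices))
del remaining_indices[first]
print(list(remaining_indices))  # permutation order, minus the first index
```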
I was also considering a dict. Because we're at Python 3.7+ now, I think we can just use a normal dict:

> the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec

from https://docs.python.org/3/whatsnew/3.7.html. If the performance hit is not too large, then this is an acceptable solution, I think. I'll also look more into fixing the `set_epoch` issue.
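A sketch of how the dict-based fix and an epoch-dependent seed could fit together; the class name `SketchedSampler`, the batch size, and the `set_epoch` wiring are illustrative assumptions, and the duplicate filtering that gives `NoDuplicatesBatchSampler` its name is left out:

```python
import torch

class SketchedSampler:
    """Simplified stand-in for the ordering logic of NoDuplicatesBatchSampler."""

    def __init__(self, dataset_size: int, seed: int = 0, batch_size: int = 4):
        self.dataset_size = dataset_size
        self.seed = seed
        self.batch_size = batch_size
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # The trainer would have to call this at the start of each epoch
        # for the per-epoch reshuffling to take effect.
        self.epoch = epoch

    def __iter__(self):
        generator = torch.Generator()
        # Epoch-dependent seed: each epoch draws a different permutation.
        generator.manual_seed(self.seed + self.epoch)
        # dict.fromkeys keeps the permutation's order on Python 3.7+.
        remaining = dict.fromkeys(
            torch.randperm(self.dataset_size, generator=generator).tolist()
        )
        while remaining:
            batch = list(remaining)[: self.batch_size]
            for idx in batch:
                del remaining[idx]
            yield batch
```

Under these assumptions, calling `set_epoch(0)` and `set_epoch(1)` before iterating produces different permutations, so the batches differ between epochs.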
Yes, I agree. A normal dictionary can also be considered, even though the behavior might be slightly less predictable: the declared "insertion-order preservation" does not necessarily mean that the order is preserved after the deletion of some elements.
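For what it's worth, in CPython the relative order of the surviving keys is not disturbed by deletions; only a key re-inserted after deletion moves to the end. A quick check:

```python
d = dict.fromkeys([3, 1, 4, 5, 9, 2, 6])
del d[4]
del d[9]
print(list(d))  # [3, 1, 5, 2, 6] -- survivors keep their relative order

d[4] = None     # re-inserting after deletion appends at the end
print(list(d))  # [3, 1, 5, 2, 6, 4]
```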