-
-
Notifications
You must be signed in to change notification settings - Fork 16.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Are images and labels shuffled through the dataloader #761
Comments
Hello @JSGrondin, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you. If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:
For more information please visit https://www.ultralytics.com. |
@JSGrondin data is shuffled for training and sorted by aspect ratio for batched rectangular inference during validation. |
Many thanks! |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
|
@JSGrondin in the dataloader: Line 341 in 9d87307
|
@JSGrondin @sramakrishnan247 good news 😃! Your original issue may now be fixed ✅ in PR #5623 by @werner-duvaud. This PR turns on shuffling in the YOLOv5 training DataLoader by default, which was missing until now. This works for all training formats: CPU, Single-GPU, Multi-GPU DDP. train_loader, dataset = create_dataloader(train_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls,
hyp=hyp, augment=True, cache=opt.cache, rect=opt.rect, rank=LOCAL_RANK,
workers=workers, image_weights=opt.image_weights, quad=opt.quad,
prefix=colorstr('train: '), shuffle=True) # <--- NEW I evaluated this PR against master on VOC finetuning for 50 epochs, and the results show a slight improvement in most metrics and losses, particularly in objectness loss and [email protected], perhaps indicating that the shuffle addition may help delay overtraining. https://wandb.ai/glenn-jocher/VOC To receive this update:
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀! |
I would like to shuffle my validation data so that in clearML I can see 'shuffled' samples. I have tried the following in Is there something I am missing? |
@kevinconka shuffling in validation set is irrelevant for metrics purposes. |
❔Question
When looking at the function create_dataloader in dataset.py, I see that the dataloader doesn't include the argument shuffle=True, which means the data is not shuffled after each epoch. It is not clear to me whether the data is at least shuffled once at the beginning of training when shuffle=False or if the data is simply loaded in the alphanumerical order of the image/label file names? Could anyone clarify this please?
Additional context
The text was updated successfully, but these errors were encountered: