Are images and labels shuffled through the dataloader #761

JSGrondin · 2020-08-17T18:15:23Z

❔Question

Looking at the function create_dataloader in datasets.py, I see that the DataLoader is created without the argument shuffle=True, which means the data is not reshuffled after each epoch. It is not clear to me whether the data is at least shuffled once at the beginning of training when shuffle=False, or whether it is simply loaded in the alphanumeric order of the image/label file names. Could anyone clarify this, please?

Additional context

github-actions · 2020-08-17T18:16:13Z

Hello @JSGrondin, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

Cloud-based AI systems operating on hundreds of HD video streams in realtime.
Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

glenn-jocher · 2020-08-17T18:30:46Z

@JSGrondin data is shuffled for training and sorted by aspect ratio for batched rectangular inference during validation.
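
To illustrate what "sorted by aspect ratio for batched rectangular inference" can look like, here is a minimal pure-Python sketch. It is not the YOLOv5 implementation: the function name `rect_batches` and the `(width, height)` tuple layout are assumptions made for this example. The idea is only that images of similar shape end up in the same batch, so each batch needs less padding.

```python
from typing import List, Sequence, Tuple


def rect_batches(shapes: Sequence[Tuple[int, int]], batch_size: int) -> List[List[int]]:
    """Group image indices into batches of similar aspect ratio.

    `shapes` holds (width, height) pairs (an assumed layout for this sketch).
    Returns lists of dataset indices, ordered by height / width.
    """
    # Sort indices by aspect ratio so neighbouring images have similar shapes.
    order = sorted(range(len(shapes)), key=lambda i: shapes[i][1] / shapes[i][0])
    # Slice the sorted order into consecutive batches.
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


shapes = [(640, 480), (480, 640), (640, 640), (1280, 720)]  # (width, height)
batches = rect_batches(shapes, batch_size=2)
print(batches)  # the two widest images share a batch, the two tallest share the other
```

Because the order is fixed by the sort, a loader built this way yields the same batches every time, which is why this mode is paired with unshuffled validation.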

JSGrondin · 2020-08-17T18:36:23Z

Many thanks!

github-actions · 2020-09-17T00:38:54Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

sramakrishnan247 · 2021-02-19T20:29:48Z

@glenn-jocher

> @JSGrondin data is shuffled for training and sorted by aspect ratio for batched rectangular inference during validation.

Can you please tell me where exactly this is happening?

glenn-jocher · 2021-02-19T20:33:07Z

@JSGrondin in the dataloader:

yolov5/utils/datasets.py, line 341 in 9d87307:

class LoadImagesAndLabels(Dataset):  # for training/testing

glenn-jocher · 2021-11-13T12:09:38Z

@JSGrondin @sramakrishnan247 good news 😃! Your original issue may now be fixed ✅ in PR #5623 by @werner-duvaud. This PR turns on shuffling in the YOLOv5 training DataLoader by default, which was missing until now. This works for all training formats: CPU, Single-GPU, Multi-GPU DDP.

train_loader, dataset = create_dataloader(train_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls,
                                          hyp=hyp, augment=True, cache=opt.cache, rect=opt.rect, rank=LOCAL_RANK,
                                          workers=workers, image_weights=opt.image_weights, quad=opt.quad,
                                          prefix=colorstr('train: '), shuffle=True)  # <--- NEW
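
For context on what the flag changes: PyTorch's DataLoader defaults to shuffle=False, in which case samples are read in dataset index order on every epoch; with shuffle=True a new random permutation is drawn each epoch. The sketch below mimics that behaviour in plain Python. It is only an illustration, not PyTorch's or YOLOv5's code, and `epoch_batches` is a hypothetical helper name.

```python
import random
from typing import Iterator, List


def epoch_batches(n_items: int, batch_size: int, shuffle: bool,
                  rng: random.Random) -> Iterator[List[int]]:
    """Yield one epoch of index batches, optionally reshuffled per epoch."""
    indices = list(range(n_items))
    if shuffle:
        # A fresh permutation is drawn on every call, i.e. once per epoch.
        rng.shuffle(indices)
    for start in range(0, n_items, batch_size):
        yield indices[start:start + batch_size]


rng = random.Random(0)  # seeded only to make the sketch reproducible
ordered = [list(epoch_batches(6, 2, shuffle=False, rng=rng)) for _ in range(2)]
shuffled = [list(epoch_batches(6, 2, shuffle=True, rng=rng)) for _ in range(2)]

print(ordered)   # both epochs visit indices 0..5 in the same order
print(shuffled)  # each epoch visits every index once, in a random order
```

Without shuffling, every epoch presents identical batches in identical order, which is the behaviour the original question was asking about.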

I evaluated this PR against master on VOC finetuning for 50 epochs, and the results show a slight improvement in most metrics and losses, particularly in objectness loss and mAP@0.5, perhaps indicating that the shuffle addition may help delay overtraining.

https://wandb.ai/glenn-jocher/VOC

To receive this update:

Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
PyTorch Hub – Force-reload model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
Notebooks – View updated notebooks
Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

kevinconka · 2022-12-16T16:05:02Z

> @JSGrondin data is shuffled for training and sorted by aspect ratio for batched rectangular inference during validation.

I would like to shuffle my validation data so that I can see 'shuffled' samples in ClearML. I tried the following in val.py, but it did not work:

Is there something I am missing?

glenn-jocher · 2022-12-17T10:40:28Z

@kevinconka shuffling the validation set is irrelevant for metrics purposes.
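
A small sketch of why order does not matter for metrics, assuming the metric is an aggregate over all predictions in the validation set (as precision and recall are). The data and the helper `precision_recall` are made up for this example and are not taken from val.py.

```python
import random
from typing import List, Tuple


def precision_recall(preds: List[Tuple[float, bool]], n_gt: int,
                     conf_thres: float = 0.5) -> Tuple[float, float]:
    """Compute precision and recall from (confidence, is_true_positive) pairs."""
    # Keep only predictions at or above the confidence threshold.
    kept = [is_tp for conf, is_tp in preds if conf >= conf_thres]
    tp = sum(kept)
    precision = tp / len(kept) if kept else 0.0
    recall = tp / n_gt if n_gt else 0.0
    return precision, recall


# Hypothetical predictions: (confidence, matched a ground-truth box?)
preds = [(0.9, True), (0.8, False), (0.7, True), (0.4, True), (0.3, False)]
baseline = precision_recall(preds, n_gt=4)

# Present the same predictions in a different order.
reordered = preds[:]
random.Random(1).shuffle(reordered)
same = precision_recall(reordered, n_gt=4) == baseline
print(baseline, same)
```

The counts of true and false positives are sums over the whole set, so any permutation of the samples gives the same result. Shuffling would only change which images appear in logged sample batches, not the reported numbers.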

JSGrondin added the question label (Further information is requested) on Aug 17, 2020

github-actions bot added the Stale label (Stale and schedule for closing soon) on Sep 17, 2020

github-actions bot closed this as completed on Sep 22, 2020

glenn-jocher linked a pull request on Nov 13, 2021 that will close this issue: Default DataLoader shuffle=True for training #5623 (Merged)
