
There are different objects after class dataset and dataloader #624

Closed
ykhsiao25 opened this issue Nov 16, 2019 · 5 comments
Labels
bug Something isn't working

Comments

@ykhsiao25

Hello,
I ran into something that looks like a bug while training on my own dataset.
Has anyone else hit the same problem?

Describe the bug
After loading my images and labels into the dataset, one image still has 24 objects.
(In train.py: dataset = LoadImagesAndLabels(train_path, ...))

But after wrapping the dataset in the dataloader, the same image has only 22 objects.
(In train.py: dataloader = torch.utils.data.DataLoader(dataset, ...))

I have checked the __getitem__() function and there is no problem there (still 24 objects).
I think the problem is in the collate_fn() function, because that is where I first get 22 objects.

I just want to know whether this is really a bug (multiprocessing? or something else), or whether I am doing something wrong.
Thanks a lot in advance.

@ykhsiao25 ykhsiao25 added the bug Something isn't working label Nov 16, 2019
@glenn-jocher
Member

@ykhsiao25 using all default settings, if I print the len() of each I get 117263 and 3665. These are the number of images and the number of batches: 117263/32 = 3664.47, so everything is correct (the last batch has fewer images).

    # Dataset
    dataset = LoadImagesAndLabels(train_path,
                                  img_size,
                                  batch_size,
                                  augment=True,
                                  hyp=hyp,  # augmentation hyperparameters
                                  rect=opt.rect,  # rectangular training
                                  image_weights=opt.img_weights,
                                  cache_labels=True if epochs > 10 else False,
                                  cache_images=False if opt.prebias else opt.cache_images)

    print(len(dataset))

    # Dataloader
    dataloader = torch.utils.data.DataLoader(dataset,
                                             batch_size=batch_size,
                                             num_workers=min([os.cpu_count(), batch_size, 16]),
                                             shuffle=not opt.rect,  # Shuffle=True unless rectangular training is used
                                             pin_memory=True,
                                             collate_fn=dataset.collate_fn)

    print(len(dataloader))
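
For context, len() of a Dataset counts samples while len() of a DataLoader counts batches. A minimal, self-contained sketch (toy tensors, sizes made up here, not the repo's code) shows the same ceiling relationship:

import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for LoadImagesAndLabels: 117 fake "images" with one label each
images = torch.zeros(117, 3, 32, 32)
labels = torch.zeros(117, 1)
dataset = TensorDataset(images, labels)

batch_size = 32
dataloader = DataLoader(dataset, batch_size=batch_size)

print(len(dataset))     # 117 -> number of samples
print(len(dataloader))  # 4   -> number of batches; the last batch is partial
assert len(dataloader) == math.ceil(len(dataset) / batch_size)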

@glenn-jocher
Member

@ykhsiao25 can you reproduce your issue on one of the available datasets like coco.data or coco_64img.data?

@glenn-jocher
Member

@ykhsiao25 also, if you could supply a minimal reproducible example with code, that would help.

@ykhsiao25
Author

ykhsiao25 commented Nov 17, 2019

@glenn-jocher Thanks for your response!
It's not about the number of images but the number of bounding boxes in those images.

coco.data

classes=6
train=data/train.txt
valid=data/val.txt
names=data/orchid.names  (I renamed my .names file)
backup=backup/
eval=coco

And the code:

#dataset
dataset = LoadImagesAndLabels(train_path,
                              img_size,
                              batch_size,
                              augment=True,
                              hyp=hyp,  # augmentation hyperparameters
                              rect=opt.rect,  # rectangular training #default=False
                              image_weights=opt.img_weights, #default=False
                              cache_images=opt.cache_images)#default=False
# Both checks are on the same image (even though the sampler is random)
print(len(dataset[1][1]))  # 24

#dataloader
dataloader = torch.utils.data.DataLoader(dataset,
                                         batch_size=batch_size,
                                         num_workers=opt.num_workers,
                                         shuffle=not opt.rect,  # Shuffle=True unless rectangular training is used
                                         pin_memory=True,  
                                         collate_fn=dataset.collate_fn)
# Both checks are on the same image (even though the sampler is random)
print(len(list(dataloader)[0][1]))  # 22
...
### And by the way, inside the training loop:
for i, (imgs, targets, paths, _) in pbar: 
      ...   
      print('targets', len(targets))  # 24
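
For reference, here is a generic sketch of the kind of collate_fn a detection dataset typically uses (illustrative only, not copied from this repo): it stacks the images and concatenates the per-image label tensors, so the targets tensor returned for a batch holds every box in that batch.

import torch

def collate_fn(batch):
    # batch is a list of (img, labels) pairs; labels has shape (n_boxes, 6),
    # with column 0 reserved for the image index within the batch
    imgs, labels = zip(*batch)
    for i, l in enumerate(labels):
        l[:, 0] = i  # record which image in the batch each box belongs to
    return torch.stack(imgs, 0), torch.cat(labels, 0)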

@glenn-jocher
Member

@ykhsiao25 if augment=True the dataloader will randomly change the input images, so bounding boxes may get cropped or removed altogether during training. train_batch0.jpg shows this.
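
To make this concrete, here is a toy sketch (not the repo's code; everything below is made up): each __getitem__ call re-runs a random augmentation step, so reading the same index twice can return different numbers of boxes, which matches the 24-vs-22 observation.

import random
import torch
from torch.utils.data import Dataset

class ToyAugmentedLabels(Dataset):
    # Toy stand-in for an augmented detection dataset: every __getitem__ call
    # randomly drops a few boxes, the way a random crop or flip might.
    def __init__(self, n_boxes=24):
        self.n_boxes = n_boxes

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        kept = [b for b in range(self.n_boxes) if random.random() > 0.05]
        return torch.zeros(len(kept), 6)  # one row per surviving box

ds = ToyAugmentedLabels()
print(len(ds[1]))  # e.g. 24
print(len(ds[1]))  # same index, e.g. 22 -- different random augmentation, different count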
