
There are different objects after class dataset and dataloader #624

Closed
ykhsiao25 opened this issue Nov 16, 2019 · 5 comments
Labels
bug Something isn't working

Comments

@ykhsiao25

Hello,
I ran into something that looks like a bug while training on my own dataset.
Has anyone else hit the same problem?

Describe the bug
After loading my images and labels into the dataset, one image still has 24 objects.
(In train.py: dataset = LoadImagesAndLabels(train_path, ...))

But after wrapping the dataset in the dataloader, the same image has only 22 objects.
(In train.py: dataloader = torch.utils.data.DataLoader(dataset, ...))

I have checked the __getitem__() function and there is no problem there (still 24 objects).
I think the problem is in the collate_fn() function, because that is where I first get 22 objects.

I just want to know whether this is really a bug (multiprocessing? or something else), or whether I am doing something wrong.
Thanks a lot in advance.

@ykhsiao25 ykhsiao25 added the bug Something isn't working label Nov 16, 2019
@glenn-jocher
Member

@ykhsiao25 using all default settings, if I print the len() of each I get 117263 and 3665. These are the number of images and the number of batches: 117263/32 = 3664.47, so everything is correct (the last batch has fewer images).

    # Dataset
    dataset = LoadImagesAndLabels(train_path,
                                  img_size,
                                  batch_size,
                                  augment=True,
                                  hyp=hyp,  # augmentation hyperparameters
                                  rect=opt.rect,  # rectangular training
                                  image_weights=opt.img_weights,
                                  cache_labels=True if epochs > 10 else False,
                                  cache_images=False if opt.prebias else opt.cache_images)

    print(len(dataset))

    # Dataloader
    dataloader = torch.utils.data.DataLoader(dataset,
                                             batch_size=batch_size,
                                             num_workers=min([os.cpu_count(), batch_size, 16]),
                                             shuffle=not opt.rect,  # Shuffle=True unless rectangular training is used
                                             pin_memory=True,
                                             collate_fn=dataset.collate_fn)

    print(len(dataloader))
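
For context, len() of a Dataset counts samples while len() of a DataLoader counts batches. A minimal, self-contained sketch (toy tensors, sizes made up here, not the repo's code) shows the same ceiling relationship:

import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for LoadImagesAndLabels: 117 fake "images" with one label each
images = torch.zeros(117, 3, 32, 32)
labels = torch.zeros(117, 1)
dataset = TensorDataset(images, labels)

batch_size = 32
dataloader = DataLoader(dataset, batch_size=batch_size)

print(len(dataset))     # 117 -> number of samples
print(len(dataloader))  # 4   -> number of batches; the last batch is partial
assert len(dataloader) == math.ceil(len(dataset) / batch_size)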

@glenn-jocher
Member

@ykhsiao25 can you reproduce your issue on one of the available datasets like coco.data or coco_64img.data?

@glenn-jocher
Member

@ykhsiao25 also, if you could supply a minimal reproducible example with code, that would help.

@ykhsiao25
Author

ykhsiao25 commented Nov 17, 2019

@glenn-jocher Thanks for your response!
It's not about the number of images but the number of bounding boxes in those images.

coco.data

classes=6
train=data/train.txt
valid=data/val.txt
names=data/orchid.names  (I renamed my .names file)
backup=backup/
eval=coco

And the code:

#dataset
dataset = LoadImagesAndLabels(train_path,
                              img_size,
                              batch_size,
                              augment=True,
                              hyp=hyp,  # augmentation hyperparameters
                              rect=opt.rect,  # rectangular training #default=False
                              image_weights=opt.img_weights, #default=False
                              cache_images=opt.cache_images)#default=False
# Both checks are on the same image (even though the sampler is random)
print(len(dataset[1][1]))  # 24

#dataloader
dataloader = torch.utils.data.DataLoader(dataset,
                                         batch_size=batch_size,
                                         num_workers=opt.num_workers,
                                         shuffle=not opt.rect,  # Shuffle=True unless rectangular training is used
                                         pin_memory=True,  
                                         collate_fn=dataset.collate_fn)
# Both checks are on the same image (even though the sampler is random)
print(len(list(dataloader)[0][1]))  # 22
...
### And by the way, inside the training loop:
for i, (imgs, targets, paths, _) in pbar: 
      ...   
      print('targets', len(targets))  # 24
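
For reference, here is a generic sketch of the kind of collate_fn a detection dataset typically uses (illustrative only, not copied from this repo): it stacks the images and concatenates the per-image label tensors, so the targets tensor returned for a batch holds every box in that batch.

import torch

def collate_fn(batch):
    # batch is a list of (img, labels) pairs; labels has shape (n_boxes, 6),
    # with column 0 reserved for the image index within the batch
    imgs, labels = zip(*batch)
    for i, l in enumerate(labels):
        l[:, 0] = i  # record which image in the batch each box belongs to
    return torch.stack(imgs, 0), torch.cat(labels, 0)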

@glenn-jocher
Member

@ykhsiao25 if augment=True the dataloader will randomly change the input images, so bounding boxes may get cropped or removed altogether during training. train_batch0.jpg shows this.
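
To make this concrete, here is a toy sketch (not the repo's code; everything below is made up): each __getitem__ call re-runs a random augmentation step, so reading the same index twice can return different numbers of boxes, which matches the 24-vs-22 observation.

import random
import torch
from torch.utils.data import Dataset

class ToyAugmentedLabels(Dataset):
    # Toy stand-in for an augmented detection dataset: every __getitem__ call
    # randomly drops a few boxes, the way a random crop or flip might.
    def __init__(self, n_boxes=24):
        self.n_boxes = n_boxes

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        kept = [b for b in range(self.n_boxes) if random.random() > 0.05]
        return torch.zeros(len(kept), 6)  # one row per surviving box

ds = ToyAugmentedLabels()
print(len(ds[1]))  # e.g. 24
print(len(ds[1]))  # same index, e.g. 22 -- different random augmentation, different count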
