This is a competition project which is a part of the fine-grained visual-categorization workshop (FGVC6 workshop) at CVPR 2019

omcaaaar/iFood_2019

iFood_2019

This is a part of the fine-grained visual-categorization workshop (FGVC6 workshop) at CVPR 2019.

Description:

What did you eat today? Wondering if you are eating a healthy diet? Automatic food identification can assist towards food intake monitoring to maintain a healthy diet. Food classification is a challenging problem due to the large number of food categories, high visual similarity between different food categories, as well as the lack of datasets that are large enough for training deep models. In this competition, we extend our last year's dataset to 251 fine-grained (prepared) food categories with 118,475 training images collected from the web. We provide human verified labels for both the validation set of 11,994 images and the test set of 28,377 images. The goal is to build a model to predict the fine-grained food-category label given an image.

The main challenges are:

  1. Fine-grained Classes: The classes are fine-grained and visually similar. For example, the dataset has 15 different types of cakes, and 10 different types of pastas.
  2. Noisy Data: Since the training images are crawled from the web, they often include images of raw ingredients or processed and packaged food items. This is referred to as cross-domain noise. Further, due to the fine-grained nature of food categories, a training image may either be incorrectly labeled into a visually similar class or be annotated with a single label despite containing multiple food items.

Evaluation:

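For each image $i$, an algorithm will produce 3 labels $l_{ij}$, $j = 1, 2, 3$. For this competition each image has one ground truth label $g_i$, and the error for that image is:

$$e_i = \min_{j} d(l_{ij}, g_i)$$

Where

$$d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{otherwise} \end{cases}$$

The overall error score for an algorithm is the average error over all $N$ test images:

$$\text{score} = \frac{1}{N} \sum_{i} e_i$$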
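
The evaluation described above amounts to a top-3 error: an image counts as correct when its ground-truth label appears among the three predicted labels. The following is a minimal sketch of that metric, not the official scoring script. The function name `top3_error` and the choice to count a missing prediction as an error are assumptions made for illustration.

```python
from typing import Mapping, Sequence


def top3_error(predictions: Mapping[str, Sequence[int]],
               ground_truth: Mapping[str, int]) -> float:
    """Average per-image error: 0 if the true label is among the
    (at most 3) predicted labels, 1 otherwise."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    total = 0
    for image_name, true_label in ground_truth.items():
        # Assumption: an image with no prediction counts as an error.
        labels = list(predictions.get(image_name, ()))[:3]
        total += 0 if true_label in labels else 1
    return total / len(ground_truth)


preds = {"test_0001.jpg": [0, 1, 10], "test_0002.jpg": [1, 3, 5]}
truth = {"test_0001.jpg": 10, "test_0002.jpg": 4}
print(top3_error(preds, truth))  # 0.5
```

Top-3 accuracy, as reported in the results further down, is one minus this error.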

Submission file format:

image_name,label1 label2 label3 
test_0001.jpg,0 1 10 
test_0002.jpg,1 3 5 
test_0003.jpg,0 5 1 

Please include the header as shown above for correct parsing. Each line corresponds to one test image, identified by its file name (e.g. test_0001.jpg refers to image test_0001.jpg), followed by three space-separated predicted labels used for computing accuracy.
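
A file in this format could be produced along the following lines. This is a sketch only: `write_submission` is a hypothetical helper, and the check that each row has exactly three labels is an assumption based on the example rows above.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO, Tuple


def write_submission(rows: Iterable[Tuple[str, Sequence[int]]],
                     out: TextIO) -> None:
    """Write the header and one 'image_name,l1 l2 l3' line per image."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["image_name", "label1 label2 label3"])
    for image_name, labels in rows:
        if len(labels) != 3:
            raise ValueError(f"{image_name}: expected 3 labels, got {len(labels)}")
        # The three labels share one CSV field, separated by spaces.
        writer.writerow([image_name, " ".join(str(label) for label in labels)])


buffer = io.StringIO()
write_submission([("test_0001.jpg", [0, 1, 10]),
                  ("test_0002.jpg", [1, 3, 5])], buffer)
print(buffer.getvalue())
```

To write to disk, pass a file opened with `open(path, "w", newline="")` instead of the in-memory buffer.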

Data:

There is a total of 251 food categories in the dataset. A complete list of classes is available here.

Training data:

The training data consists of 118,475 images from 251 classes. The training data is collected from web images and contains noisy labels.

Validation data:

The validation data consists of 11,994 images from 251 classes. The validation data is collected from web images and the labels are human verified. It does not contain noisy labels.

Test data:

The test data consists of 28,377 images from 251 classes. The test data is collected from web images and the labels are human verified. It does not contain noisy labels.

Data download and format:

Data can be downloaded from the links below or from Kaggle.

Annotations (3.0 MB)

  • Running md5sum annot.tar on the tar file should produce 0c632c543ceed0e70f0eb2db58eda3ab
  • The tar contains 4 files
    • class_list.txt: Contains the names of 251 class labels. This can be used to map class_ids with class names.
    • train_info.csv: Each line of this csv contains the "image_name,label" pair for one training image. For example, "train_00000.jpg,94" refers to image train_00000.jpg having class_id 94. The class_id can be mapped to a class name using class_list.txt.
    • val_info.csv: Each line of this csv contains the "image_name,label" pair for one validation image.
    • test_info.csv: This csv only provides the list of test images.
  • We provide separate tars for train, val and test images as mentioned below.
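
The "image_name,label" lines described above can be read into a lookup table with a few lines of Python. This is a minimal sketch: `parse_info_lines` is a hypothetical helper, and skipping rows whose label field is not numeric (for example a header row, if the file has one) is an assumption.

```python
from typing import Dict, Iterable


def parse_info_lines(lines: Iterable[str]) -> Dict[str, int]:
    """Map image_name -> class_id from 'image_name,label' lines."""
    labels: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        image_name, _, label = line.partition(",")
        if not label.strip().isdigit():
            # Assumption: a non-numeric label field is a header row; skip it.
            continue
        labels[image_name] = int(label)
    return labels


print(parse_info_lines(["train_00000.jpg,94"]))  # {'train_00000.jpg': 94}
```

With the real annotation file, the same function accepts an open file object, e.g. `parse_info_lines(open("train_info.csv"))`.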

Train Images (2.3 GB)

  • Running md5sum train.tar on the tar file should produce 8e56440e365ee852dcb0953a9307e27f
  • Contains training images.
  • For label information see annotation file train_info.csv.

Validation Images (231 MB)

  • Running md5sum val.tar on the tar file should produce fa9a4c1eb929835a0fe68734f4868d3b
  • Contains validation images.
  • For label information see annotation file val_info.csv.

Test Images (548 MB)

  • Running md5sum test.tar on the tar file should produce 32479146dd081d38895e46bb93fed58f
  • Contains testing images.
  • The labels will be evaluated on the evaluation server.
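
Where md5sum is not available, the checksums listed above can be verified from Python with the standard library. The sketch below assumes the archives keep the file names used in this section, with the test archive named test.tar. `md5_of_file` and `verify_archive` are hypothetical helper names.

```python
import hashlib

# Checksums copied from the download notes above.
EXPECTED_MD5 = {
    "annot.tar": "0c632c543ceed0e70f0eb2db58eda3ab",
    "train.tar": "8e56440e365ee852dcb0953a9307e27f",
    "val.tar": "fa9a4c1eb929835a0fe68734f4868d3b",
    "test.tar": "32479146dd081d38895e46bb93fed58f",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so multi-GB tars need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, name: str) -> bool:
    """Return True if the file at `path` matches the checksum listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify_archive("downloads/train.tar", "train.tar")` returns True only when the download is intact.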

Annotations:

This folder contains some important files which we'll be using while training our models.

  1. class_balance.csv : Used to analyse class imbalance in the training data.

  2. outliers.txt : Contains a list of all noisy/misclassified images in the training data.

  3. train_info_v2.csv, val_info_v2.csv : Used to make data folders so that we can load the data using PyTorch's DataLoader before training starts.
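
One common way to make such data folders is one sub-folder per class, the layout that torchvision's `ImageFolder` dataset reads. The sketch below is an illustration under assumptions, not the notebooks' actual code: it assumes the labels are available as a mapping from image name to class id, and the directory names and both helper functions are hypothetical.

```python
import os
import shutil
from typing import Dict, List, Tuple


def plan_class_folders(labels: Dict[str, int], src_dir: str,
                       dst_dir: str) -> List[Tuple[str, str]]:
    """Return (source, destination) paths placing each image under
    dst_dir/<zero-padded class_id>/."""
    plan = []
    for image_name, class_id in sorted(labels.items()):
        src = os.path.join(src_dir, image_name)
        # Zero-padding keeps lexicographic folder order equal to numeric order.
        dst = os.path.join(dst_dir, f"{class_id:03d}", image_name)
        plan.append((src, dst))
    return plan


def copy_into_class_folders(plan: List[Tuple[str, str]]) -> None:
    """Create the class folders and copy every image into place."""
    for src, dst in plan:
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)


print(plan_class_folders({"train_00000.jpg": 94}, "train_set", "data"))
```

The zero-padded folder names matter because `ImageFolder` assigns class indices from the sorted folder names. Without padding, "10" would sort before "2" and the indices would no longer match the class ids.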

Notebooks:

This folder contains all the notebook files we've used during this competition.

We trained 4 networks separately and ensembled them at the end. The typical training flow was to fine-tune a network pretrained on ImageNet for 15 epochs and then train the full network for 3-5 epochs. We used BCEWithLogitsLoss as the loss function and Adam as the optimizer, with beta values of 0.9 and 0.999 and an initial learning rate of 1e-4, which was reduced by a factor of 10 at certain steps using the MultiStepLR scheduler.
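
The loss, optimizer and scheduler settings described above can be wired together in PyTorch roughly as follows. This is a sketch under assumptions rather than the notebooks' code. The linear layer stands in for the pretrained networks, the input size and batch are placeholders, and the milestones `[8, 12]` are hypothetical because the exact steps are not stated.

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

NUM_CLASSES = 251

# Placeholder model: the real runs used pretrained CNNs, not a linear layer.
model = nn.Linear(16, NUM_CLASSES)

criterion = nn.BCEWithLogitsLoss()
optimizer = Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
# Hypothetical milestones; gamma=0.1 divides the learning rate by 10.
scheduler = MultiStepLR(optimizer, milestones=[8, 12], gamma=0.1)


def train_step(inputs: torch.Tensor, class_ids: torch.Tensor) -> float:
    """Run one optimization step and return the batch loss."""
    # BCEWithLogitsLoss needs float targets shaped like the logits,
    # so the integer class ids are one-hot encoded first.
    targets = nn.functional.one_hot(class_ids, NUM_CLASSES).float()
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()


loss_value = train_step(torch.randn(4, 16), torch.tensor([0, 94, 250, 3]))
scheduler.step()  # call once per epoch, after the optimizer steps
print(loss_value)
```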

The networks are pnasnet, senet154, polynet and densenet201. The highest scoring model was polynet, followed by senet154, pnasnet and finally densenet201.

We also tried cleaning dirty labels and then augmenting the data externally but unfortunately that didn't give promising results.

The trained model files can be found here

Results:

1. polynet : 86.26% (top-3 accuracy)
2. senet154 : 85.76% (top-3 accuracy)
3. pnasnet : 84.77% (top-3 accuracy)
4. densenet201 : 81.20% (top-3 accuracy)

After ensembling these 4 networks we got 90.54% top-3 accuracy on the test data.
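
The ensembling method itself is not described here. One simple possibility, shown purely as an assumption, is to average the per-class scores of the individual networks and take the three highest-scoring classes. `ensemble_top3` and the toy 4-class scores below are hypothetical.

```python
from typing import List, Sequence


def ensemble_top3(model_scores: Sequence[Sequence[float]]) -> List[int]:
    """Average per-class scores across models and return the ids of the
    three highest-scoring classes, best first."""
    num_models = len(model_scores)
    num_classes = len(model_scores[0])
    if any(len(scores) != num_classes for scores in model_scores):
        raise ValueError("all models must score the same number of classes")
    averaged = [sum(scores[c] for scores in model_scores) / num_models
                for c in range(num_classes)]
    ranked = sorted(range(num_classes), key=lambda c: averaged[c], reverse=True)
    return ranked[:3]


# Toy scores for one image from four models over four classes.
scores = [
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.30, 0.55, 0.10],
    [0.20, 0.50, 0.10, 0.20],
    [0.10, 0.40, 0.40, 0.10],
]
print(ensemble_top3(scores))  # [1, 2, 3]
```

Averaging lets a class that several models rank second or third overtake one that a single model ranks first, which is one plausible reason an ensemble can beat its best member on top-3 accuracy.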

Scope of improvement:

Due to time constraints we could not try the following techniques, which we believe would have improved the accuracy by at least 2-3%.

  1. Training and testing on different scales
  2. mixup
  3. label smoothing
  4. DropBlock
  5. Building a food-specific pretraining set by selecting only food-related data from Open Images and ImageNet Fall 2011 (inspired by the paper: Domain Adaptive Transfer Learning with Specialist Models)
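
Two of the techniques listed above, label smoothing and mixup, only change how training targets and inputs are formed. The sketch below illustrates them on plain Python lists. These techniques were not part of the trained models, and the function names and the values `epsilon=0.1` and `alpha=0.2` are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def smooth_labels(class_id: int, num_classes: int,
                  epsilon: float = 0.1) -> List[float]:
    """One-hot target with `epsilon` of the mass spread over all classes."""
    target = [epsilon / num_classes] * num_classes
    target[class_id] += 1.0 - epsilon
    return target


def mixup(x1: Sequence[float], y1: Sequence[float],
          x2: Sequence[float], y2: Sequence[float],
          lam: float) -> Tuple[List[float], List[float]]:
    """Convex combination of two examples and of their target vectors."""
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return mixed_x, mixed_y


# mixup draws the mixing weight from Beta(alpha, alpha); alpha is illustrative.
lam = random.betavariate(0.2, 0.2)
print(smooth_labels(1, 4))
print(mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam))
```

Both produce soft targets, so they pair naturally with a loss that accepts float targets.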

References:

  1. Competition link : kaggle
  2. Competition link : github
  3. models : pretrained-models
