Counting Everyday Objects in Everyday Scenes
Prithvijit Chattopadhyay*, Ramakrishna Vedantam*, Ramprasaath Ramaswamy Selvaraju, Dhruv Batra, Devi Parikh
CVPR 2017
This repository contains code for training the one-shot and contextual deep counting models described in the CVPR 2017 version of the paper.
We are interested in counting the number of instances of object classes in natural, everyday images. Previous counting approaches tackle the problem in restricted domains such as counting pedestrians in surveillance videos. Counts can also be estimated from outputs of other vision tasks like object detection. In this work, we build dedicated models for counting designed to tackle the large variance in counts, appearances, and scales of objects found in natural scenes. Our approach is inspired by the phenomenon of subitizing – the ability of humans to make quick assessments of counts given a perceptual signal, for small count values. Given a natural scene, we employ a divide and conquer strategy while incorporating context across the scene to adapt the subitizing idea to counting. Our approach offers consistent improvements over numerous baseline approaches for counting on the PASCAL VOC 2007 and COCO datasets. Subsequently, we study how counting can be used to improve object detection. We then show a proof of concept application of our counting methods to the task of Visual Question Answering, by studying the ‘how many?’ questions in the VQA and COCO-QA datasets.
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh
source ~/.bashrc
- Run the script train.lua with appropriate arguments to train models.
- Import forward pass functions from fprop_utils.lua and evaluate predictions by importing functions from eval.lua.
- Extract and store data from your desired CNN and dataset in data/feature/your_feat_dir
- Store features and ground truth counts on a per-image basis in data/feature/your_feat_dir/<image-name>.h5 under the keys /data and /label respectively
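The storage step above can be sketched in Python with h5py. This is only an illustration under stated assumptions: the repository does not prescribe an extraction tool, so the use of h5py and NumPy, the float32 dtype, and the helper names feature_path and write_image_file are all hypothetical choices, not part of the original code.

```python
import os


def feature_path(feat_dir, image_name):
    """Return the per-image HDF5 path, i.e. <feat_dir>/<image-name>.h5."""
    return os.path.join(feat_dir, image_name + ".h5")


def write_image_file(feat_dir, image_name, features, counts):
    """Write one image's features to /data and its counts to /label."""
    # Third-party packages, imported lazily so feature_path works without them.
    import h5py
    import numpy as np

    os.makedirs(feat_dir, exist_ok=True)
    path = feature_path(feat_dir, image_name)
    with h5py.File(path, "w") as f:
        # float32 is an assumption; use the dtype your training setup expects.
        f.create_dataset("data", data=np.asarray(features, dtype=np.float32))
        f.create_dataset("label", data=np.asarray(counts, dtype=np.float32))
    return path
```

A call such as write_image_file("data/feature/your_feat_dir", "000001", cnn_feature, class_counts) would then produce one file per image, where cnn_feature and class_counts stand for whatever your own extraction pipeline yields.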
- Based on the discretization, extract and store data from your desired CNN and dataset in data/feature/your_feat_dir
- Store features and ground truth counts on a per-image basis in data/feature/your_feat_dir/<image-name>.h5 under the keys /data and /label respectively
- Unlike glance, each <image-name>.h5 should contain features and counts for all the cells based on the discretization used (9 features per image for discretization 3)
- The image below shows how the cell features for a 3x3 discretization should be ordered:
The orderings in the above image correspond to the row-indices of the feature and count tensors (with dimensions [9 x feat_dim] and [9 x num_classes] respectively) stored under the keys /data and /label respectively.
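A quick shape check before training can catch layout mistakes in the per-cell tensors. The sketch below is a hypothetical helper, not part of the repository. It assumes a discretization d yields d x d cells, which matches the 9 cells stated for discretization 3 but is otherwise an assumption. It checks only the tensor shapes, not the cell ordering, which must still follow the image.

```python
def expected_shapes(discretization, feat_dim, num_classes):
    """Expected shapes of the /data and /label tensors for one image."""
    # Assumption: a discretization of d gives d * d cells (3 -> 9).
    num_cells = discretization * discretization
    return (num_cells, feat_dim), (num_cells, num_classes)


def check_cell_tensors(data_shape, label_shape, discretization):
    """Raise ValueError unless both tensors have one row per cell."""
    num_cells = discretization * discretization
    for name, shape in (("/data", data_shape), ("/label", label_shape)):
        if len(shape) != 2 or shape[0] != num_cells:
            raise ValueError(
                "%s has shape %r, expected %d rows" % (name, tuple(shape), num_cells)
            )
    return True
```

For example, with an illustrative feat_dim of 4096 and 20 classes, check_cell_tensors((9, 4096), (9, 20), 3) passes, while a tensor with 8 rows raises ValueError.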
If you find this code useful, consider citing our work:
@InProceedings{Chattopadhyay_2017_CVPR,
author = {Chattopadhyay, Prithvijit and Vedantam, Ramakrishna and Selvaraju, Ramprasaath R. and Batra, Dhruv and Parikh, Devi},
title = {Counting Everyday Objects in Everyday Scenes},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {July},
year = {2017}
}
BSD