QDETRv: Query-Guided DETR for One-Shot Object Localization in Videos

Overview

QDETRv: Query-Guided DETR for One-Shot Object Localization in Videos
Yogesh Kumar, Saswat Mallick, Anand Mishra, Sowmya Rasipuram, Anutosh Maitra, Roshni Ramnani
AAAI 2024

This repo contains the training code for QDETRv, an efficient approach to one-shot object localization in videos. QDETRv extends the Query-Guided DETR (QDETR) framework to maintain spatiotemporal consistency across video frames. Using query-guided attention, it localizes an object in unseen video sequences from a single reference image, with minimal supervision.

To set up the environment

# create new env qdetr
$ conda create -n qdetr python=3.10.4

# activate qdetr
$ conda activate qdetr

# install pytorch, torchvision
$ conda install -c pytorch pytorch torchvision
$ conda install cython scipy

# install other dependencies
$ pip install -r requirements.txt
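
After installation, a quick check that the key packages are importable can catch a broken environment before a long training run. The snippet below is a minimal sketch and not part of the repo; the package list is an assumption based on the install commands above (requirements.txt may pull in more), and the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Import names assumed from the conda install lines above;
# extend with whatever requirements.txt adds.
REQUIRED = ["torch", "torchvision", "scipy", "Cython"]


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it inside the activated `qdetr` environment; any name it reports should be installed before moving on to pre-training.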

Pre-training

# Download the ImageNet and UCF-101 datasets
# To create the synthetic data for pre-training:
$ python ./dataset/syn_trajectory.py path/to/video_folder path/to/output_csv_annotations path/to/save/process_videos

# To pre-train the model
# set config_pre.py

# set CUDA devices
$ export CUDA_VISIBLE_DEVICES=0,1

# Image-level pretraining
$ python train_qdetr_pre.py

# Video-level pre-training
$ python train_qdetrv_pre.py
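
`CUDA_VISIBLE_DEVICES=0,1` exposes only physical GPUs 0 and 1 to the process, and CUDA renumbers the visible devices starting from 0. The helper below is an illustrative sketch (the function name is hypothetical and not part of the repo) that shows the logical-to-physical mapping implied by the variable. It assumes comma-separated integer indices only and does not handle GPU UUIDs.

```python
import os


def visible_device_map(value=None):
    """Map logical device index -> physical GPU index for a
    CUDA_VISIBLE_DEVICES string of comma-separated integers.

    An unset variable means all GPUs are visible; this sketch
    simply returns an empty mapping in that case.
    """
    if value is None:
        value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    physical = [int(tok) for tok in value.split(",") if tok.strip()]
    return dict(enumerate(physical))


if __name__ == "__main__":
    # With "2,3", device 0 inside the process is physical GPU 2.
    print(visible_device_map("2,3"))
```

This matters when choosing which GPUs to train on: code that refers to device 0 always gets the first GPU listed in the variable, whatever its physical index.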

Training

# To download the images
$ python ./dataset/1_download_images.py

# To filter queries into main categories
$ python ./dataset/2_filter_queries.py

# To create query images and target video pairs for training and testing
$ python ./dataset/3_generate_pairs.py

# To train the model

# set config.py
# set CUDA devices
$ export CUDA_VISIBLE_DEVICES=0,1

# training image-level QDETR
$ python train_qdetr.py

# training video-level QDETR
$ python train_qdetrv.py
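
The training steps above are listed in a fixed order: dataset preparation, then image-level training, then video-level training. As a convenience, they could be chained with a small driver like the sketch below. The driver is hypothetical and not part of the repo; only the script paths come from the commands above, and `config.py` and `CUDA_VISIBLE_DEVICES` still need to be set beforehand.

```python
import subprocess
import sys

# Script paths copied from the commands above, in the order they are listed.
TRAINING_STEPS = [
    "./dataset/1_download_images.py",
    "./dataset/2_filter_queries.py",
    "./dataset/3_generate_pairs.py",
    "train_qdetr.py",
    "train_qdetrv.py",
]


def build_commands(steps=TRAINING_STEPS, python=sys.executable):
    """Return one argv list per step."""
    return [[python, step] for step in steps]


def run_pipeline(dry_run=True):
    """Run each step in order, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    for cmd in build_commands():
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Keeping the dry run as the default makes it easy to review the exact commands before starting a long job; pass `dry_run=False` to execute them.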

Evaluation

# set the paths in config
$ python eval.py
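
The README does not say which metric `eval.py` reports. Localization quality is commonly scored by the intersection over union (IoU) between a predicted box and a ground-truth box, so as a generic reference (not the repo's implementation), IoU for boxes in an assumed `(x1, y1, x2, y2)` format can be computed as follows:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, `box_iou((0, 0, 2, 2), (1, 1, 3, 3))` has an overlap of 1 and a union of 7, giving an IoU of 1/7.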

Citation

If you find this repo useful, please cite:

@inproceedings{kumar2024qdetrv,
  title={QDETRv: Query-Guided DETR for One-Shot Object Localization in Videos},
  author={Kumar, Yogesh and Mallick, Saswat and Mishra, Anand and Rasipuram, Sowmya and Maitra, Anutosh and Ramnani, Roshni},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={3},
  pages={2831--2839},
  year={2024},
}

