PyTorch implementation of Interactive Fusion of Multi-level Features for Compositional Activity Recognition.
Our approach has only been tested on Ubuntu with a GPU, and it needs at least 16 GB of GPU memory. The necessary packages can be installed with the following commands:
conda create -n Interactive_Fusion python=3.6
conda activate Interactive_Fusion
pip install pyyaml matplotlib tensorboardx opencv-python
pip install torch torchvision
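After installation, a quick sanity check of the environment can save a failed run later. The following is a minimal sketch, not part of this repository: the helper names `missing_packages` and `report_gpu` are hypothetical, and the mapping from pip names to import names is an assumption based on the packages listed above.

```python
import importlib.util

# Packages installed by the commands above: pip name -> import name (assumed).
REQUIRED = {
    "pyyaml": "yaml",
    "matplotlib": "matplotlib",
    "tensorboardx": "tensorboardX",
    "opencv-python": "cv2",
    "torch": "torch",
    "torchvision": "torchvision",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def report_gpu():
    """Print the name and total memory of each CUDA device PyTorch can see."""
    import torch  # imported lazily so the package check works without it

    if not torch.cuda.is_available():
        print("No CUDA device visible to PyTorch.")
        return
    for idx in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(idx)
        print(f"GPU {idx}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        report_gpu()
```

The script only reports the memory of each device; compare the printed value against the 16 GB requirement yourself, since the total reported by the driver is usually slightly below the nominal card size.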
- Download Something-Something Dataset and Something-Else Annotations.
- Extract (or softlink) the videos under dataset/sth_else/videos, and then dump the frames into dataset/sth_else/frames with the following command:
bash tools/dump_frames_sth.sh
- Download Charades Dataset (scaled to 480p) and Action Genome Annotations.
- Extract (or softlink) the videos under dataset/charades/videos, put the annotations into dataset/charades/annotations, and then dump the frames into dataset/charades/frames with the following command:
bash tools/dump_frames_char.sh
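Once the frames are dumped, it can be useful to confirm that every video actually produced images. The sketch below is an illustration under one assumption that this README does not state: that each video gets its own sub-directory of image files under the frames directory. The functions `count_frames` and `empty_videos` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path

# Image extensions counted as frames (an assumption about the dump format).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_frames(frames_root):
    """Map each video sub-directory name to the number of image files in it."""
    counts = {}
    for video_dir in sorted(Path(frames_root).iterdir()):
        if video_dir.is_dir():
            counts[video_dir.name] = sum(
                1 for f in video_dir.iterdir()
                if f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def empty_videos(counts):
    """Return the names of videos for which no frame was found."""
    return sorted(name for name, n in counts.items() if n == 0)


if __name__ == "__main__":
    for root in ("dataset/sth_else/frames", "dataset/charades/frames"):
        if Path(root).is_dir():
            counts = count_frames(root)
            print(f"{root}: {len(counts)} videos, "
                  f"{len(empty_videos(counts))} without frames")
```

If the dump scripts use a different layout, adjust `count_frames` accordingly.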
Collect per-video information (height, width, and number of frames) with the commands below. Alternatively, you can download video_info.json from here.
bash tools/get_video_info.sh 'sth_else'
bash tools/get_video_info.sh 'charades'
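For readers curious what such a step involves, the following sketch gathers the same three properties with OpenCV. It is not the repository's script: the function names, the output keys (`height`, `width`, `num_frames`), and the output file name are all hypothetical, and the real layout of video_info.json may differ.

```python
import json
from pathlib import Path


def probe_video(path):
    """Read height, width and frame count of one video with OpenCV."""
    import cv2  # provided by opencv-python; imported lazily

    cap = cv2.VideoCapture(str(path))
    try:
        if not cap.isOpened():
            raise IOError(f"cannot open {path}")
        return {
            "height": int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
            "width": int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            "num_frames": int(cap.get(cv2.CAP_PROP_FRAME_COUNT)),
        }
    finally:
        cap.release()


def collect_video_info(video_paths, probe=probe_video):
    """Build {video id: info}, using the file stem as id (hypothetical schema)."""
    return {Path(p).stem: probe(p) for p in video_paths}


if __name__ == "__main__":
    videos_dir = Path("dataset/sth_else/videos")
    if videos_dir.is_dir():
        videos = sorted(p for p in videos_dir.iterdir() if p.is_file())
        info = collect_video_info(videos)
        # Written to a separate, hypothetical file so nothing is overwritten.
        with open("video_info_sketch.json", "w") as f:
            json.dump(info, f, indent=2)
```

Note that `CAP_PROP_FRAME_COUNT` comes from container metadata and may not match the number of frames actually dumped, so treat it as an estimate.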
Some necessary files are available on Baidu Cloud, such as the annotations and video_info.json (which go into dataset) and the data-split settings (which go into dataset_splits). Download them and put them at the same paths.
# Compositional setting for Something-Else
python main.py --cfg STHELSE/COM/GT/OURS
# Fewshot setting for Something-Else
python main.py --cfg STHELSE/FEWSHOT/GT/OURS/base
python main.py --cfg STHELSE/FEWSHOT/GT/OURS/5shot
python main.py --cfg STHELSE/FEWSHOT/GT/OURS/10shot
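To run the three few-shot configurations one after another without retyping them, a small launcher can help. This is a hedged sketch: `build_commands` and `run_all` are hypothetical helpers, and the only thing taken from this README is the `python main.py --cfg <name>` invocation and the config names. Whether the 5shot and 10shot runs depend on the base run is not stated here, so the sketch simply keeps the order listed above.

```python
import subprocess
import sys

# Config names copied from the commands above.
FEWSHOT_CFGS = [
    "STHELSE/FEWSHOT/GT/OURS/base",
    "STHELSE/FEWSHOT/GT/OURS/5shot",
    "STHELSE/FEWSHOT/GT/OURS/10shot",
]


def build_commands(cfgs, python=sys.executable):
    """Return one argument list per config, in the given order."""
    return [[python, "main.py", "--cfg", cfg] for cfg in cfgs]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default; switch to dry_run=False from the repository root.
    run_all(build_commands(FEWSHOT_CFGS), dry_run=True)
```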
If you wish to refer to the results of this work, please use the following BibTeX entry.
<!-- @article{yan2020interactive,
title={Interactive Fusion of Multi-level Features for Compositional Activity Recognition},
author={Yan, Rui and Xie, Lingxi and Shu, Xiangbo and Tang, Jinhui},
journal={arXiv preprint arXiv:2012.05689},
year={2020}
} -->
@article{yan2023progressive,
title={Progressive instance-aware feature learning for compositional action recognition},
author={Yan, Rui and Xie, Lingxi and Shu, Xiangbo and Zhang, Liyan and Tang, Jinhui},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume={45},
number={8},
pages={10317--10330},
year={2023},
publisher={IEEE}
}
Our code is built on the PyTorch implementation of STIN proposed by joaanna.
If you find any bugs, feel free to create a pull request or contact me by email: ["ruiyan", at, "njust", dot, "edu", dot, "cn"].