By Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li.
This repo is the official PyTorch implementation of Decoupled Spatial-Temporal Transformer for Video Inpainting.
- Python >= 3.6
- PyTorch >= 1.0 and the corresponding torchvision (https://pytorch.org/)
- Clone this repo:
git clone https://github.com/ruiliu-ai/DSTT.git
- Install other packages:
cd DSTT
pip install -r requirements.txt
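Before running anything, it can help to confirm the environment meets the minimums listed above. The helper below is not part of the repo, just a hypothetical sanity check; note that PyTorch version strings may carry a local suffix (e.g. `1.9.0+cu111`) that has to be stripped before comparing.

```python
import sys

def meets_minimum(version: str, minimum=(1, 0)):
    """Compare a version string like '1.9.0+cu111' against a (major, minor) minimum.
    Hypothetical helper, not part of the DSTT codebase."""
    parts = version.split("+")[0].split(".")
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) >= minimum

# Check the interpreter first (Python >= 3.6 required).
assert sys.version_info >= (3, 6), "Python >= 3.6 required"

# Then, with PyTorch installed, something like:
#   import torch
#   assert meets_minimum(torch.__version__), "PyTorch >= 1.0 required"
print(meets_minimum("1.9.0+cu111"))  # True
```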
Download the datasets (YouTube-VOS and DAVIS) into the data folder:
mkdir data
Then start training with the provided config:
python train.py -c configs/youtube-vos.json
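The `-c` flag passes a JSON config file to the training script. A hedged sketch of the usual argparse + JSON pattern follows; the actual keys inside configs/youtube-vos.json may differ, and `load_config` is a hypothetical name, not the repo's API.

```python
import argparse
import json

def load_config(argv=None):
    """Parse a -c/--config path and return the JSON config as a dict.
    Illustrative sketch only; the real train.py may differ."""
    parser = argparse.ArgumentParser(description="DSTT training (sketch)")
    parser.add_argument("-c", "--config", required=True,
                        help="path to a JSON config, e.g. configs/youtube-vos.json")
    args = parser.parse_args(argv)
    with open(args.config) as f:
        return json.load(f)
```

Called as `load_config(["-c", "configs/youtube-vos.json"])` it returns the parsed config dict, which the training loop would then read hyperparameters from.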
Download the pre-trained model into the checkpoints folder:
mkdir checkpoints
Then run inference on a video, passing the frame directory (-v) and the corresponding mask directory (-m):
python test.py -c checkpoints/dstt.pth -v data/DAVIS/JPEGImages/blackswan -m data/DAVIS/Annotations/blackswan
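The `-v` and `-m` arguments point to a directory of video frames and a directory of per-frame masks (DAVIS-style, one mask per frame). A minimal sketch of how such frame/mask pairs can be enumerated; `list_frame_mask_pairs` is a hypothetical helper for illustration, not a function from the repo.

```python
import os

def list_frame_mask_pairs(video_dir, mask_dir):
    """Pair each frame with its mask by sorted filename order.
    Assumes exactly one mask file per frame, as in DAVIS Annotations folders.
    Hypothetical helper, not part of the DSTT codebase."""
    frames = sorted(os.listdir(video_dir))
    masks = sorted(os.listdir(mask_dir))
    assert len(frames) == len(masks), "expected one mask per frame"
    return [(os.path.join(video_dir, f), os.path.join(mask_dir, m))
            for f, m in zip(frames, masks)]
```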
If you find DSTT useful in your research, please consider citing:
@article{Liu_2021_DSTT,
title={Decoupled Spatial-Temporal Transformer for Video Inpainting},
author={Liu, Rui and Deng, Hanming and Huang, Yangyi and Shi, Xiaoyu and Lu, Lewei and Sun, Wenxiu and Wang, Xiaogang and Dai, Jifeng and Li, Hongsheng},
journal={arXiv preprint arXiv:2104.06637},
year={2021}
}
This code borrows heavily from the video inpainting framework of STTN (spatial-temporal transformer network).