PyTorch implementation of our paper "Contrastive Learning of Video Representations with Temporally Adversarial Examples", a journal extension of our preliminary work presented at CVPR 2021. This version presents extensive additional analysis.
The PyTorch implementation of our earlier CVPR 2021 work is available at: https://github.com/tinapan-pt/VideoMoCo
Framework of the proposed approach.
We introduce generative adversarial learning to improve the temporal robustness of the encoder. First, given an input video sample, a generator temporally drops out several of its frames. The discriminator, i.e. the encoder, is then trained to produce similar feature representations regardless of which frames are removed. By adaptively dropping out different frames across training iterations of adversarial learning, we augment each input sample and train a temporally robust encoder. Second, we propose a temporally adversarial decay that models the attenuation of keys in the memory queue when computing the contrastive loss.
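The two components above can be illustrated with a toy, dependency-free sketch. It is not the repository's code: `drop_frames`, `decayed_contrastive_loss` and the decay constant `t` are hypothetical names, the frame importance scores are supplied by hand instead of being predicted by a learned generator, and the decay is assumed to be geometric (the i-th newest key in the queue weighted by t**i), which may differ from the exact form used in the paper. In practice `q` and the keys would be L2-normalized encoder outputs.

```python
import math


def drop_frames(clip, importance, num_drop):
    """Remove the `num_drop` frames with the highest importance scores.

    Hypothetical stand-in for the learned generator: here `importance` is
    given by the caller, whereas the generator would predict it so as to make
    the encoder's task as hard as possible. Surviving frames keep their order.
    """
    ranked = sorted(range(len(clip)), key=lambda i: importance[i], reverse=True)
    dropped = set(ranked[:num_drop])
    return [frame for i, frame in enumerate(clip) if i not in dropped]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def decayed_contrastive_loss(q, k_pos, queue, tau=0.07, t=0.99):
    """InfoNCE loss with an assumed geometric decay on queued negative keys.

    `queue` is ordered newest first; the i-th key (i = 1, 2, ...) is weighted
    by t**i, so older keys contribute less to the denominator. With t = 1 this
    reduces to the standard MoCo InfoNCE loss.
    """
    pos = math.exp(dot(q, k_pos) / tau)
    neg = sum(t ** i * math.exp(dot(q, k) / tau) for i, k in enumerate(queue, start=1))
    return -math.log(pos / (pos + neg))


# Toy usage: an 8-frame "clip" of frame indices and 2-D unit-length features.
scores = [0.1, 0.7, 0.2, 0.9, 0.3, 0.05, 0.6, 0.4]
print(drop_frames(list(range(8)), scores, num_drop=2))  # [0, 2, 4, 5, 6, 7]

q, k_pos = [1.0, 0.0], [0.8, 0.6]
queue = [[0.6, 0.8], [0.0, 1.0], [-1.0, 0.0]]
print(decayed_contrastive_loss(q, k_pos, queue))
```

In the adversarial setting, the generator would choose which frames to drop so as to increase this loss, while the encoder is updated to decrease it.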
- pytorch >= 1.3.0
- tensorboard
- cv2
- kornia
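A quick environment check for the requirements listed above might look like the following sketch (Python 3.8+). The helper names are hypothetical; it reads the installed `torch` version through `importlib.metadata` and only checks that `tensorboard`, `cv2` (OpenCV's import name) and `kornia` are importable, without enforcing versions the list does not specify.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v):
    """Turn '1.10.2+cu113' into (1, 10, 2), padding to three components."""
    parts = []
    for piece in v.split("+")[0].split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or 'a0'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    try:
        if version_tuple(version("torch")) < (1, 3, 0):
            problems.append("torch is older than 1.3.0")
    except PackageNotFoundError:
        problems.append("torch is not installed")
    # Import names, not pip package names: OpenCV is imported as cv2.
    for module in ("tensorboard", "cv2", "kornia"):
        if importlib.util.find_spec(module) is None:
            problems.append(module + " is not installed")
    return problems


if __name__ == "__main__":
    print(check_requirements() or "all requirements found")
```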
K400 dataset
- Download the K400 dataset from the official website.
python train.py \
--log_dir ./logs_moco \
--ckp_dir ./checkpoints_moco \
-a r2plus1d_18 \
--lr 0.04 \
-fpc 32 \
-b 256 \
-j 128 \
--epochs 200 \
--schedule 120 160 \
--dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
DATA_DIR/kinetics-400
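The following sketch shows how some of these flags would typically be interpreted, assuming `train.py` follows the conventions of the original MoCo training script (not confirmed here): `--schedule 120 160` multiplies the learning rate by 0.1 at each listed epoch, and with `--multiprocessing-distributed` the values passed to `-b` and `-j` are totals that get split across the GPUs of a node. The helper names and the 8-GPU node are hypothetical.

```python
def step_lr(base_lr, epoch, schedule):
    """Multiply base_lr by 0.1 for every milestone epoch already reached."""
    lr = base_lr
    for milestone in schedule:
        if epoch >= milestone:
            lr *= 0.1
    return lr


def per_gpu_settings(batch_size, workers, ngpus_per_node):
    """Split the global -b / -j values across the GPUs of one node."""
    per_gpu_batch = batch_size // ngpus_per_node
    per_gpu_workers = (workers + ngpus_per_node - 1) // ngpus_per_node  # round up
    return per_gpu_batch, per_gpu_workers


# Values from the command above: --lr 0.04 --schedule 120 160 -b 256 -j 128
for epoch in (0, 119, 120, 159, 160, 199):
    print(epoch, step_lr(0.04, epoch, [120, 160]))
print(per_gpu_settings(256, 128, ngpus_per_node=8))  # (32, 16)
```

Under these assumptions, training runs at 0.04 for epochs 0 to 119, 0.004 for epochs 120 to 159, and 0.0004 until epoch 200.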
Downstream task evaluation
- Action Recognition
- Video Retrieval
- Feature Separation
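For video retrieval, a commonly used protocol in self-supervised video work (assumed here, not taken from this repository) uses features of test-split videos as queries and features of training-split videos as the gallery, and reports R@k: the fraction of queries whose k nearest gallery items by cosine similarity include at least one video of the same class. A minimal sketch over plain Python lists, with hypothetical function names:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, ks=(1, 5, 10, 20, 50)):
    """Fraction of queries whose top-k gallery neighbours contain a same-class video."""
    hits = {k: 0 for k in ks}
    for feat, label in zip(query_feats, query_labels):
        # Rank gallery items by decreasing cosine similarity to this query.
        ranked = sorted(range(len(gallery_feats)),
                        key=lambda j: cosine(feat, gallery_feats[j]), reverse=True)
        ranked_labels = [gallery_labels[j] for j in ranked]
        for k in ks:
            if label in ranked_labels[:k]:
                hits[k] += 1
    return {k: hits[k] / len(query_feats) for k in ks}


# Toy usage with 2-D "features"; real ones would come from the pretrained encoder.
gallery = [[0.9, 0.1], [0.1, 0.9], [1.0, 1.0]]
print(recall_at_k([[1.0, 0.0], [0.0, 1.0]], [0, 1], gallery, [0, 1, 2], ks=(1, 2)))  # {1: 1.0, 2: 1.0}
```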