ViC-MAE

Official PyTorch/GPU codebase for ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders.

This repo is based on:

Requirements

Create a conda environment and install the requirements:

conda create -y -n vicmae python=3.9 cupy pkg-config compilers libjpeg-turbo libwebp opencv=4.7.0 numba ffmpeg av tmux cudatoolkit=11.8 -c conda-forge
conda activate vicmae
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install wandb ffmpeg-python git+https://github.com/rwightman/pytorch-image-models "glances[all]"
pip install ffcv
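
After installation, a quick sanity check can confirm that the environment resolved correctly. The script below is a minimal sketch and not part of this repository: the helper names `missing_modules` and `report` are hypothetical, and the module list assumes the usual import names for the packages installed above (`timm` for pytorch-image-models, `cv2` for opencv, `ffmpeg` for ffmpeg-python).

```python
import importlib.util

# Import names (not pip/conda package names) for the dependencies listed above.
# Assumption: pytorch-image-models imports as `timm`, opencv as `cv2`,
# and ffmpeg-python as `ffmpeg`.
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "timm", "ffcv",
    "wandb", "ffmpeg", "cv2", "av", "numba", "cupy",
]


def missing_modules(names):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report():
    """Print missing modules and, if PyTorch is present, whether CUDA is visible."""
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
    if "torch" not in missing:
        import torch  # imported lazily so the check still runs without PyTorch
        print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())


if __name__ == "__main__":
    report()
```

Run it inside the activated `vicmae` environment. If CUDA is reported as unavailable, the likely causes are a driver mismatch or a CPU-only PyTorch wheel.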

Checkpoints

The following table provides the strongest pre-trained checkpoints used in the paper.

| Model | Dataset | Epochs | Batch Size | Download |
| --- | --- | --- | --- | --- |
| ViC-MAE-B/16 | IN1K + K400 | 800 | 4096 | Link |
| ViC-MAE-B/16 | IN1K + K400 + K600 + K700 + MiT | 800 | 4096 | Link |
| ViC-MAE-L/16 | IN1K + K400 | 800 | 4096 | Link |
| ViC-MAE-L/16 | IN1K + K400 + K600 + K700 + MiT | 800 | 4096 | Link |
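
The layout of the downloaded checkpoint files is not documented here, so the following is only a sketch of how the weights might be unwrapped before loading. It assumes, as in many MAE-style codebases, that the weights are stored either directly or under a `"model"` or `"state_dict"` key, and that keys may carry a `module.` prefix from distributed training. The helper names `extract_state_dict` and `strip_prefix` and the file name in the usage comment are hypothetical.

```python
def extract_state_dict(checkpoint, candidate_keys=("model", "state_dict")):
    """Return the weight mapping from a loaded checkpoint object.

    Assumption: the checkpoint is either the state dict itself or a dict that
    wraps it under one of `candidate_keys`.
    """
    if isinstance(checkpoint, dict):
        for key in candidate_keys:
            if isinstance(checkpoint.get(key), dict):
                return checkpoint[key]
    return checkpoint


def strip_prefix(state_dict, prefix="module."):
    """Remove a leading prefix (e.g. from DistributedDataParallel) from every key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Usage sketch (requires PyTorch and a model instance built elsewhere):
#   import torch
#   checkpoint = torch.load("vicmae_b16.pth", map_location="cpu")  # hypothetical file name
#   weights = strip_prefix(extract_state_dict(checkpoint))
#   result = model.load_state_dict(weights, strict=False)
#   print(result.missing_keys, result.unexpected_keys)
```

Loading with `strict=False` and printing the missing and unexpected keys makes any mismatch between the checkpoint and the model definition visible instead of failing silently. See FINETUNE.md for the supported workflow.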

Training

See PRETRAIN.md for pre-training instructions.

Fine-tuning

See FINETUNE.md for fine-tuning instructions.

Citation

@article{hernandez2023visual,
  title={Visual Representation Learning from Unlabeled Video using Contrastive Masked Autoencoders},
  author={Hernandez, Jefferson and Villegas, Ruben and Ordonez, Vicente},
  journal={arXiv preprint arXiv:2303.12001},
  year={2023}
}