Official PyTorch/GPU codebase for ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders.
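At its core, ViC-MAE trains a ViT by combining masked-autoencoder reconstruction with a contrastive objective over pooled representations of two views (e.g., two frames of a video, or two augmentations of an image). The snippet below is a minimal, illustrative PyTorch sketch of such a combined loss, not the repo's actual implementation; the function names and the weighting `lam` are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE between two batches of pooled features (positives on the diagonal)."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature          # (B, B) cosine-similarity logits
    targets = torch.arange(z1.size(0), device=z1.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def combined_loss(pred, target, mask, z1, z2, lam=0.5):
    # pred/target: (B, N, patch_dim) decoder outputs and ground-truth patch pixels
    # mask:        (B, N), 1 for masked patches; z1/z2: pooled features per view
    recon = ((pred - target) ** 2).mean(dim=-1)   # per-patch MSE
    recon = (recon * mask).sum() / mask.sum()     # average over masked patches only
    return recon + lam * info_nce(z1, z2)         # lam is an assumed loss weighting
```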
This repo is based on the [MAE](https://github.com/facebookresearch/mae) codebase.
Create a conda environment and install the requirements:

```bash
conda create -y -n vicmae python=3.9 cupy pkg-config compilers libjpeg-turbo libwebp opencv=4.7.0 numba ffmpeg av tmux cudatoolkit=11.8 -c conda-forge
conda activate vicmae
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install wandb ffmpeg-python git+https://github.com/rwightman/pytorch-image-models glances[all]
pip install ffcv
```
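Optionally, a quick sanity check that the CUDA 11.8 wheels are active and FFCV built correctly (this check is an illustrative suggestion, not part of the official setup):

```python
# Verify the GPU stack and the FFCV install.
import torch, torchvision
import ffcv  # noqa: F401 -- raises ImportError here if FFCV did not build

print("torch:", torch.__version__, "| torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```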
The following table provides download links for the strongest pre-trained checkpoints used in the paper.
| Model | Dataset | Epochs | Batch Size | Download |
|---|---|---|---|---|
| ViC-MAE-B/16 | IN1K + K400 | 800 | 4096 | Link |
| ViC-MAE-B/16 | IN1K + K400 + K600 + K700 + MiT | 800 | 4096 | Link |
| ViC-MAE-L/16 | IN1K + K400 | 800 | 4096 | Link |
| ViC-MAE-L/16 | IN1K + K400 + K600 + K700 + MiT | 800 | 4096 | Link |
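To fine-tune from one of these checkpoints, loading along the following lines should work; the `"model"` key and the filename are assumptions based on the common MAE-style checkpoint layout, so adjust them to the actual file contents:

```python
import torch
import timm

# Build a ViT-B/16 backbone and load ViC-MAE encoder weights into it.
model = timm.create_model("vit_base_patch16_224", num_classes=1000)
ckpt = torch.load("vicmae_b16_in1k_k400.pth", map_location="cpu")  # hypothetical filename
state_dict = ckpt.get("model", ckpt)  # MAE-style checkpoints nest weights under "model"
msg = model.load_state_dict(state_dict, strict=False)
print(msg)  # inspect missing/unexpected keys (e.g., the freshly initialized head)
```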
See PRETRAIN.md for pre-training instructions.
See FINETUNE.md for fine-tuning instructions.
If you find this work useful, please cite:

```bibtex
@article{hernandez2023visual,
  title={Visual Representation Learning from Unlabeled Video using Contrastive Masked Autoencoders},
  author={Hernandez, Jefferson and Villegas, Ruben and Ordonez, Vicente},
  journal={arXiv preprint arXiv:2303.12001},
  year={2023}
}
```