Robust Cross-Modal Knowledge Distillation for Unconstrained Videos

PyTorch implementation of Robust Cross-Modal Knowledge Distillation for Unconstrained Videos

Introduction

Cross-modal distillation is widely used to transfer knowledge across modalities, enriching the representation of the target unimodal one. Recent studies closely tie the temporal synchronization between vision and sound to the semantic consistency required for cross-modal distillation. However, the semantic consistency implied by synchronization is hard to guarantee in unconstrained videos, because of irrelevant modality noise and differentiated semantic correlation.

To mitigate these issues, we first propose a Modality Noise Filter (MNF) module that erases the irrelevant noise in the teacher modality using cross-modal context. After this purification, we design a Contrastive Semantic Calibration (CSC) module that adaptively distills useful knowledge for the target modality by referring to the differentiated sample-wise semantic correlation in a contrastive fashion.
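
As a rough illustration of what "in a contrastive fashion" can mean for distillation, the sketch below computes a generic InfoNCE-style loss between student and teacher features. It is not the CSC module from this repository: the function names `cosine` and `info_nce`, the `temperature` value, and the use of plain Python lists instead of PyTorch tensors are all assumptions made for readability.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(student_feats, teacher_feats, temperature=0.1):
    """Generic InfoNCE loss (hypothetical helper, not the paper's CSC).

    The i-th student feature is pulled toward the i-th teacher feature
    (its positive) and pushed away from the other teacher features in
    the batch (its negatives).
    """
    losses = []
    for i, s in enumerate(student_feats):
        logits = [cosine(s, t) / temperature for t in teacher_feats]
        # Numerically stable log-sum-exp over all teacher candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching teacher sample as the target.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch: two samples with 2-D features.
student = [[1.0, 0.0], [0.0, 1.0]]
teacher_aligned = [[1.0, 0.0], [0.0, 1.0]]
teacher_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce(student, teacher_aligned)
loss_swapped = info_nce(student, teacher_swapped)
```

With these toy inputs the loss is close to zero when each student feature matches its own teacher feature and large when the pairs are swapped, which is the behavior a contrastive objective is meant to reward.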

Extensive experiments show that our method brings a performance boost over other distillation methods on both the visual action recognition and video retrieval tasks. We also extend it to the audio tagging task to demonstrate the generalization of our method.

Training & Validation

Use the following commands to train and test on the UCF51 dataset. The checkpoints of our model are in the results directory.

Train on UCF51

    sh scripts/ucf_train_script.sh

Validate on UCF51

    sh scripts/ucf_test_script.sh

Get retrieval results on UCF51

    sh retrieval/ucf_retrieval.sh
    python retrieval/mAP_result_ucf.py
    python retrieval/get_retrieval_result_ucf.py
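
For readers unfamiliar with the metric named in `mAP_result_ucf.py`, the following is a minimal sketch of how mean average precision (mAP) is commonly computed for retrieval. It is a generic illustration, not the code in that script: the helper names `average_precision` and `mean_average_precision` and the input layout (a query label plus the gallery labels sorted by descending similarity) are assumptions.

```python
def average_precision(ranked_labels, query_label):
    """Average precision for one query.

    ranked_labels: class labels of the gallery items, already sorted
    from most to least similar to the query (assumed input layout).
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of the per-query average precision.

    queries: list of (query_label, ranked_labels) pairs.
    """
    if not queries:
        return 0.0
    total = sum(average_precision(ranked, label) for label, ranked in queries)
    return total / len(queries)


# Two toy queries over a tiny gallery.
ap_first = average_precision(["a", "b", "a"], "a")   # (1/1 + 2/3) / 2
ap_second = average_precision(["b", "a"], "a")       # (1/2) / 1
map_score = mean_average_precision([
    ("a", ["a", "b", "a"]),
    ("a", ["b", "a"]),
])
```

A query whose relevant items all appear at the top of the ranking gets an average precision of 1.0, and the score drops as relevant items are pushed further down the list.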

Checkpoints

The dataset and checkpoints can be downloaded from here

BibTeX

@article{xia2023robust,
  title={Robust Cross-Modal Knowledge Distillation for Unconstrained Videos},
  author={Xia, Wenke and Li, Xingjian and Deng, Andong and Xiong, Haoyi and Dou, Dejing and Hu, Di},
  journal={arXiv preprint arXiv:2304.07775},
  year={2023}
}
