TSPM

Official repository for the ACM MM 2024 paper "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues".

arXiv: https://arxiv.org/abs/2407.20693

Requirements

Python 3.6+
PyTorch 1.6.0
tensorboardX
ffmpeg
numpy
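Before running anything, the requirements above can be checked with a short script. This is a hypothetical helper, not part of the TSPM repository. It only confirms that the Python version is at least 3.6, that torch (PyTorch), tensorboardX and numpy are importable, and that an ffmpeg executable is on PATH. It does not verify the exact PyTorch version.

```python
"""Hypothetical helper (not part of the TSPM repository): check the
requirements listed in the README before running the pipeline."""
import importlib.util
import shutil
import sys

# Import names of the Python packages listed under Requirements.
# PyTorch is imported as "torch".
REQUIRED_MODULES = ("torch", "tensorboardX", "numpy")
REQUIRED_BINARIES = ("ffmpeg",)
MIN_PYTHON = (3, 6)


def check_requirements(modules=REQUIRED_MODULES, binaries=REQUIRED_BINARIES):
    """Return a dict describing which requirements are missing."""
    return {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        # find_spec returns None when a top-level module is not installed.
        "missing_modules": [m for m in modules
                            if importlib.util.find_spec(m) is None],
        # shutil.which returns None when the executable is not on PATH.
        "missing_binaries": [b for b in binaries if shutil.which(b) is None],
    }


if __name__ == "__main__":
    report = check_requirements()
    print("Python >= 3.6:", report["python_ok"])
    print("Missing Python packages:", report["missing_modules"] or "none")
    print("Missing executables:", report["missing_binaries"] or "none")
```

Saved under a name such as check_env.py (hypothetical), it is run with python check_env.py; "none" on both lines means everything was found.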

Usage

Clone this repo

git clone https://github.com/GeWu-Lab/TSPM.git
cd TSPM

Download data

MUSIC-AVQA: https://gewu-lab.github.io/MUSIC-AVQA/

AVQA: http://mn.cs.tsinghua.edu.cn/avqa/

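Feature extraction

Extract CLIP features for questions, QA prompts, token-level features, and frames (the script names indicate the ViT-L/14@336px backbone):
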
cd feat_script/extract_clip_feat
python extract_qst_ViT-L14@336px.py
python extract_qaPrompt_ViT-L14@336px.py
python extract_token-level_feat.py
python extract_frames_ViT-L14@336px.py
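To run the four extraction scripts in one go, a small wrapper like the one below can help. It is a hypothetical convenience script, not shipped with the repository. It assumes it is placed in and launched from feat_script/extract_clip_feat, runs the scripts in the order listed above, and stops at the first one that exits with an error. By default it only prints the commands; pass --run to execute them.

```python
"""Hypothetical wrapper (not part of the TSPM repository): run the CLIP
feature-extraction scripts in order. Launch from feat_script/extract_clip_feat."""
import subprocess
import sys

# Script names copied from the README, in the order it lists them.
SCRIPTS = (
    "extract_qst_ViT-L14@336px.py",
    "extract_qaPrompt_ViT-L14@336px.py",
    "extract_token-level_feat.py",
    "extract_frames_ViT-L14@336px.py",
)


def build_commands(python=sys.executable, scripts=SCRIPTS):
    """Return one argv list per script, using the given interpreter."""
    return [[python, script] for script in scripts]


def run_all(dry_run=True):
    """Print each command; execute it too unless dry_run is True."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later scripts never run after a failed step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Safe default: only print the commands unless --run is given.
    run_all(dry_run="--run" not in sys.argv)
```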

Training

python -u main_train.py --Temp_Selection --top_k 10 \
    --Spatio_Perception \
    --batch-size 64 --epochs 30 --lr 1e-4 \
    --num_workers 12 --gpu 0,1 \
    --checkpoint TSPM \
    --model_save_dir models
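The README does not document what --Temp_Selection and --top_k do. Judging only from the flag names and the paper title, one plausible reading is that the model keeps the top_k temporal segments most relevant to the question. The sketch below illustrates that idea under explicit assumptions: one feature vector per temporal segment, a single question feature of the same size, and cosine similarity as the relevance score. It is not the TSPM implementation; the function name select_top_k_segments and the example sizes are hypothetical.

```python
"""Illustrative sketch only: one plausible reading of --Temp_Selection /
--top_k. NOT the TSPM implementation; all names and sizes are hypothetical."""
import numpy as np


def select_top_k_segments(segment_feats, query_feat, top_k=10):
    """Return indices (in temporal order) of the top_k segments whose
    features have the highest cosine similarity to query_feat.

    segment_feats: array of shape (T, D); query_feat: array of shape (D,).
    """
    seg = segment_feats / np.linalg.norm(segment_feats, axis=1, keepdims=True)
    q = query_feat / np.linalg.norm(query_feat)
    scores = seg @ q                    # (T,) cosine similarities
    k = min(top_k, len(scores))         # cannot select more than T segments
    top = np.argsort(-scores)[:k]       # indices of the highest scores
    return np.sort(top)                 # restore temporal order


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 60 segments with 768-dimensional features.
    feats = rng.standard_normal((60, 768))
    query = feats[5] + 0.1 * rng.standard_normal(768)
    print(select_top_k_segments(feats, query, top_k=10))
```

Returning the indices sorted by time rather than by score keeps the selected segments in their original sequence, which matters if later stages treat them as a temporal series.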

Testing

python -u main_test.py --Temp_Selection --top_k 10 \
    --Spatio_Perception \
    --batch-size 1 --gpu 1 \
    --checkpoint TSPM \
    --model_save_dir models \
    --result_dir results

Citation

If you find this work useful, please consider citing it. The BibTeX entry is coming soon.

Acknowledgement

This research was supported by Public Computing Cloud, Renmin University of China.
