Skip to content
/ TSPM Public

Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.

Notifications You must be signed in to change notification settings

GeWu-Lab/TSPM

Repository files navigation

TSPM

Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.

ArXiv: https://arxiv.org/abs/2407.20693

Authors: Guangyao Li, Henghui Du, Di Hu

Requirements

python3.6 +
pytorch1.6.0
tensorboardX
ffmpeg
numpy

Usage

  1. Clone this repo

    git clone https://github.com/GeWu-Lab/TSPM.git
  2. Download data

    MUSIC-AVQA: https://gewu-lab.github.io/MUSIC-AVQA/

    AVQA: http://mn.cs.tsinghua.edu.cn/avqa/

  3. Feature extraction

    cd feat_script/extract_clip_feat
    python extract_qst_ViT-L14@336px.py
    python extract_qaPrompt_ViT-L14@336px.py
    python extract_token-level_feat.py
    python extract_frames_ViT-L14@336px.py
  4. Training

    python -u main_train.py --Temp_Selection --top_k 10 \
    			--Spatio_Perception \
    			--batch-size 64 --epochs 30 --lr 1e-4 \
    			--num_workers 12 --gpu 0,1 \
    			--checkpoint TSPM \
    			--model_save_dir models
  5. Testing

    python -u main_test.py --Temp_Selection --top_k 10 \
    		       --Spatio_Perception \
    		       --batch-size 1 --gpu 1 \
    		       --checkpoint TSPM \
    		       --model_save_dir models \
    		       --result_dir results

Citation

If you find this work useful, please consider citing it.

coming soon!

Acknowledgement

This research was supported by Public Computing Cloud, Renmin University of China.

About

Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages