This repository contains the implementation of our model, AdaArc-LM. It is built upon https://github.com/guoyang9/AdaVQA.
Almost all flags can be set in utils/config.py. The dataset paths and hyperparameters can also be set in this file.
* One NVIDIA GeForce RTX 2080 Ti
* 4GB approximately
* python==3.7.11
* nltk==3.7
* bcolz==1.2.1
* tqdm==4.62.3
* numpy==1.21.4
* pytorch==1.10.2
* tensorboardX==2.4
* torchvision==0.11.3
* h5py==3.5.0
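To compare an existing environment against the pins above, a small script along the following lines may help. It is only a sketch and not part of this repository: the helper names `parse_pins` and `find_mismatches` are hypothetical, and the installed versions are passed in as a plain dict so that the example does not depend on any particular packaging API.

```python
# Hypothetical helpers, not part of this repository.
# The pins are copied from the list above.
PINNED = """
nltk==3.7
bcolz==1.2.1
tqdm==4.62.3
numpy==1.21.4
pytorch==1.10.2
tensorboardX==2.4
torchvision==0.11.3
h5py==3.5.0
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, version = line.split("==", 1)
        pins[name] = version
    return pins


def find_mismatches(pins, installed):
    """Return {name: (expected, found)} for missing or differing packages.

    `installed` is assumed to be a {name: version} dict collected by
    whatever means suits your environment; `found` is None if absent.
    """
    return {
        name: (expected, installed.get(name))
        for name, expected in pins.items()
        if installed.get(name) != expected
    }
```

An empty result from `find_mismatches(parse_pins(PINNED), installed)` means every pinned package matches. Note that the PyTorch distribution is published on pip under the name `torch`, so the `pytorch` key may need adjusting depending on how the versions were collected.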
- Download the VQA-CP datasets from the link provided in the supplementary material.
- The image features can be downloaded by following the instructions at https://github.com/hengyuan-hu/bottom-up-attention-vqa.
- The pre-trained GloVe features can be accessed via https://nlp.stanford.edu/projects/glove/.
After downloading the datasets, place them in the folders set in utils/config.py.
The preprocessing steps are as follows:
- Process questions and dump the dictionary:
  python tools/create_dictionary.py
- Process answers and question types, and generate the frequency-based margins:
  python tools/compute_softscore.py
- Convert image features to h5:
  python tools/detection_features_converter.py
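The three preprocessing commands can also be chained from a single script. The following is a minimal sketch, assuming it is run from the repository root and that the scripts need no extra arguments; the names `build_commands` and `run_all` are hypothetical and do not exist in this repository.

```python
import subprocess
import sys

# Script paths taken from the preprocessing steps, in the order listed.
PREPROCESSING_SCRIPTS = [
    "tools/create_dictionary.py",
    "tools/compute_softscore.py",
    "tools/detection_features_converter.py",
]


def build_commands(python=sys.executable):
    """Build one [interpreter, script] command per preprocessing step."""
    return [[python, script] for script in PREPROCESSING_SCRIPTS]


def run_all(commands, runner=subprocess.run):
    """Run each command in turn; check=True stops at the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        runner(cmd, check=True)


# To execute the steps, call from the repository root:
# run_all(build_commands())
```

Passing `check=True` makes `subprocess.run` raise `CalledProcessError` on a non-zero exit code, so a failed step is not silently followed by the next one.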
python main_arcface.py --name test-VQA --gpu 0
python main_arcface.py --name test-VQA --eval-only
Running this code creates a new JSON file (e.g. abc.json) that contains the test question ids and the answers predicted by the model.
python acc_per_type.py abc.json
The --name argument specifies the name of the file in which the model weights are finally stored.
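Before computing per-type accuracy, it can be useful to sanity-check the prediction file. The sketch below assumes the file is a list of records with `question_id` and `answer` keys, which is the common VQA result layout; this README does not specify the exact format, so the keys may need adjusting. The function name `summarize_predictions` is hypothetical.

```python
import json
from collections import Counter


def summarize_predictions(fp, top_k=3):
    """Load predictions from an open file and report basic statistics.

    Assumes a JSON list of {"question_id": ..., "answer": ...} records;
    adjust the keys if the file written by the repository differs.
    """
    records = json.load(fp)
    answers = Counter(r["answer"] for r in records)
    question_ids = {r["question_id"] for r in records}
    return {
        "num_predictions": len(records),
        "num_unique_questions": len(question_ids),
        "most_common_answers": answers.most_common(top_k),
    }


# Example usage (abc.json is the placeholder file name used above):
# with open("abc.json") as fp:
#     print(summarize_predictions(fp))
```

If `num_predictions` and `num_unique_questions` differ, some question ids appear more than once in the file, which is worth checking before evaluation.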
Model | Accuracy (%) |
---|---|
AdaArc | 57.24 |
+ Randomization | 57.97 |
+ Bias-injection | 59.44 |
+ Learnable margins | 59.87 |
+ Supervised Contrastive Loss | 60.41 |
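
The contribution of each component can be read off the table by differencing consecutive rows. The short script below does this arithmetic, assuming each row adds one component on top of the row above it (the "+" prefixes suggest this, but the table does not state it explicitly).

```python
# Accuracies (%) copied from the table above, in the order listed.
RESULTS = [
    ("AdaArc", 57.24),
    ("+ Randomization", 57.97),
    ("+ Bias-injection", 59.44),
    ("+ Learnable margins", 59.87),
    ("+ Supervised Contrastive Loss", 60.41),
]


def incremental_gains(results):
    """Return (name, gain over the previous row) for rows after the first."""
    return [
        (name, round(acc - prev_acc, 2))
        for (_, prev_acc), (name, acc) in zip(results, results[1:])
    ]


gains = incremental_gains(RESULTS)
total = round(RESULTS[-1][1] - RESULTS[0][1], 2)

for name, gain in gains:
    print(f"{name}: +{gain:.2f}")
print(f"Total: +{total:.2f}")
```

Under that assumption, the per-row gains are 0.73, 1.47, 0.43, and 0.54 percentage points, for a total of 3.17 over the AdaArc baseline, with bias injection giving the largest single step.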