Haoxuan You*, Rui Sun*, Zhecan Wang*, Long Chen, Gengyu Wang, Hammad A. Ayyubi, Kai-Wei Chang, Shih-Fu Chang
[*: equal contribution]
Clone our repository and create a new python environment via the following command
git clone https://github.com/Hxyou/IdealGPT.git
cd IdealGPT
conda env create -f environment.yml
conda activate idealgpt
If you would like to use LLaVA and MiniGPT4 to solve sub-questions, please install them as mentioned in their repository.
In our paper, we conduct experiments on SNLI-VE and VCR. Please refer to their website to see how to download the data.
NOTE: 1. If you would like to run our code, please replace the filepath with yours. 2. You need to configure an OpenAI key to use OpenAI API. More details can be found at OpenAI platform
In order to save money and running time, you can randomly select 500 samples from the val/dev split of VCR and SNLI-VE at first. (dataset can be vcr_val
or ve_dev
)
cd misc
python sample_data.py --dataset=vcr_val
cd ..
Then, you can use IdealGPT to do inference. Here is an example of zero-shot VCR.
python blip_gpt_main.py \
--data_root=/your/dataset/path \
--exp_tag=vcr_05241522 \
--dataset=vcr_val \
--device_id=0 \
--prompt_setting=v1a \
--data_partition=0_499 \
--vqa_model=blip2_t5_xl \
--temp_gpt=0.0 \
--data_subset=/your/selected/data/subset/path \
--openai_key=<your_openai_key>
You can replace vcr_val
with ve_dev
to obtain SNLI-VE results.
We employ accuracy to evaluate zero-shot performance on VCR and SNLI-VE.
- VCR
python vcr_eval.py --result=[path of saved VCR result folder, named by exp_tag in run]
- SNLI-VE
python ve_eval.py --result=[path of saved VCR result folder, named by exp_tag in run]
If you are interested in our work, please cite the paper as
@misc{you2023idealgpt,
title={IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models},
author={Haoxuan You and Rui Sun and Zhecan Wang and Long Chen and Gengyu Wang and Hammad A. Ayyubi and Kai-Wei Chang and Shih-Fu Chang},
year={2023},
eprint={2305.14985},
archivePrefix={arXiv},
primaryClass={cs.CV}
}