Synthetic data pipeline for multi-image reasoning
This repository contains the official implementation of our paper, SMIR: Efficient Synthetic Data Pipeline To Improve Multi-Image Reasoning.
- Dataset generation pipeline
We would like to acknowledge the following resources that were instrumental in the development of SMIR:
- Meta Llama 3.1: We used the Llama 3.1 model as our foundational language model via Together AI.
- [SigLIP](https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384): We used Google's SigLIP model as an embedding model.
- CLIP: We used MetaCLIP, Meta's implementation of CLIP, as an embedding model.
- We used training and evaluation code from the following repositories:
@misc{li2025smirefficientsyntheticdata,
title={SMIR: Efficient Synthetic Data Pipeline To Improve Multi-Image Reasoning},
author={Andrew Li and Rahul Thapa and Rahul Chalamala and Qingyang Wu and Kezhen Chen and James Zou},
year={2025},
eprint={2501.03675},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2501.03675},
}