SMiR

Synthetic data pipeline for multi-image reasoning

Overview

This repository contains the official implementation of our paper, SMIR: Efficient Synthetic Data Pipeline To Improve Multi-Image Reasoning.

Coming Soon

Dataset generation pipeline

🏆 Credits

We would like to acknowledge the following resources that were instrumental in the development of SMIR:

Meta Llama 3.1: We used the Llama 3.1 model as our foundational language model, accessed via Together AI.
[SigLIP](https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384): We used a SigLIP model from Google as our embedding model.
CLIP: We used MetaCLIP, Meta's implementation of CLIP, as an embedding model.
We used training and evaluation code from the following repositories:
- MANTIS: Interleaved Multi-Image Instruction Tuning
- From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

📚 BibTeX

@misc{li2025smirefficientsyntheticdata,
      title={SMIR: Efficient Synthetic Data Pipeline To Improve Multi-Image Reasoning}, 
      author={Andrew Li and Rahul Thapa and Rahul Chalamala and Qingyang Wu and Kezhen Chen and James Zou},
      year={2025},
      eprint={2501.03675},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2501.03675}, 
}
