Official PyTorch implementation of "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models"

sangminwoo/RITUAL

RITUAL

🔥 RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models

This repository contains the official PyTorch implementation of the paper "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models".

🚨 Updates

  • 2024.12.16: Update Paper / Project page
  • 2024.05.29: Build project page
  • 2024.05.29: RITUAL Paper online
  • 2024.05.28: Code Release

👀 Overview

TL;DR: RITUAL is a simple yet effective anti-hallucination approach for large vision-language models (LVLMs). It applies basic image transformations (e.g., vertical and horizontal flips) to improve LVLM accuracy without external models or training. By conditioning on both the transformed and the original image, RITUAL significantly reduces hallucinations in discriminative and descriptive tasks: the model can refine its predictions using both views, which reduces errors and increases the likelihood of correct responses.

🤖 RITUAL

[Figure: overview of RITUAL]

When conditioned on the original image alone, the probabilities of the Blue (correct) and Red (hallucinated) responses are similar, so the hallucinated response is easily sampled. RITUAL adds a second probability distribution conditioned on the transformed image, in which the likelihood of hallucination is significantly lower. The response is then sampled from a linear combination of the two probability distributions, yielding more accurate and reliable outputs.
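The combination step can be illustrated with a toy example. This is a minimal sketch that takes the description above literally and mixes two next-token probability distributions; the function name `combine_distributions`, the weight `alpha`, and the three-token vocabulary are assumptions made for illustration, and the repository code may combine the model outputs differently (for example, at the logit level).

```python
def combine_distributions(p_orig, p_trans, alpha=1.0):
    """Return (p_orig + alpha * p_trans), renormalized to sum to 1.

    p_orig:  next-token probabilities conditioned on the original image
    p_trans: next-token probabilities conditioned on the transformed image
    alpha:   hypothetical weight on the transformed-image distribution
    """
    if len(p_orig) != len(p_trans):
        raise ValueError("distributions must cover the same vocabulary")
    mixed = [po + alpha * pt for po, pt in zip(p_orig, p_trans)]
    total = sum(mixed)
    return [m / total for m in mixed]


# Toy vocabulary: index 0 = correct token, 1 = hallucinated token, 2 = other.
p_original = [0.45, 0.45, 0.10]      # correct and hallucinated are tied
p_transformed = [0.60, 0.20, 0.20]   # hallucination is less likely here
p_ritual = combine_distributions(p_original, p_transformed, alpha=1.0)
print(p_ritual)
```

In this toy case the correct and hallucinated tokens are tied under the original image, and the mixed distribution favors the correct token.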

RITUAL+

[Figure: overview of RITUAL+]

In RITUAL, the original image V undergoes random transformations, generating a transformed image. In RITUAL+, the model evaluates various potential transformations and selects the most beneficial one to improve answer accuracy within the given context, further refining reliability. These transformed images serve as complementary inputs, enabling the model to incorporate multiple visual perspectives to reduce hallucinations.
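The random-transformation step of RITUAL can be sketched as follows. This is an illustrative sketch only: it represents an image as a nested list of pixel values and covers just the two flips mentioned in the TL;DR, whereas the repository operates on real image inputs and may use a different set of transformations. The names `TRANSFORMS` and `random_transform` are hypothetical.

```python
import random


def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Reverse the order of the rows (top to bottom)."""
    return [row[:] for row in image[::-1]]


# Hypothetical transformation pool; only the flips named in the README.
TRANSFORMS = {
    "horizontal_flip": horizontal_flip,
    "vertical_flip": vertical_flip,
}


def random_transform(image, rng=random):
    """Pick one transformation uniformly at random and apply it."""
    name = rng.choice(sorted(TRANSFORMS))
    return name, TRANSFORMS[name](image)


image = [[1, 2, 3],
         [4, 5, 6]]
name, transformed = random_transform(image, random.Random(0))
print(name, transformed)
```

The original image is left unchanged, so both the original and the transformed version remain available as inputs.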

💻 Setup

conda create -n RITUAL python=3.10
conda activate RITUAL
git clone https://github.com/sangminwoo/RITUAL.git
cd RITUAL
pip install -r requirements.txt

Models

About model checkpoint preparation

📊 Evaluation

  • POPE: bash eval_bench/scripts/pope_eval.sh
    • Requires "model" and "model_path" to be specified
  • CHAIR: bash eval_bench/scripts/chair_eval.sh
    • Requires "model", "model_path", and "type" to be specified
  • MME: bash experiments/cd_scripts/mme_eval.sh
    • Requires "model" and "model_path" to be specified
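For orientation, POPE poses yes/no questions about object presence, and such answers are commonly scored with accuracy, precision, recall, and F1, treating "yes" as the positive class. The sketch below shows that scoring on made-up answers; it does not reproduce the repository's evaluation scripts, and the function name `binary_metrics` and the sample data are assumptions.

```python
def binary_metrics(predictions, labels, positive="yes"):
    """Accuracy, precision, recall, and F1 for yes/no answers."""
    pairs = list(zip(predictions, labels))
    tp = sum(p == positive and l == positive for p, l in pairs)
    fp = sum(p == positive and l != positive for p, l in pairs)
    fn = sum(p != positive and l == positive for p, l in pairs)
    tn = sum(p != positive and l != positive for p, l in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up model answers and ground-truth labels, for illustration only.
preds = ["yes", "yes", "no", "yes", "no", "no"]
labels = ["yes", "no", "no", "yes", "yes", "no"]
metrics = binary_metrics(preds, labels)
print(metrics)
```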

About dataset preparation

  • Please download and extract the MSCOCO 2014 dataset from this link to your data path for evaluation.
  • For MME evaluation, see this link.

Results

⚠️ All baseline methods were reimplemented within our evaluation setup for fair comparison.

POPE

POPE results

MME

MME-Fullset

MME-Fullset results

MME-Hallucination

MME-Hallucination results

CHAIR

CHAIR results

Examples

LLaVA-Bench results

🙏 Acknowledgments

This codebase borrows most notably from VCD, OPERA, and LLaVA. Many thanks to the authors for generously sharing their code!

📝 Citation

If you find this repository helpful for your project, please consider citing our work :)

@article{woo2024ritual,
  title={RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models}, 
  author={Woo, Sangmin and Jang, Jaehyuk and Kim, Donguk and Choi, Yubin and Kim, Changick},
  journal={arXiv preprint arXiv:2405.17821},
  year={2024},
}
