ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO
Daechul Ahn*1, Yura Choi*1,2, San Kim1, Youngjae Yu2, Dongyeop Kang3, Jonghyun Choi1,† (*Equal Contribution)
1Seoul National University, 2Yonsei University, 3University of Minnesota
†Corresponding Author
Abstract: Iterative self-improvement, a concept extending beyond personal growth, has found powerful applications in machine learning, particularly in transforming weak models into strong ones. While recent advances in natural language processing have shown its efficacy through iterative preference optimization, applying this approach to Video Large Multimodal Models (VLMMs) remains challenging due to modality misalignment: during iterative preference modeling, the self-judge model often prioritizes linguistic knowledge over visual information. Additionally, iterative preference optimization can lead to visually hallucinated, verbose responses due to length bias within the self-rewarding cycle. To address these issues, we propose Iterative Self-Retrospective Direct Preference Optimization (ISR-DPO), a method that uses self-retrospection to enhance preference modeling. This approach strengthens the self-judge's focus on informative video regions, resulting in more visually grounded preferences. In extensive empirical evaluations across diverse video question answering benchmarks, ISR-DPO significantly outperforms the state of the art.
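For intuition only, below is a minimal, hypothetical Python sketch of the iterative self-retrospective preference loop described in the abstract. Every function name, the placeholder scoring, and the default of 9 rounds (chosen only to mirror the name of the released 9th-iteration checkpoint) are illustrative assumptions, not the released training code.

```python
# Illustrative sketch of an iterative self-retrospective preference loop.
# All functions below are hypothetical stand-ins, not the released training code.

def generate_candidates(model, video, question, n=2):
    """Sample n candidate responses from the current policy model (placeholder)."""
    return [f"response_{i}" for i in range(n)]

def self_retrospective_judge(model, video, question, candidates, prev_context):
    """Rank candidates; self-retrospection conditions the judge on the previous
    iteration's context so it stays focused on informative video regions
    rather than relying on language priors alone (placeholder scoring)."""
    scores = [len(c) for c in candidates]
    ranked = sorted(zip(scores, candidates), reverse=True)
    return ranked[0][1], ranked[-1][1]  # (chosen, rejected)

def dpo_update(model, preference_pairs):
    """One round of Direct Preference Optimization on the collected pairs (placeholder)."""
    return model

def isr_dpo(model, dataset, num_iterations=9):
    prev_context = None
    for _ in range(num_iterations):
        pairs = []
        for video, question in dataset:
            cands = generate_candidates(model, video, question)
            chosen, rejected = self_retrospective_judge(
                model, video, question, cands, prev_context)
            pairs.append((video, question, chosen, rejected))
        model = dpo_update(model, pairs)  # policy improves on its own preferences
        prev_context = pairs              # retrospection context for the next round
    return model
```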
- [12/10] Our paper has been accepted to AAAI 2025!
- [07/02] Uploaded model checkpoint & evaluation code
- [06/17] Created repository, updated README
- Using the script from LLaVA-Hound-DPO:
TEST_VIDEO_DIR=YOUR_PATH bash setup/setup_test_data.sh
- Or, download the test videos manually from this link (a quick sanity check follows below).
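If you download the videos manually, the following is a small, hypothetical sanity check, assuming only that TEST_VIDEO_DIR points at a directory containing the test video files:

```python
# Count downloaded test videos (assumes TEST_VIDEO_DIR holds common video formats).
import os

video_dir = os.environ.get("TEST_VIDEO_DIR", "test_videos")
exts = (".mp4", ".mkv", ".avi", ".webm")
videos = [f for _, _, files in os.walk(video_dir)
          for f in files if f.lower().endswith(exts)]
print(f"Found {len(videos)} video files under {video_dir}")
```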
# out-domain video question answering
bash Evaluation/pipeline/outdomain_test_pipeline.sh \
results \
SNUMPR/isrt_video_llava_7b_9th
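The evaluation pipeline references the checkpoint by its repo id. If you want to fetch it ahead of time, here is a small sketch using huggingface_hub's snapshot_download, assuming the SNUMPR/isrt_video_llava_7b_9th checkpoint is hosted as a standard Hugging Face Hub model repo (the local_dir path is just an example):

```python
# Pre-download the checkpoint used by the evaluation script.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="SNUMPR/isrt_video_llava_7b_9th",
    local_dir="checkpoints/isrt_video_llava_7b_9th",  # example destination
)
print(f"Checkpoint downloaded to {local_path}")
```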
- Coming soon
- Coming soon
GNU GENERAL PUBLIC LICENSE
- LLaVA-Hound-DPO: our code is built upon the LLaVA-Hound-DPO codebase