ISR-DPO (AAAI'25)

ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO
Daechul Ahn*1, Yura Choi*1,2, San Kim1, Youngjae Yu2, Dongyeop Kang3, Jonghyun Choi1,† (*Equal Contribution)
1Seoul National University, 2Yonsei University, 3University of Minnesota
†Corresponding Author


Abstract: Iterative self-improvement, a concept extending beyond personal growth, has found powerful applications in machine learning, particularly in transforming weak models into strong ones. While recent advances in natural language processing have shown its efficacy through iterative preference optimization, applying this approach to Video Large Multimodal Models (VLMMs) remains challenging due to modality misalignment: during iterative preference modeling, the self-judge model often prioritizes linguistic knowledge over visual information. Additionally, iterative preference optimization can lead to visually hallucinated, verbose responses due to length bias within the self-rewarding cycle. To address these issues, we propose Iterative Self-Retrospective Direct Preference Optimization (ISR-DPO), a method that uses self-retrospection to enhance preference modeling. This approach sharpens the self-judge's focus on informative video regions, resulting in more visually grounded preferences. In extensive empirical evaluations across diverse video question answering benchmarks, ISR-DPO significantly outperforms the state of the art.
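For context, each round of an iterative DPO pipeline optimizes a preference objective on self-labeled response pairs. As a reference point only (this is the standard DPO loss of Rafailov et al., 2023, not necessarily the paper's exact formulation), with x the video-plus-question input and (y_w, y_l) the responses the self-judge prefers and rejects:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$

where $\sigma$ is the logistic function, $\beta$ controls deviation from the reference policy $\pi_{\mathrm{ref}}$, and in an iterative setup $\pi_{\mathrm{ref}}$ is typically the frozen policy from the previous round.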

Overview

Release

  • [12/10] Our paper is accepted to AAAI 2025!
  • [07/02] Upload model checkpoint & evaluation code
  • [06/17] Create repository, update README

Evaluation

Prepare evaluation dataset

  • Using the script from LLaVA-Hound-DPO (see the note after this list):
    TEST_VIDEO_DIR=YOUR_PATH bash setup/setup_test_data.sh
    
  • Or download manually from this link
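Note: TEST_VIDEO_DIR presumably points to the directory where the benchmark videos should be stored; the variable is consumed by the upstream LLaVA-Hound-DPO setup script, so its exact semantics follow that repository.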

Evaluating the model

# out-domain video question answering
bash Evaluation/pipeline/outdomain_test_pipeline.sh \
    results \
    SNUMPR/isrt_video_llava_7b_9th
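Here, results is the output directory and SNUMPR/isrt_video_llava_7b_9th is the released checkpoint on the Hugging Face Hub (the name suggests the 9th self-retrospective iteration of a Video-LLaVA-7B model); both are positional arguments to the pipeline script.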

Building Preference Data w/ Model

  • Coming soon

Training

  • Coming soon

License

GNU GENERAL PUBLIC LICENSE

Acknowledgement

  • LLaVA-Hound-DPO: our code is built upon this codebase
