
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion.

[Paper] [Project Page] [Demo 8B] [Checkpoint 8B]

[Figure: Florence-VL Results]

News

  • [2024-12-05] We release the arXiv paper, training code, checkpoints, and demos [3B, 8B]. 🤗 Have fun!

Install Environment

  1. Install packages for training:
conda create -n florence-vl python=3.11 -y
conda activate florence-vl
pip install --upgrade pip  
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
  2. Install packages for evaluation (we use lmms-eval for evaluation):
cd lmms-eval
pip install -e .
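
After both installs, a quick import check can confirm the environment is usable. This is a minimal sketch: the llava module name follows the upstream LLaVA codebase this repo builds on, so adjust it if the package is named differently.

# Sanity check (assumes the package exposes a `llava` module, as in upstream LLaVA):
python -c "import llava; print('llava OK')"
# flash-attn is compiled against your local CUDA toolkit:
python -c "import flash_attn; print('flash-attn OK')"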

Dataset Download

  1. Pretrain Data:

    Detailed captions from PixelProse and ShareGPT4V (a sketch of the expected JSON layout follows this list).

  2. Instruction Data:

    TODO.
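The README does not spell out the annotation schema, but since the codebase starts from LLaVA, the pretrain JSON most likely follows the LLaVA conversation format. The sketch below is an assumption, not the released schema; the file name, id, image path, and caption text are all hypothetical placeholders, so verify against the released JSON.

# Hypothetical record layout (LLaVA-style); check the released data before relying on it:
cat > sample_pretrain.json <<'EOF'
[
  {
    "id": "000000001",
    "image": "pixelprose/000000001.jpg",
    "conversations": [
      {"from": "human", "value": "<image>\nDescribe this image in detail."},
      {"from": "gpt", "value": "A detailed caption of the image ..."}
    ]
  }
]
EOF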

Training Script

Training with Llama 3.1-8B (Phi-3 is similar)

Set up your basic Slurm information in scripts/florence-vl/llama/llama3.sh. Then you can run the pretrain and finetune jobs:

In scripts/florence-vl/llama/pretrain_llama.sh, you need to manually export the following variables:

export NNODES=number of nodes

export DATA_PATH=/your/path/for/pretrain/data/json/file
export IMG=/your/image/folder

export OUTPUT=/checkpoint/save/path
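
For example, a filled-in pretrain launch might look like this. All paths below are hypothetical placeholders, and the launch command depends on your cluster setup.

export NNODES=4
export DATA_PATH=/data/florence-vl/pretrain/captions.json
export IMG=/data/florence-vl/images
export OUTPUT=/checkpoints/florence-vl-8b-pretrain
# Launch (use sbatch instead of bash if your cluster submits jobs via Slurm):
bash scripts/florence-vl/llama/pretrain_llama.sh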

In scripts/florence-vl/llama/finetune_llama.sh, you need to manually export the following variables:

export NNODES=number of nodes

export DATA_PATH=/your/path/for/instruction/data/json/file
export IMG=/your/image/folder

export CKPT_PATH=/pretrain/checkpoint
export VIT_PATH=/pretrain/checkpoint/vision_tower
export OUTPUT=/checkpoint/save/path
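
Likewise for finetuning, continuing from the pretrain output above (paths are hypothetical; note that VIT_PATH points at the vision_tower subfolder inside the pretrained checkpoint):

export NNODES=4
export DATA_PATH=/data/florence-vl/instruction/instruct.json
export IMG=/data/florence-vl/images
export CKPT_PATH=/checkpoints/florence-vl-8b-pretrain
export VIT_PATH=/checkpoints/florence-vl-8b-pretrain/vision_tower
export OUTPUT=/checkpoints/florence-vl-8b-finetune
# Launch (use sbatch instead of bash if your cluster submits jobs via Slurm):
bash scripts/florence-vl/llama/finetune_llama.sh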

Evaluation Script

We use lmms-eval for evaluation. Set conv_template to llama3 or phi to match your base LLM, and export an OpenAI key for the GPT-judged benchmarks (e.g. llava_in_the_wild, mmvet).

export OPENAI_API_KEY=your_key
python -m accelerate.commands.launch \
    --num_processes=4 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained="/your/model/path/,conv_template=/choose/from/llama3/or/phi" \
    --tasks  textvqa_val,gqa,realworldqa,vizwiz_vqa_val,pope,scienceqa_img,mmvet,mme,seedbench,hallusion_bench_image,llava_in_the_wild,mathvista_testmini,docvqa_val,ocrbench,chartqa,ai2d,mmmu_val,mmbench_en_dev,infovqa_val,mmbench_cn_dev,mmstar \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix florence-vl \
    --output_path ./logs/
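
The full task list above takes a long time to run. For a quick smoke test, the same command can be pointed at a single benchmark first (the model path is a hypothetical placeholder):

python -m accelerate.commands.launch \
    --num_processes=1 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained="/your/model/path/,conv_template=llama3" \
    --tasks textvqa_val \
    --batch_size 1 \
    --output_path ./logs/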

Checkpoint

  1. Florence-VL 8B: Pretrained Checkpoint and Instructed Checkpoint.
  2. Florence-VL 3B: Pretrained Checkpoint and Instructed Checkpoint.

Acknowledgement

LLaVA: Our codebase starts from the amazing LLaVA.

lmms-eval: Thanks for the amazing multimodal evaluation codebase.
