reproduce your results #106

Open
zzksdu opened this issue Feb 17, 2025 · 2 comments

Comments


zzksdu commented Feb 17, 2025

As introduced in your paper, the VITA training process consists of three steps.
The first step is fine-tuning the LLM module.
The second step is multimodal alignment.
The third step is multimodal instruction tuning.

In your code, there are many training scripts. Can you indicate which script corresponds to which step of training?

Another question: in your source code the language model is Qwen2, so is the final language model Qwen2 or Mixtral 8x7B?
@wangxiongts @BradyFU @linhaojia13 @longzw1997

zzksdu (Author) commented Feb 18, 2025

Do you have any plans to make your training dataset public?

linhaojia13 (Collaborator) commented
VITA-1.0 uses Mixtral as its base language model, while VITA-1.5 uses Qwen2.5-7B-Instruct. Currently, VITA-1.0 is deprecated, so let me explain the training stages for VITA-1.5:

  • pretrain_mlp_qwen_nodes.sh: Stage 1.1
  • finetune_qwen_nodes.sh: Stage 1.2
  • finetuneTask_qwen_nodes.sh: Stage 1.3
  • finetuneTaskNeg_qwen_nodes.sh: Stage 2.2

As for the datasets used, they are not publicly available, but the majority of them consist of open-source data.
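
For anyone trying to reproduce this, below is a minimal sketch that runs the four stages in the order listed above. Only the script names and stage labels come from this thread; the `script/train/` directory, the absence of extra command-line arguments, and the environment setup are assumptions and should be checked against the repository's README before running anything.

```python
# Minimal sketch: launch the VITA-1.5 training stages in the order given in the reply above.
# The stage-to-script mapping is from this thread; the directory layout and the
# no-argument invocation are assumptions. Verify against the repository first.
import subprocess
from pathlib import Path

SCRIPT_DIR = Path("script/train")  # hypothetical location; adjust to the actual repo layout

STAGES = [
    ("Stage 1.1", "pretrain_mlp_qwen_nodes.sh"),
    ("Stage 1.2", "finetune_qwen_nodes.sh"),
    ("Stage 1.3", "finetuneTask_qwen_nodes.sh"),
    ("Stage 2.2", "finetuneTaskNeg_qwen_nodes.sh"),
]


def run_stage(name: str, script: str) -> None:
    path = SCRIPT_DIR / script
    print(f"==> {name}: {path}")
    # Each script is assumed to pick up its configuration (data paths, checkpoints,
    # node list) from variables set inside the script or from the environment.
    subprocess.run(["bash", str(path)], check=True)


if __name__ == "__main__":
    for name, script in STAGES:
        run_stage(name, script)
```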
