As described in your paper, the VITA training process consists of three steps.
The first step is fine-tuning the LLM module.
The second step is multimodal alignment.
The third step is multimodal instruction tuning.
In your code there are many training scripts. Could you indicate which script corresponds to which training step?
Another question: in your source code the language model is Qwen2, so is the final language model Qwen2 or Mixtral 8x7B? @wangxiongts @BradyFU @linhaojia13 @longzw1997
VITA-1.0 uses Mixtral 8x7B as its base language model, while VITA-1.5 uses Qwen2.5-7B-Instruct. VITA-1.0 is now deprecated, so here is how the scripts map to the VITA-1.5 training stages:

- `pretrain_mlp_qwen_nodes.sh`: Stage 1.1
- `finetune_qwen_nodes.sh`: Stage 1.2
- `finetuneTask_qwen_nodes.sh`: Stage 1.3
- `finetuneTaskNeg_qwen_nodes.sh`: Stage 2.2

As for the datasets, the curated training mixture is not publicly released, but the majority of it is built from open-source data.
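To make the stage order concrete, here is a minimal launch sketch. The script names are the ones listed above; the `script/train/` path and the `OUTPUT_DIR` argument are assumptions for illustration only, so check each script's header for the variables (data paths, model paths, node configuration) it actually expects before running anything.

```bash
#!/usr/bin/env bash
# Hypothetical end-to-end launch order for the VITA-1.5 stages listed above.
# Paths and the OUTPUT_DIR argument are assumptions, not the repo's documented CLI.
set -e

OUTPUT_DIR=./checkpoints/vita_qwen   # assumed checkpoint location

bash script/train/pretrain_mlp_qwen_nodes.sh    "${OUTPUT_DIR}"  # Stage 1.1
bash script/train/finetune_qwen_nodes.sh        "${OUTPUT_DIR}"  # Stage 1.2
bash script/train/finetuneTask_qwen_nodes.sh    "${OUTPUT_DIR}"  # Stage 1.3
bash script/train/finetuneTaskNeg_qwen_nodes.sh "${OUTPUT_DIR}"  # Stage 2.2
```

Each later stage is expected to resume from the checkpoint produced by the previous one, so run them in this order and point each script at the preceding stage's output.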