reproduce your results #106

Open
zzksdu opened this issue Feb 17, 2025 · 2 comments

Comments


zzksdu commented Feb 17, 2025

As introduced in your paper, the VITA training process consists of three steps.
The first step is fine-tuning the LLM module.
The second step is multimodal alignment.
The third step is multimodal instruction tuning.

In your code, there are many training scripts. Can you indicate which script corresponds to which step of training?

Another question: in your source code the language model is Qwen2, so is the final language model Qwen2 or Mixtral 8x7B?
@wangxiongts @BradyFU @linhaojia13 @longzw1997

zzksdu (Author) commented Feb 18, 2025

Do you have any plans to make your training dataset public?

linhaojia13 (Collaborator) commented
VITA-1.0 uses Mixtral as its base language model, while VITA-1.5 uses Qwen2.5-7B-Instruct. Currently, VITA-1.0 is deprecated, so let me explain the training stages for VITA-1.5:

  • pretrain_mlp_qwen_nodes.sh: Stage 1.1
  • finetune_qwen_nodes.sh: Stage 1.2
  • finetuneTask_qwen_nodes.sh: Stage 1.3
  • finetuneTaskNeg_qwen_nodes.sh: Stage 2.2

As for the datasets used, they are not publicly available, but the majority of them consist of open-source data.
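
For anyone trying to reproduce this, below is a minimal sketch that runs the four stages in the order listed above. Only the script names and stage labels come from this thread; the `script/train/` directory, the absence of extra command-line arguments, and the environment setup are assumptions and should be checked against the repository's README before running anything.

```python
# Minimal sketch: launch the VITA-1.5 training stages in the order given in the reply above.
# The stage-to-script mapping is from this thread; the directory layout and the
# no-argument invocation are assumptions. Verify against the repository first.
import subprocess
from pathlib import Path

SCRIPT_DIR = Path("script/train")  # hypothetical location; adjust to the actual repo layout

STAGES = [
    ("Stage 1.1", "pretrain_mlp_qwen_nodes.sh"),
    ("Stage 1.2", "finetune_qwen_nodes.sh"),
    ("Stage 1.3", "finetuneTask_qwen_nodes.sh"),
    ("Stage 2.2", "finetuneTaskNeg_qwen_nodes.sh"),
]


def run_stage(name: str, script: str) -> None:
    path = SCRIPT_DIR / script
    print(f"==> {name}: {path}")
    # Each script is assumed to pick up its configuration (data paths, checkpoints,
    # node list) from variables set inside the script or from the environment.
    subprocess.run(["bash", str(path)], check=True)


if __name__ == "__main__":
    for name, script in STAGES:
        run_stage(name, script)
```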
