VITA1.5 fine-tuning: loss is 0 #103

Vincentwei1021 · 2025-02-08T07:33:05Z

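Thanks for open-sourcing this! I am trying to fine-tune the audio adapter and the Qwen LLM of VITA1.5 on my own dataset, but the loss is 0. Has anyone run into a similar problem, or is something wrong with my setup?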

Details below.
Following the official continue-training procedure, the script I used is [finetuneTaskNeg_qwen.sh]:

    --mm_projector_type mlp2x_gelu \
    --freeze_audio_encoder True \
    --freeze_audio_encoder_adapter False \
    --image_aspect_ratio square \
    --group_by_modality_length False \
    --bf16 True \
    --output_dir ${OUTPUT_DIR_FT} \
    --num_train_epochs 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \

I modified the dataset paths:

__init__.py:

from .dataset_config import *

NaturalCap0 = [ShareGPT4V0]
NaturalCap = [ShareGPT4V]
MyDataset = [VITA]

DataConfig = {
    "Pretrain_video": MyDataset,
}

NoPatchSets = ["khair", "jester"]

dataset_config.py: (does FolderDict need to be modified here?)

AudioFolder = "<mypath>/audio"
FolderDict = {
    #### NaturalCap
    "sharegpt4": "",
}
#### NaturalCap
ShareGPT4V = {"chat_path": ""}
ShareGPT4V0 = {"chat_path": ""}
VITA = {"chat_path": "<mypath>/train_data.json"}

The data are multi-turn conversations in which each human turn corresponds to an audio file and each assistant turn is text output:

[
    ...
    {
        "set": "sharegpt4",
        "id": "000000000164",
        "conversations": [
            {
                "from": "human",
                "value": "<audio>\n"
            },
            {
                "from": "gpt",  // follows the LLaVA setting; "gpt" only indicates that this is the ground truth of the model output
                "value": "This is a well-organized kitchen with a clean, modern aesthetic. The kitchen features a white countertop against a white wall, creating a bright and airy atmosphere. "
            },
            {
                "from": "human",
                "value": "<audio>\n"
            },
            {
                "from": "gpt",  // follows the LLaVA setting; "gpt" only indicates that this is the ground truth of the model output
                "value": "This is a well-organized kitchen with a clean, modern aesthetic. The kitchen features a white countertop against a white wall, creating a bright and airy atmosphere. "
            }
        ],
        "audio": [
            "<mypath>/01.wav",
            "<mypath>/02.wav"
        ]
    },
    ...
]
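Since each `<audio>` placeholder in a human turn presumably has to line up with one entry in the `audio` list, a quick consistency check over the JSON can rule out malformed samples before training. The following is a minimal sketch, not part of VITA: `find_mismatched_samples` is a hypothetical helper, and it assumes placeholders and audio paths are matched one-to-one.

```python
AUDIO_TOKEN = "<audio>"  # placeholder string as it appears in the sample above


def find_mismatched_samples(samples):
    """Return the ids of samples whose <audio> placeholder count in human
    turns differs from the number of entries in sample["audio"].

    Assumption: one placeholder per audio file.
    """
    bad_ids = []
    for sample in samples:
        # Count placeholders only in human turns.
        n_placeholders = sum(
            turn["value"].count(AUDIO_TOKEN)
            for turn in sample.get("conversations", [])
            if turn.get("from") == "human"
        )
        audio = sample.get("audio", [])
        if isinstance(audio, str):  # tolerate a single path given as a string
            audio = [audio]
        if n_placeholders != len(audio):
            bad_ids.append(sample.get("id"))
    return bad_ids
```

It could be run on the list loaded with `json.load` from the training file. Note that the `//` comments shown in the sample are not valid JSON, so they would have to be absent from the real file for `json.load` to parse it.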

During training I noticed some warning messages and am not sure whether they are normal:


linhaojia13 · 2025-02-08T12:00:28Z

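Hello, this warning usually appears because the target corresponding to input_ids was not set up correctly, which causes the loss to be 0. You can debug by tracing backwards from the code that emits the warning.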
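One way to check this directly is to inspect a few batches from the data loader and count how many label positions are actually supervised. The sketch below assumes the training code follows the common LLaVA / Hugging Face convention of marking ignored positions with -100 (the default `ignore_index` of PyTorch's `CrossEntropyLoss`). `supervised_fraction` is a hypothetical helper; it takes plain nested lists, so a tensor would need `.tolist()` first.

```python
IGNORE_INDEX = -100  # assumption: same sentinel as PyTorch's default ignore_index


def supervised_fraction(labels, ignore_index=IGNORE_INDEX):
    """Fraction of label positions that contribute to the loss.

    labels: a batch given as a list of token-id lists.
    """
    total = sum(len(seq) for seq in labels)
    if total == 0:
        return 0.0
    kept = sum(1 for seq in labels for tok in seq if tok != ignore_index)
    return kept / total
```

A value of 0.0 for every batch would mean that no token is supervised, which is consistent with the symptom described in this thread.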

Vincentwei1021 · 2025-02-10T03:50:56Z

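> Hello, this warning usually appears because the target corresponding to input_ids was not set up correctly, which causes the loss to be 0. You can debug by tracing backwards from the code that emits the warning.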

What does "target" refer to here?

linhaojia13 · 2025-02-12T06:53:15Z

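> > Hello, this warning usually appears because the target corresponding to input_ids was not set up correctly, which causes the loss to be 0. You can debug by tracing backwards from the code that emits the warning.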

> What does "target" refer to here?

It is in the function that produces the warning.
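For readers unfamiliar with the term: in LLaVA-style supervised fine-tuning, the target (labels) is usually a copy of `input_ids` in which every token outside the assistant's replies is replaced by an ignore value. The toy sketch below illustrates that idea on pre-tokenized segments. It is not VITA's preprocessing code; `build_targets`, the segment layout, and the token ids are made up for illustration.

```python
IGNORE_INDEX = -100  # assumption: the usual ignore value for cross-entropy labels


def build_targets(segments, ignore_index=IGNORE_INDEX):
    """Build (input_ids, targets) from a list of (role, token_ids) segments.

    Only tokens from "gpt" segments are kept in the targets; all other
    positions are masked with ignore_index.
    """
    input_ids, targets = [], []
    for role, token_ids in segments:
        input_ids.extend(token_ids)
        if role == "gpt":
            targets.extend(token_ids)  # supervised: the model should predict these
        else:
            targets.extend([ignore_index] * len(token_ids))  # masked out
    return input_ids, targets
```

If a sample's targets come out entirely as the ignore value, for example because the role names or conversation template do not match what the preprocessing expects, nothing is left to supervise for that sample.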

ranck626 · 2025-02-19T06:38:52Z

@Vincentwei1021 Have you solved this?
