FlanT5 training and zero tensors #1339

Open
GenVr opened this issue May 19, 2023 · 13 comments
@GenVr

GenVr commented May 19, 2023

Hi, I'm training a FlanT5 model. The training completes successfully, but when I run a simple inference, the output is a tensor of zeros, so the prediction is empty.

Example:

import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(path, use_fast=False)
model = T5ForConditionalGeneration.from_pretrained(path, low_cpu_mem_usage=True, torch_dtype=torch.float16).cuda()

tokenized_text = tokenizer(query, return_tensors="pt")

source_ids = tokenized_text["input_ids"].to(device, dtype=torch.long)

generated_ids = model.generate(input_ids=source_ids)

Output:

tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
       device='cuda:0')

I tried several training runs, both with flan-t5-xl and flan-t5-large, on both my personal dataset and the dummy.json dataset.

This is my training configuration:

!python3 -m torch.distributed.run --nproc_per_node=6 fastchat/train/train_flant5.py \
    --model_name_or_path google/flan-t5-xl \
    --data_path playground/data/dummy.json \
    --fp16 True \
    --output_dir ./output \
    --num_train_epochs 5 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 99999 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp_transformer_layer_cls_to_wrap T5Block \
    --tf32 False \
    --fsdp "full_shard auto_wrap" \
    --model_max_length 256 \
    --gradient_checkpointing True \
    --preprocessed_path ./preprocessed_data/processed.json 

Any idea what's going on? Thank you.

@merrymercy
Member

cc @DachengLi1

@DachengLi1
Collaborator

@GenVr This is likely because PyTorch FSDP saves the T5 model incorrectly (if you print out the loaded model weights, the encoder or decoder embedding is likely all zeros, which causes the final predictions to be all 0). Can you try using our postprocessing function? There is another issue solving the same problem. Let me know if it works!
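
A minimal sketch of that check, assuming a standard Hugging Face T5 checkpoint at path (the attribute names below are the stock transformers T5 ones, not anything FastChat-specific):

import torch
from transformers import T5ForConditionalGeneration

# Load the checkpoint that FSDP saved (path as in the inference snippet above).
model = T5ForConditionalGeneration.from_pretrained(path)

# If the save went wrong, these embedding tables may be entirely zero.
print("shared embedding all zero: ", torch.all(model.shared.weight == 0).item())
print("encoder embedding all zero:", torch.all(model.encoder.embed_tokens.weight == 0).item())
print("decoder embedding all zero:", torch.all(model.decoder.embed_tokens.weight == 0).item())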

@GenVr
Author

GenVr commented May 22, 2023

@DachengLi1 Thanks, I trained on GPUs with more memory and used the postprocessing function after training. I can now load the model correctly. However, I have another problem: during training, both the loss and the learning rate are zero:

{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.04}                              
...                         
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.32}                              
...                                                   
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.67}                              
...                        
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 1.02} 
...

It seems the network isn't learning anything. My configuration is in the initial post. I trained on both the dummy.json dataset and my personal one, with the same results. Do you have any idea about it? Thanks.

@DachengLi1
Collaborator

@GenVr I met a similar issue where the learning rate stays 0 on a small dataset. This is caused by some integer flooring behavior in Hugging Face transformers. Can you try warmup_ratio=0 (or omit this argument) and let me know what happens?
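
In terms of the command in the initial post, that means either deleting the --warmup_ratio 0.03 line entirely or replacing it with:

    --warmup_ratio 0 \

with the other flags left unchanged.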

@GenVr
Author

GenVr commented May 22, 2023

@DachengLi1 Thanks. I tried removing --warmup_ratio 0.03 and got this:

...                          
{'loss': 0.0, 'learning_rate': 2e-05, 'epoch': 0.07}
...                                                                            
{'loss': 0.0, 'learning_rate': 2e-05, 'epoch': 0.31}
...                                                   
{'loss': 0.0, 'learning_rate': 2e-05, 'epoch': 0.52}
...

Now the learning rate is nonzero, but the loss is always zero.
With a batch size of 1, the loss is sometimes nonzero.
I also tried changing the learning rate to 1e3, but after the first epoch the situation remains the same.

@DachengLi1
Collaborator

@GenVr Nice to hear that! Let's keep bs=1 for now; I will look into whether bs>1 can cause other problems (I haven't really tested bs>1 because of GPU memory limits). Can you try bs=1 on your own dataset? The dummy dataset is composed of very simple questions (if you look into it, a lot of them are very similar), so you probably want to see whether this still happens on a more complex dataset.

@DachengLi1
Collaborator

BTW, remember to change --preprocessed_path, otherwise it will read the previously cached data from that file.
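
For example, relative to the command in the initial post, point the flag at a fresh file (processed_v2.json is a hypothetical name) so the data is re-tokenized instead of being loaded from the old cache:

    --preprocessed_path ./preprocessed_data/processed_v2.json \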

@GenVr
Author

GenVr commented May 23, 2023

@DachengLi1 Thanks. I tried batch sizes both equal to 1 and greater than 1 with my personal dataset. The loss is always zero and the network seems to fail to train (the outputs look untrained). Maybe I could try a big public .json dataset to see what happens?

@DachengLi1
Collaborator

@GenVr Interesting... I haven't seen this before. Could you print an input/target tensor before it goes into the trainer to see what the contents are? Is the data being processed in the wrong way?
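
A minimal sketch of such a check, assuming the training script builds a PyTorch-style dataset of input_ids/labels dicts (train_dataset, tokenizer, and the field names are assumptions about that script, not its actual API):

# Hypothetical inspection snippet: print one preprocessed example before training.
sample = train_dataset[0]
input_ids = [int(t) for t in sample["input_ids"]]
labels = [int(t) for t in sample["labels"]]

print("input_ids:", input_ids)
print("labels:   ", labels)
print("decoded input: ", tokenizer.decode(input_ids, skip_special_tokens=True))
# In Hugging Face seq2seq training, label positions set to -100 are ignored by the loss,
# so a target that is (almost) all -100 contributes (almost) nothing to it.
print("decoded target:", tokenizer.decode([t for t in labels if t != -100], skip_special_tokens=True))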

@emnlpanon

Same problem (zero loss from start to finish), with both dummy.json and my own dataset.

@richagadgil

richagadgil commented Jul 10, 2023

Was this resolved? Same problem with a zero loss.

@leng-yue

same problem

@jxmorris12

same problem
