Describe the Question

When I use peft to train bloom, everything is OK. If I turn off use_peft with the following script, the loss drops to zero (for both llama and bloom):

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node 1 pretraining.py \
    --model_type bloom \
    --model_name_or_path /home/bmb/models/bigscience/bloom-560m \
    --train_file_dir ../data/pretrain \
    --validation_file_dir ../data/pretrain \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --do_train \
    --do_eval \
    --use_peft False \
    --seed 42 \
    --bf16 True \
    --tf32 True \
    --learning_rate 1e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --max_train_samples 10000 \
    --max_eval_samples 10 \
    --num_train_epochs 0.5 \
    --logging_strategy steps \
    --logging_steps 1 \
    --eval_steps 50 \
    --evaluation_strategy steps \
    --save_steps 500 \
    --save_strategy steps \
    --save_total_limit 3 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 1 \
    --block_size 1024 \
    --output_dir outputs-pt-v1 \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --target_modules all \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --torch_dtype float16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True

Training log:

{'loss': 1.9695, 'learning_rate': 1.4285714285714286e-06, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 2.8571428571428573e-06, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 4.2857142857142855e-06, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 5.7142857142857145e-06, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 7.1428571428571436e-06, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 8.571428571428571e-06, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 1e-05, 'epoch': 0.02}
{'loss': 0.0, 'learning_rate': 9.999370638369377e-06, 'epoch': 0.02}
{'loss': 0.0, 'learning_rate': 9.997482711915926e-06, 'epoch': 0.02}
{'loss': 0.0, 'learning_rate': 9.994336695915041e-06, 'epoch': 0.02}
{'loss': 0.0, 'learning_rate': 9.989933382359423e-06, 'epoch': 0.03}
{'loss': 0.0, 'learning_rate': 9.984273879759713e-06, 'epoch': 0.03}
{'loss': 0.0, 'learning_rate': 9.977359612865424e-06, 'epoch': 0.03}
{'loss': 0.0, 'learning_rate': 9.969192322306271e-06, 'epoch': 0.03}
{'loss': 0.0, 'learning_rate': 9.959774064153977e-06, 'epoch': 0.04}
{'loss': 0.0, 'learning_rate': 9.949107209404664e-06, 'epoch': 0.04}
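Not part of the original report, but a small debugging aid for catching this collapse on the first bad step: a Trainer callback that stops training as soon as a logged loss of 0.0 appears. The class name StopOnZeroLoss and the assumption that the Trainer built in pretraining.py accepts extra callbacks are mine.

from transformers import TrainerCallback

class StopOnZeroLoss(TrainerCallback):
    """Stop training as soon as the logged training loss collapses to 0.0."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs and logs.get("loss") == 0.0:
            print(f"Loss hit 0.0 at global step {state.global_step}; stopping early.")
            control.should_training_stop = True
        return control

# Hypothetical usage inside the training script:
# trainer = Trainer(..., callbacks=[StopOnZeroLoss()])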
I use an RTX 4090. If I use float16, I get the error "ValueError: Attempting to unscale FP16 gradients.", so I changed the parameters to --bf16 True and --tf32 True.
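For context (this is my own note, not from the thread): the unscale error is what PyTorch's AMP GradScaler raises when the gradients it is asked to unscale are themselves stored in float16, which happens if the weights are loaded in float16 (e.g. --torch_dtype float16) while fp16 mixed precision is enabled on top. A minimal, self-contained sketch of that failure mode, independent of pretraining.py:

import torch

# Toy model whose master weights are float16, mimicking a model loaded with
# torch_dtype=float16 and then trained with fp16 mixed precision.
model = torch.nn.Linear(8, 8).cuda().half()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(4, 8, device="cuda", dtype=torch.float16)
with torch.cuda.amp.autocast():
    loss = model(x).float().pow(2).mean()

scaler.scale(loss).backward()   # gradients are float16 because the parameters are
scaler.unscale_(optimizer)      # raises: ValueError: Attempting to unscale FP16 gradients.

Keeping the master weights in float32 and letting autocast handle the half-precision compute avoids this particular error.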
fp16 and bf16 may give a loss of 0; please set float32, or you can try setting batch_size=1. Refer to lm-sys/FastChat#1339.
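A quick way to sanity-check the float32 suggestion before relaunching a full run (my own sketch, not the repository's code; the single forward pass below is only an assumption about how to verify that a float32 load yields a finite, non-zero loss):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/home/bmb/models/bigscience/bloom-560m"  # same path as in the command above
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float32).cuda()

# One causal-LM step: labels == input_ids, so the model returns its own loss.
batch = tokenizer("test sentence for a single forward pass", return_tensors="pt").to("cuda")
loss = model(**batch, labels=batch["input_ids"]).loss
print(loss.item())  # expected: a finite positive value, not 0.0 or nan

In the launch command this would presumably correspond to dropping --bf16 True and --tf32 True and changing --torch_dtype float16 to float32, if the script accepts that value.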