
Update Training Guidelines #150

Merged
16 commits merged on Aug 24, 2023
10 changes: 3 additions & 7 deletions scripts/training/run_pt.sh
Contributor


Some users may use this script to pre-train a model from scratch. With these modifications, however, the script's default behavior becomes continual pre-training of Chinese-LLaMA-2, which may cause confusion. I would advise keeping modules_to_save.
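For illustration only (not part of this PR), a minimal sketch of keeping modules_to_save in run_pt.sh could look like the snippet below. It reuses the variables and flags already defined in the script; most other torchrun arguments are omitted.

```bash
# Sketch: keep the embedding and LM head fully trainable alongside the LoRA
# modules, as needed when pre-training from scratch (other arguments omitted).
modules_to_save="embed_tokens,lm_head"

torchrun --nnodes 1 --nproc_per_node 1 run_clm_pt_with_peft.py \
    --trainable ${lora_trainable} \
    --modules_to_save ${modules_to_save} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16
```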

Collaborator Author


I agree with your suggestion.

@@ -2,11 +2,10 @@ lr=2e-4
 lora_rank=64
 lora_alpha=128
 lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
-modules_to_save="embed_tokens,lm_head"
 lora_dropout=0.05

-pretrained_model=path/to/hf/llama-2/dir
-chinese_tokenizer_path=path/to/chinese/llama-2/tokenizer/dir
+pretrained_model=path/to/hf/chinese-llama-2/dir
+chinese_tokenizer_path=path/to/chinese/chinese-llama-2/tokenizer/dir
 dataset_dir=path/to/pt/data/dir
 data_cache=temp_data_cache_dir
 per_device_train_batch_size=1
@@ -46,8 +45,5 @@ torchrun --nnodes 1 --nproc_per_node 1 run_clm_pt_with_peft.py \
 --lora_rank ${lora_rank} \
 --lora_alpha ${lora_alpha} \
 --trainable ${lora_trainable} \
---modules_to_save ${modules_to_save} \
 --lora_dropout ${lora_dropout} \
---torch_dtype float16 \
---gradient_checkpointing \
---ddp_find_unused_parameters False
+--torch_dtype float16
12 changes: 3 additions & 9 deletions scripts/training/run_sft.sh
@@ -2,18 +2,16 @@ lr=1e-4
 lora_rank=64
 lora_alpha=128
 lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
-modules_to_save="embed_tokens,lm_head"
 lora_dropout=0.05

-pretrained_model=path/to/hf/llama-2/or/merged/llama-2/dir/or/model_id
-chinese_tokenizer_path=path/to/chinese/llama-2/tokenizer/dir
+pretrained_model=path/to/hf/chinese-alpaca-2/dir/or/model_id
+chinese_tokenizer_path=path/to/chinese/chinese-alpaca-2/tokenizer/dir
 dataset_dir=path/to/sft/data/dir
 per_device_train_batch_size=1
 per_device_eval_batch_size=1
 gradient_accumulation_steps=8
 max_seq_length=512
 output_dir=output_dir
-peft_model=path/to/peft/model/dir
 validation_file=validation_file_name

 deepspeed_config_file=ds_zero2_no_offload.json
@@ -51,10 +49,6 @@ torchrun --nnodes 1 --nproc_per_node 1 run_clm_sft_with_peft.py \
 --lora_rank ${lora_rank} \
 --lora_alpha ${lora_alpha} \
 --trainable ${lora_trainable} \
---modules_to_save ${modules_to_save} \
 --lora_dropout ${lora_dropout} \
 --torch_dtype float16 \
---validation_file ${validation_file} \
---peft_path ${peft_model} \
Contributor


Could you explain these modifications in more detail?

Collaborator Author


peft_path is mutually exclusive with the LoRA-related training arguments, so the script uses setting the LoRA trainable parameters as the example.
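As a rough illustration (not part of the diff), the two mutually exclusive invocation styles might look like this, using only flags that appear in this PR and omitting the unrelated arguments (model, tokenizer, data, deepspeed, etc.):

```bash
# Option 1: train new LoRA weights by passing the LoRA hyperparameters
# (the default shown in the updated run_sft.sh).
torchrun --nnodes 1 --nproc_per_node 1 run_clm_sft_with_peft.py \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --validation_file ${validation_file}

# Option 2: continue from an existing LoRA/PEFT checkpoint via --peft_path;
# in this case the LoRA hyperparameter flags above are not passed.
torchrun --nnodes 1 --nproc_per_node 1 run_clm_sft_with_peft.py \
    --peft_path ${peft_model} \
    --torch_dtype float16 \
    --validation_file ${validation_file}
```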

---gradient_checkpointing \
---ddp_find_unused_parameters False
+--validation_file ${validation_file}