
train_one_epoch does not seem to perform backpropagation to update the gradients; which line updates them? #11

Open
Yang-bug-star opened this issue Mar 2, 2024 · 4 comments

Comments

@Yang-bug-star

No description provided.

@shansongliu
Owner

shansongliu commented Mar 2, 2024

It's in the loss_scaler call at line 69 of engine_train.py, which points to misc.py in the util/ folder. You will see the backward call in the NativeScalerWithGradNormCount class at line 255 of misc.py.
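For readers looking for the exact mechanism: `loss_scaler` is an AMP loss scaler, and a minimal sketch of how such a `NativeScalerWithGradNormCount` performs the backward pass and optimizer step is shown below. The class name and signature follow the MAE-style `util/misc.py` convention this repo appears to reuse; treat the details as an assumption, not this repo's exact code.

```python
import torch

class NativeScalerWithGradNormCount:
    """AMP loss scaler: runs backward() on the scaled loss, then steps
    the optimizer. Sketch after the MAE-style util/misc.py."""

    def __init__(self):
        self._scaler = torch.cuda.amp.GradScaler()

    def __call__(self, loss, optimizer, clip_grad=None, parameters=None,
                 create_graph=False, update_grad=True):
        # This is the backward call the question asks about.
        self._scaler.scale(loss).backward(create_graph=create_graph)
        norm = None
        if update_grad:
            if clip_grad is not None:
                # Gradients must be unscaled before clipping.
                self._scaler.unscale_(optimizer)
                norm = torch.nn.utils.clip_grad_norm_(parameters, clip_grad)
            self._scaler.step(optimizer)  # optimizer update
            self._scaler.update()         # adjust the loss scale for next step
        return norm
```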

@Yang-bug-star
Author

Thank you very much.

@Yang-bug-star
Author

Then, I also want to ask why the number of training epochs in the code differs from the paper: the paper uses 5, 5, and 2 epochs for the three stages, while the training script uses 5, 2, and 2. In addition, I noticed that min_lr=1e-2 is greater than lr=1e-5, which causes the learning rate to increase with the number of iterations in adjust_lr_rate. Is this normal? The paper sets lr to 1e-4; what is the appropriate learning rate?

@crypto-code
Collaborator

> Then, I also want to ask why the number of training epochs in the code differs from the paper: the paper uses 5, 5, and 2 epochs for the three stages, while the training script uses 5, 2, and 2. In addition, I noticed that min_lr=1e-2 is greater than lr=1e-5, which causes the learning rate to increase with the number of iterations in adjust_lr_rate. Is this normal? The paper sets lr to 1e-4; what is the appropriate learning rate?

The uploaded code comes from one of our experiments testing the model with different configurations. The configuration that gave us the best results was lr=1e-4 with 5, 5, and 2 epochs per stage. We will set the hyperparameters to the paper's configuration. Thank you for bringing this to our attention.
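For context on the learning-rate behavior discussed above: MAE-style training code typically adjusts the rate with a half-cycle cosine schedule after linear warmup. A minimal sketch is below; the function name and `args` fields follow the MAE `util/lr_sched.py` convention and are assumptions about this repo's `adjust_lr_rate`. With `min_lr > lr`, the cosine term interpolates from `lr` up toward `min_lr`, which is exactly why the learning rate increases over iterations.

```python
import math

def adjust_learning_rate(optimizer, epoch, args):
    """Linear warmup followed by half-cycle cosine decay (MAE-style sketch)."""
    if epoch < args.warmup_epochs:
        lr = args.lr * epoch / args.warmup_epochs
    else:
        # Interpolates between args.lr and args.min_lr. If min_lr > lr,
        # this curve rises instead of decaying, as observed above.
        progress = (epoch - args.warmup_epochs) / (args.epochs - args.warmup_epochs)
        lr = args.min_lr + (args.lr - args.min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))
    for param_group in optimizer.param_groups:
        # Per-group scaling, e.g. for layer-wise lr decay.
        param_group["lr"] = lr * param_group.get("lr_scale", 1.0)
    return lr
```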
