There are discrepancies between the parameters in the paper and those in main.py. #9

LiJiaBei-7 · 2024-09-25T07:33:14Z

I noticed that the supplementary material gives the number of steps as 50,000, but main.py sets steps_per_epoch=500. Is this a mistake? The batch_size and gradient accumulation settings also differ from those used in the paper.

Vladimir2506 · 2024-09-25T07:53:55Z

@LiJiaBei-7 Thank you so much for pointing this out. Please set the hyper-parameters following the paper. I modified the default parameters while testing the maximum batch size on 80GB GPUs. I will push an updated version of them along with the debugging of your next issue about poor training results.
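
For anyone reconciling the two sets of values, the following is a minimal sketch of the arithmetic involved. It is not taken from main.py. The helper names are hypothetical, the batch-size numbers in the usage lines are placeholders rather than the paper's values, and it assumes that one "step" means one optimizer update and that the total number of steps is steps_per_epoch times the number of epochs.

```python
def effective_batch_size(per_gpu_batch_size: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Samples contributing to one optimizer update (assumes data parallelism)."""
    return per_gpu_batch_size * grad_accum_steps * num_gpus


def epochs_needed(total_steps: int, steps_per_epoch: int) -> int:
    """Epochs required to reach total_steps, rounded up."""
    return -(-total_steps // steps_per_epoch)  # ceiling division


def grad_accum_for_target(target_batch_size: int, per_gpu_batch_size: int, num_gpus: int) -> int:
    """Accumulation steps needed to hit a target effective batch size."""
    samples_per_micro_step = per_gpu_batch_size * num_gpus
    if target_batch_size % samples_per_micro_step != 0:
        raise ValueError("target batch size is not a multiple of per-GPU batch size x GPUs")
    return target_batch_size // samples_per_micro_step


# 50,000 and 500 are the values quoted in this thread.
print(epochs_needed(50_000, 500))

# Placeholder values only: substitute the batch size reported in the paper.
print(grad_accum_for_target(target_batch_size=256, per_gpu_batch_size=8, num_gpus=8))
```

If the per-GPU batch size has to shrink to fit in memory, raising the accumulation steps by the same factor keeps the effective batch size unchanged under these assumptions.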
