Which parameters need to be changed to train stage one on a single machine with two GPUs (A40)? #19

Open
Sarah-air opened this issue Aug 14, 2024 · 2 comments

Comments

@Sarah-air

I changed the command line to the one below, but it fails with the error that follows. Why?
python -m torch.distributed.launch --nproc_per_node=2 --master_port=21 train.py -opt options/train/GoPro_S1.yml --launcher pytorch
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 223240) of binary: /root/anaconda3/envs/hi_diff/bin/python

@zhengchen1999
Owner

https://github.com/zhengchen1999/HI-Diff/blob/main/options/train/GoPro_S1.yml#L22

batch_size_per_gpu: 32 (so that the total batch size across the two GPUs is 64)
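The arithmetic behind this setting can be sketched as follows. This is only an illustration, assuming the total batch size of 64 from the reply above is to be split evenly across GPUs; the helper name `batch_size_per_gpu` is hypothetical and not part of the HI-Diff code base.

```python
def batch_size_per_gpu(total_batch: int, num_gpus: int) -> int:
    """Split a fixed total batch size evenly across GPUs (hypothetical helper)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if total_batch % num_gpus != 0:
        raise ValueError("total_batch must be divisible by num_gpus")
    return total_batch // num_gpus


# Two A40 GPUs with a total batch size of 64, as in this thread.
print(batch_size_per_gpu(64, 2))  # 32
```

The result is the value to write into the `batch_size_per_gpu` field of options/train/GoPro_S1.yml when launching with --nproc_per_node=2.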

@Sarah-air
Author

https://github.com/zhengchen1999/HI-Diff/blob/main/options/train/GoPro_S1.yml#L22

batch_size_per_gpu: 32 (so that the total batch size across the two GPUs is 64)

OK, thanks a lot.
