training #42
It seems you have to set some environment variables; see: https://pytorch.org/docs/0.3.0/distributed.html#environment-variable-initialization
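For reference, here is a minimal sketch of the environment-variable initialization that link describes, assuming a single-process run on one machine (the address, port, and gloo backend are illustrative choices, not pysot's settings):

import os
import torch.distributed as dist

# init_method='env://' reads these four variables.
os.environ['MASTER_ADDR'] = '127.0.0.1'
os.environ['MASTER_PORT'] = '2333'
os.environ['WORLD_SIZE'] = '1'
os.environ['RANK'] = '0'

dist.init_process_group(backend='gloo', init_method='env://')
print('initialized rank %d of %d' % (dist.get_rank(), dist.get_world_size()))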
I have the same problem as you. However, I managed to run the evaluation successfully, and it works very well.
@serycjon
I don't have the training data ready, so I haven't really tested it, but I tried running the script with:
If I run: WORLD_SIZE=1 RANK=0 PYTHONPATH=./:$PYTHONPATH python tools/train.py --cfg experiments/siamrpn_r50_l234_dwxcorr/config.yaml then: Traceback (most recent call last):
@chenbolinstudent Could you post a screenshot of the code changes that solved the RANK problem? PYTHONPATH=./:$PYTHONPATH python tools/train.py is a command-line statement, isn't it? How would you make this change in the code instead?
@chenbolinstudent Well, you have to install pysot correctly first (see install.md). |
I found a problem: training takes a very long time, and if the training process is interrupted midway, you have to start training from the beginning. Is there a way to resume training from the point of interruption, like SiamFC's training process?
Try using TRAIN.RESUME to resume your training.
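For context, TRAIN.RESUME in pysot's config points to a previously saved checkpoint. A generic sketch of the save/resume pattern this relies on (the checkpoint keys below are illustrative, not pysot's exact format):

import torch

# Save enough state each epoch that training can continue later.
def save_checkpoint(model, optimizer, epoch, path):
    torch.save({'epoch': epoch,
                'state_dict': model.state_dict(),
                'optimizer': optimizer.state_dict()}, path)

# Restore model and optimizer state; return the epoch to resume from.
def resume_from(model, optimizer, path):
    ckpt = torch.load(path, map_location='cpu')
    model.load_state_dict(ckpt['state_dict'])
    optimizer.load_state_dict(ckpt['optimizer'])
    return ckpt['epoch'] + 1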
[2019-06-22 09:28:08,844-rk0-model_load.py# 42] remove prefix 'module.' |
I can debug the training code in PyCharm. You can refer to this blog: https://oldpan.me/archives/pytorch-to-use-multiple-gpus. If you follow its configuration you will get an error: no module train.py. Just change 'train.py' to 'train' and it works.
@ZZXin |
But this blog does not say how to set nproc_per_node and master_port.
I train this code on a single GPU, so I think it doesn't matter; you can just set --nproc_per_node=1.
In which .py file do you set it?
train.py |
--nproc_per_node |
I don't have it; I am waiting for pysot's download link.
How do you debug the training code in PyCharm?
|
Can I ask where the configs in the picture are? I can't find that window.
I set Script and Parameters in PyCharm, but still get KeyError: 'RANK'.
Has the issue been solved? I have the same issue. |
Have you solved this problem? |
PyCharm: Run -> Edit Configurations -> Configuration -> Module name: torch.distributed.launch
@ZZXin |
@chenbolinstudent Module name! Not Script path. Have a try.
@xiaotian3 |
|
|
I set it and did not change the train.py script, but I still get KeyError: 'RANK'.
I have a question: should I run train or torch.distributed.launch in PyCharm?
|
@xiaotian3 |
|
Use train directly, without the .py; for the path in front of it, you can refer to my picture.
@xiaotian3 Could you expand the Parameters you configured in PyCharm and show me?
/media/wyl/01937159-8963-47f6-b239-efe7b18f2e8b/wyl/data/software/conda/envs/pysot/bin/python3.7 -m torch.distributed.launch --nproc_per_node 1 --master_port=2333 /media/wyl/01937159-8963-47f6-b239-efe7b18f2e8b/wyl/data/project/pysot/pysot-master/tools/train --cfg /media/wyl/01937159-8963-47f6-b239-efe7b18f2e8b/wyl/data/project/pysot/pysot-master/experiments/siamrpn_r50_l234_dwxcorr_8gpu/config.yaml
@xiaotian3 Looking forward to your reply, thanks!
|
OK.
@Programmerwyl |
@chenbolinstudent Mine is the Professional edition.
Thanks @Programmerwyl, I made it work. When debugging, use train, not train.py.
Thanks a lot, I got it working too.
Since this issue has received no further questions, I am closing it now.
How can I run train.py without going through the command line? Currently I can run test.py directly, but running train.py directly gives:
File "/home/public/anaconda3/lib/python3.6/os.py", line 669, in __getitem__
raise KeyError(key) from None
KeyError: 'RANK'
How do I set, in code, the values from:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m torch.distributed.launch
--nproc_per_node=8
--master_port=2333
../../tools/train.py --cfg config.yaml
so that I can run train.py directly in PyCharm?
Thank you for your reply.
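One common workaround (a sketch, untested here; the port and single-GPU choice are assumptions): set the variables that torch.distributed.launch would normally provide at the very top of tools/train.py, before the pysot imports, then run the file directly from PyCharm:

# Emulate a single-process torch.distributed.launch run.
import os
os.environ.setdefault('CUDA_VISIBLE_DEVICES', '0')  # one GPU instead of eight
os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
os.environ.setdefault('MASTER_PORT', '2333')
os.environ.setdefault('WORLD_SIZE', '1')            # replaces --nproc_per_node=8
os.environ.setdefault('RANK', '0')
os.environ.setdefault('LOCAL_RANK', '0')

Alternatively, PyCharm's Run Configuration has an Environment variables field where the same values can be set without editing the script.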