network issue #10

Open
adeschemps opened this issue Jun 11, 2020 · 2 comments

@adeschemps

I am using the NVIDIA Docker container for pytorch-1912. I can clone the GitHub repository without any problem, but when I try to run CC-FPSE on my own data (on a 4-GPU instance):

python train.py --name condconv --netG condconv --netD fpse --lambda_feat 20 --dataset_mode custom --label_dir mydata/train_label --image_dir mydata/train_img --label_nc 6 --no_instance --batchSize 1 --niter 100 --niter_decay 100 --use_vae --ngpus_per_node 4

I get the following error:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/uge_mnt/home/adeschem/CC-FPSE/train.py", line 37, in main_worker
    dist.init_process_group(backend='nccl', init_method=opt.dist_url, world_size=world_size, rank=rank)
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 397, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 109, in _tcp_rendezvous_handler
    store = TCPStore(result.hostname, result.port, world_size, start_daemon)
RuntimeError: Network is unreachable

This seems to be related to the torch.distributed communication package, even though I am not passing the --mpdist option to enable distributed multiprocessing.
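The traceback shows the failure inside the TCP rendezvous, which connects to the host and port taken from `opt.dist_url`. As a rough diagnostic, here is a minimal standard-library sketch that parses such a URL and attempts a plain TCP connection from inside the container. The helper names `parse_dist_url` and `check_reachable` are hypothetical, and it assumes the URL has the form `tcp://host:port`; it is not part of CC-FPSE or PyTorch.

```python
import socket
from urllib.parse import urlparse


def parse_dist_url(dist_url):
    """Split a tcp://host:port rendezvous URL into (host, port).

    Hypothetical helper; assumes the tcp:// scheme used by the TCP rendezvous.
    """
    parsed = urlparse(dist_url)
    if parsed.scheme != "tcp" or parsed.hostname is None or parsed.port is None:
        raise ValueError(f"expected tcp://host:port, got {dist_url!r}")
    return parsed.hostname, parsed.port


def check_reachable(host, port, timeout=3.0):
    """Attempt a plain TCP connection.

    Returns None if the connection succeeds, otherwise the OS error text.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return str(exc)
```

Calling `check_reachable(*parse_dist_url(url))` with the value that ends up in `opt.dist_url` may help narrow things down. A "Connection refused" result would suggest the host is routable but nothing is listening yet, which is expected before training starts, while "Network is unreachable" would suggest the container cannot route to that address at all.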

@kei97103

kei97103 commented Jul 10, 2020

This code uses 'torch.distributed'.
Unfortunately, 'torch.distributed' does not support Windows.
pytorch/pytorch#37068

I made some changes so that it can be used on Windows.
https://github.com/kei97103/CC-FPSE

It works well in my environment, but I'm not sure it will work in other environments.

@adeschemps

Thanks for your answer, I'll post an update when I get around to giving it a try to tell you if it works on my end.
