network issue #10

adeschemps · 2020-06-11T13:46:35Z

I am using the NVIDIA Docker container for PyTorch (pytorch-1912). I can clone the GitHub repository without any problem, but when I try to run CC-FPSE on my own data on a 4-GPU instance:

python train.py --name condconv --netG condconv --netD fpse --lambda_feat 20 --dataset_mode custom --label_dir mydata/train_label --image_dir mydata/train_img --label_nc 6 --no_instance --batchSize 1 --niter 100 --niter_decay 100 --use_vae --ngpus_per_node 4

I get the following error:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/uge_mnt/home/adeschem/CC-FPSE/train.py", line 37, in main_worker
    dist.init_process_group(backend='nccl', init_method=opt.dist_url, world_size=world_size, rank=rank)
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 397, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 109, in _tcp_rendezvous_handler
    store = TCPStore(result.hostname, result.port, world_size, start_daemon)
RuntimeError: Network is unreachable

This seems to be related to the torch.distributed communication package, even though I am not using the --mpdist option for distributed multiprocessing.
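The traceback shows the failure inside the TCP rendezvous for `opt.dist_url`, so one way to narrow it down is to check whether the host in that URL is routable from inside the container before `dist.init_process_group` runs. This is a minimal sketch, assuming the cause is a `dist_url` host with no route from the container. The helper `check_dist_url` is hypothetical and not part of CC-FPSE or PyTorch. It resolves the host and calls `connect()` on a UDP socket. On Linux, that call only performs a route lookup and sends no packets, so an unroutable address fails with the same `ENETUNREACH` errno behind "Network is unreachable".

```python
import errno
import socket
from typing import Tuple
from urllib.parse import urlparse


def check_dist_url(dist_url: str) -> Tuple[bool, str]:
    """Check whether the host in a tcp:// init_method URL is routable.

    Hypothetical diagnostic helper (not part of CC-FPSE or PyTorch).
    Returns (ok, human-readable message).
    """
    parsed = urlparse(dist_url)
    if parsed.scheme != "tcp":
        return False, f"expected a tcp:// URL, got scheme {parsed.scheme!r}"
    host, port = parsed.hostname, parsed.port
    if host is None or port is None:
        return False, "URL must include both a host and a port"

    try:
        # Resolve the hostname; a failure here means the name itself is bad.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_DGRAM)
    except socket.gaierror as exc:
        return False, f"cannot resolve {host!r}: {exc}"

    family, socktype, proto, _, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        try:
            # On Linux, connect() on a UDP socket only looks up a route and
            # sends nothing, so an unroutable host fails here with ENETUNREACH.
            sock.connect(sockaddr)
        except OSError as exc:
            if exc.errno == errno.ENETUNREACH:
                return False, f"no route to {host} ({sockaddr[0]}): network is unreachable"
            return False, f"cannot reach {host}: {exc}"
    return True, f"{host}:{port} is routable"


if __name__ == "__main__":
    print(check_dist_url("tcp://127.0.0.1:23456"))
```

For example, `print(check_dist_url(opt.dist_url))` could be placed at the top of `main_worker`, just before `init_process_group`. If it reports the host as unroutable and training runs on a single node, one thing to try is pointing the URL at a loopback address such as `tcp://127.0.0.1:<port>`. This assumes the option is exposed on the command line as `--dist_url`, which the traceback suggests but does not confirm.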

kei97103 · 2020-07-10T13:40:31Z

This code uses torch.distributed.
Unfortunately, torch.distributed does not support Windows.
pytorch/pytorch#37068

I made some changes so that it can be used on Windows:
https://github.com/kei97103/CC-FPSE

It works well in my environment, but I'm not sure it works in other environments.

adeschemps · 2020-07-10T13:48:29Z

Thanks for your answer. I'll post an update once I get around to trying it, to let you know whether it works on my end.
