Inference on K40m GPU #58

goyalsaransh97 · 2019-05-22T04:28:38Z

I tried running inference with the pre-trained Pythia model on a K40m. It didn't start for quite some time, and then the ETA oscillated around 10-15 hours.
So I enabled multi-GPU training using the dataparallel flag. But now it's not starting at all; I waited around 30 minutes before stopping it. I got the following error on stopping:
"
  File "/mnt/data_g/saransh/anaconda3/lib/python3.7/threading.py", line 1048, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
"
I've tried re-running it, but the problem persists. Earlier I ran it on a V100, and it worked fine there.
Could you suggest something?

apsdehal · 2019-05-22T04:36:04Z

Hard to say without testing on a K40m. For now, can you try setting training_parameters.num_workers to 0?
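`training_parameters.num_workers` is a dotted key into a nested config. Below is a minimal sketch of how such a dotted override can map onto a nested dict, assuming a plain-dict config. `set_by_dotted_key` and the starting value of 4 are hypothetical, not Pythia's actual override code. For context, in PyTorch's `DataLoader`, `num_workers=0` means batches are loaded in the main process rather than in worker subprocesses, which takes the workers out of the picture as a source of the hang.

```python
from typing import Any, Dict


def set_by_dotted_key(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set a nested config value from a dotted key such as 'a.b.c'.

    Hypothetical helper for illustration only; not Pythia's override mechanism.
    Missing intermediate levels are created as empty dicts.
    """
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        # Walk (or create) one nesting level per dotted component.
        node = node.setdefault(part, {})
    node[leaf] = value


# Hypothetical config fragment; the starting value 4 is made up for the example.
config = {"training_parameters": {"num_workers": 4}}
set_by_dotted_key(config, "training_parameters.num_workers", 0)
print(config["training_parameters"]["num_workers"])  # 0
```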

goyalsaransh97 · 2019-05-22T05:03:27Z

The problem persists after setting num_workers to 0.

apsdehal · 2019-05-22T19:05:23Z

Based on https://bugs.launchpad.net/designate/+bug/1782647, this looks like a bug in Python 3.7. Can you try with Python 3.6?
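If the Python 3.7 suspicion holds, a check at startup can flag the version up front instead of leaving a silent hang. A minimal sketch follows; `warn_if_python_37` is a hypothetical helper, not part of Pythia, and it only detects the version, it does not work around the underlying issue.

```python
import sys
import warnings


def warn_if_python_37(version_info=sys.version_info) -> bool:
    """Emit a RuntimeWarning when running under Python 3.7.

    Hypothetical guard for illustration: the thread suspects, but does not
    confirm, a Python 3.7 threading issue. Returns True if a warning was issued.
    """
    if tuple(version_info[:2]) == (3, 7):
        warnings.warn(
            "Python 3.7 detected; a possible threading-related hang has been "
            "reported on this version. Consider trying Python 3.6.",
            RuntimeWarning,
        )
        return True
    return False


# Usage: call once at the top of the entry-point script.
warn_if_python_37()
```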

apsdehal · 2019-05-22T19:10:46Z

Related: pytorch/pytorch#8388

goyalsaransh97 · 2019-05-23T01:22:19Z

Thanks for your help.

#58)
- Fixes support for exclude list and allows it to be passed as args
- Add support for feature extraction from png and jpeg files
- Add "cls_prob", class probabilities field back to output_dict

goyalsaransh97 closed this as completed May 23, 2019

ChenyuGAO-CS mentioned this issue Jul 3, 2019

How can I train LoRRA on TextVQA dataset using multi-GPUs? #116

Closed
