Runtime Error #26

emigmo · 2018-03-16T09:48:22Z

python main.py --dataset minc --model deepten --batch-size 64 --lr 0.01 --epochs 60

/media/data_5t/yc/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py:24: UserWarning:
There is an imbalance between your GPUs. You may want to exclude GPU 1 which
has less than 75% of the memory or cores of GPU 0. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.
warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
Using step LR Scheduler!
: 0%| | 0/764 [00:00<?, ?it/s]
=>Epoches 1, learning rate = 0.0100, previous best = 100.0000
Traceback (most recent call last):
File "main.py", line 171, in <module>
main()
File "main.py", line 157, in main
train(epoch)
File "main.py", line 92, in train
loss.backward()
File "/media/data_5t/yc/anaconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/variable.py", line 120, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/media/data_5t/yc/anaconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 81, in backward
variables, grad_variables, retain_graph, create_graph)
RuntimeError: cublas runtime error : the GPU program failed to execute at /media/data_5t/yc/pytorch/aten/src/THC/THCBlas.cu:249

My environment:
Anaconda, Python 3.6
CUDA 8.0; PyTorch installed from source (current version 0.4); torchvision also installed from source (0.2); Ubuntu 16.04.
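The UserWarning above suggests excluding the weaker GPU by passing device_ids to DataParallel. As a rough sketch, assuming you have already read each GPU's total memory and multiprocessor count (for example from torch.cuda.get_device_properties(i).total_memory and .multi_processor_count), the hypothetical helper below keeps only the GPUs that have at least 75% of the largest GPU's memory and core count. It mirrors the threshold quoted in the warning but is not PyTorch's own check.

```python
from typing import List, Sequence


def balanced_device_ids(
    total_memory: Sequence[int],
    sm_count: Sequence[int],
    threshold: float = 0.75,
) -> List[int]:
    """Hypothetical helper: return the indices of GPUs whose memory AND
    multiprocessor count are both at least `threshold` times the largest
    value among all GPUs."""
    if len(total_memory) != len(sm_count):
        raise ValueError("total_memory and sm_count must have the same length")
    if not total_memory:
        return []
    max_mem = max(total_memory)
    max_sm = max(sm_count)
    # Keep a GPU only if it is "balanced" on both metrics.
    return [
        i
        for i, (mem, sm) in enumerate(zip(total_memory, sm_count))
        if mem >= threshold * max_mem and sm >= threshold * max_sm
    ]
```

The resulting list can then be passed as torch.nn.DataParallel(model, device_ids=...), so that training only runs on the GPUs that passed the check.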


zhanghang1989 · 2018-03-21T16:45:10Z

Please use CUDA_VISIBLE_DEVICES=0,1 python main.py ...
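If you launch training from another Python script rather than the shell, the same restriction can be applied by passing a modified environment to the child process. The sketch below uses a hypothetical helper name, env_with_visible_gpus. It builds a copy of the environment with CUDA_VISIBLE_DEVICES set and leaves the parent environment untouched.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def env_with_visible_gpus(
    device_ids: Iterable[int],
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: return a copy of the environment (os.environ by
    default) with CUDA_VISIBLE_DEVICES restricted to `device_ids`."""
    env = dict(os.environ if base_env is None else base_env)
    # A comma-separated list such as "0,1", as in the shell command above.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env
```

Usage would look like subprocess.run([sys.executable, "main.py", ...], env=env_with_visible_gpus([0, 1])). If you set os.environ["CUDA_VISIBLE_DEVICES"] inside main.py instead, do it before CUDA is initialized. The safest place is before importing torch, because the variable is ignored once the CUDA context exists.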

zhanghang1989 added the question label Mar 21, 2018

zhanghang1989 closed this as completed Apr 11, 2018

roseif mentioned this issue Aug 14, 2020

What is the reason for this problem during training？ #312

Open
