pytorch Error in Sync batch norm #211

krishnakanthnakka · 2019-07-09T11:02:30Z

CUDA 9.2, GCC 6.0

Traceback (most recent call last):
File "experiments/segmentation/demo.py", line 16, in <module>
output = model.evaluate(img)
File "/cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/models/base.py", line 78, in evaluate
pred = self.forward(x)
File "/cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/models/fcn.py", line 51, in forward
_, _, c3, c4 = self.base_forward(x)
File "/cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/models/base.py", line 67, in base_forward
x = self.pretrained.conv1(x)
File "/home/nakka/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/nakka/.local/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/nakka/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/nn/syncbn.py", line 122, in forward
self.activation, self.slope).view(input_shape)
File "/cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/functions/syncbn.py", line 95, in forward
y = lib.gpu.batchnorm_forward(x, _ex, _exs, gamma, beta, ctx.eps)
RuntimeError: cudaGetLastError() == cudaSuccess ASSERT FAILED at /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu:289, please report a bug to PyTorch. (BatchNorm_Forward_CUDA at /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu:289)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f4d5b06dfe1 in /home/nakka/.local/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f4d5b06ddfa in /home/nakka/.local/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: BatchNorm_Forward_CUDA(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, float) + 0x2c7 (0x7f4d471d5788 in /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #3: + 0x6fb5e (0x7f4d471afb5e in /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #4: + 0x6a4f5 (0x7f4d471aa4f5 in /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #5: + 0x62ce9 (0x7f4d471a2ce9 in /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #6: + 0x63004 (0x7f4d471a3004 in /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #7: + 0x5192c (0x7f4d4719192c in /cvlabdata2/home/krishna/packages/conda2/envs/py3.6/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)

frame #16: THPFunction_apply(_object*, _object*) + 0x581 (0x7f4d55c374d1 in /home/nakka/.local/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

zhanghang1989 · 2019-07-09T21:08:02Z

Could you try installing CUDA 10.1 and then reinstalling PyTorch?
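Before reinstalling, it may help to record which CUDA version the installed PyTorch build targets and which toolkit `nvcc` belongs to, since a mismatch between the two is one possible (unconfirmed) cause of a failure inside a compiled extension kernel. A minimal sketch, assuming nothing beyond standard PyTorch attributes; the helper name `cuda_environment_report` is hypothetical and not part of PyTorch or the `encoding` package:

```python
import importlib
import shutil
import subprocess


def cuda_environment_report():
    """Collect version info that is useful when a CUDA extension fails at runtime."""
    report = {"torch": None, "torch_cuda": None, "cuda_available": None, "nvcc": None}

    # Local CUDA toolkit, if nvcc is on PATH (this is what compiles the extension).
    nvcc = shutil.which("nvcc")
    if nvcc is not None:
        result = subprocess.run(
            [nvcc, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        lines = result.stdout.strip().splitlines()
        # Keep only the last line of the banner, which normally carries the release.
        report["nvcc"] = lines[-1] if lines else None

    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        return report  # PyTorch is missing or failed to load in this interpreter

    report["torch"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built against
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in cuda_environment_report().items():
        print("{}: {}".format(key, value))
```

Pasting this output into the issue alongside the traceback makes it easier to tell whether the PyTorch build and the local toolkit agree.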
