It seems like sync-batchnorm (in syncbn_kernel.cu) is not compatible with PyTorch 1.0.0? #168
Comments
Thanks for @huanghoujing's suggestion, but I wonder what the real cause is. It is hard to find clues from a completely fresh installation. I had installed torch-encoding successfully on two servers, but hit the issue above on a machine with a Tesla M40 GPU, with almost the same software configuration.
@huanghoujing Sorry for the late reply. I just tried installing PyTorch by building it from source and kept all other steps unchanged. The errors no longer occur.
It works with PyTorch 1.0.0, but not 1.0.1.
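If it helps anyone, the version constraint reported here could be checked up front with something like the sketch below. The helper names (`parse_version`, `is_known_good`) and the `KNOWN_GOOD` constant are made up for illustration and are not part of torch-encoding; the only "known good" version is the one reported in this thread.

```python
import re

# Assumption: 1.0.0 is the only release reported as working in this thread.
KNOWN_GOOD = (1, 0, 0)

def parse_version(version):
    """Return (major, minor, patch) from strings such as '1.0.1.post2' or '1.0.0+cu92'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)

def is_known_good(version):
    """True only when the parsed version equals the version reported as working."""
    return parse_version(version) == KNOWN_GOOD
```

In practice one would pass `torch.__version__` to `is_known_good` and print a warning before the extension is compiled, rather than waiting for the kernel assertion at run time.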
@zhanghang1989 Does this repo work with PyTorch 1.1 now?
@zhanghang1989 @qiulesun Same question. Can this be used with PyTorch 1.1 now?
I am not maintaining the code any more, because I have moved to MXNet development.
When I try to train or test the model, it seems that the code in "syncbn_kernel.cu" is not compatible with PyTorch 1.0.0. (I have installed CUDA 9.2, ninja 1.8, and PyTorch 1.0.0.)
The errors in training mode look like this:
RuntimeError: cudaGetLastError() == cudaSuccess ASSERT FAILED at /home/qingqing/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu:424, please report a bug to PyTorch. (Expectation_Forward_CUDA at /home/qingqing/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu:424)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fa21f2a3cc5 in /home/qingqing/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: Expectation_Forward_CUDA(at::Tensor) + 0x281 (0x7fa2133b2c86 in /home/qingqing/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #2: <unknown function> + 0x8a6b5 (0x7fa21338b6b5 in /home/qingqing/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #3: <unknown function> + 0x838f6 (0x7fa2133848f6 in /home/qingqing/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #4: <unknown function> + 0x7c181 (0x7fa21337d181 in /home/qingqing/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #5: <unknown function> + 0x7c2ed (0x7fa21337d2ed in /home/qingqing/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #6: <unknown function> + 0x69872 (0x7fa21336a872 in /home/qingqing/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/gpu/enclib_gpu.so)
frame #15: THPFunction_apply(_object*, _object*) + 0x5dd (0x7fa24f73c40d in /home/qingqing/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
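For anyone comparing a working machine with a failing one, a small script like the sketch below may help collect the relevant details (Python, PyTorch, the CUDA version PyTorch was built against, and the GPU model and compute capability). The function name `collect_env` is hypothetical, and this only reports the environment; it does not identify the cause of the assertion.

```python
import platform

def collect_env():
    """Gather version details; still returns a result if torch cannot be imported."""
    info = {"python": platform.python_version(), "torch": None}
    try:
        import torch
    except ImportError:
        return info
    info["torch"] = torch.__version__
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of device 0.
        info["capability"] = torch.cuda.get_device_capability(0)
    return info

if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Running it on both machines and comparing the output line by line is a cheap first step before reinstalling anything.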