Compilation fails in master with CUDA 10.1.105, GCC 7.4, Ubuntu 18.04 #16612
Comments
Looks like it could be the commit after 91bb398.
I was able to build it successfully at ef56334.
[ ✔ CUDA, ✔ CUDNN, ✖ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ BLAS_APPLE, ✖ LAPACK, ✖ MKLDNN, ✖ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, ✖ CXX14, ✔ INT64_TENSOR_SIZE, ✔ SIGNAL_HANDLER, ✔ DEBUG, ✖ TVM_OP] Is something off because of NCCL?
It's probably because of the gcc version not supporting …
My earlier theory related to …
Since CI passed without issues and my local build also passed with g++ 7.4 (Ubuntu 18.04, CUDA 10.1), I suspect some issue with your setup. Can you omit ccache and run the build directly? Did you do the submodule update?
Could be related to nvcc:
My environment is here: https://github.com/larroy/ec2_launch_scripts I tried gcc 6 and it happened as well, also without ccache. The instance is pretty clean.
I have removed the "Bug" label for now; this requires more evidence to be classified as an MXNet bug. I have requested an AMI from @larroy offline to reproduce the issue on the specific version of nvcc. So far I have tried to build with CUDA 10.0 and CUDA 10.1 on Ubuntu 18.04 with g++ 7.4 and haven't been able to reproduce it. This issue also wasn't reproduced on our supported environments checked by CI.
Hi @anirudh2290, you can reproduce the environment on AWS EC2 with the following: https://github.com/larroy/ec2_launch_scripts Just execute launch.py; the whole environment is scripted and comes from NVIDIA. Let me know if you have any issues. It would be great if we could fix this issue.
Hi @larroy, I currently don't have the time to debug custom scripts and custom environments. You provided the CUDA version, gcc version, and Ubuntu version; I tried this configuration with the cmake build and haven't been able to reproduce the issue. The issue has also not been reproduced in the CI builds. With the current evidence, I strongly suspect the issue is specific to something in your environment. Having said that, I can continue my work even if #16526 is reverted, though it may cause slightly more work for frontend developers developing on top of #16654. So, if you can convince a committer about this revert, I won't block it. Also, if this is going to be reverted, a CI stage should be added in the future that would make #16526 fail the build.
Thanks for your help @anirudh2290. I think this could be a bug in the nvcc that ships with CUDA 10.1.105, since it seems to work with 10.1.243.
I hit a similar issue with Arch Linux, CUDA 10.1.243, gcc 8.3.0, OpenCV 4.1.2. Here is my build log.
@hubutui Looks like your issue is unrelated; I don't see anything related to ThreadLocalStore in your log.
Yes, I believe this is a problem present in the original CUDA 10.1 release (10.1.105), fixed by 10.1 Update 1 (10.1.168). Are you able to upgrade at least to this version, or are we looking for a work-around for 10.1.105?
I was able to upgrade and the problem went away with the updated CUDA. |
And FYI, if you feel it's worth correcting this for MXNet users on the original CUDA 10.1, the fix to the problematic line in dmlc-core is:
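The quoted fix itself is not shown above, so what follows is purely an illustrative sketch (not necessarily the actual dmlc-core patch): a common work-around for nvcc releases that miscompile an initialized function-local `thread_local` object is to keep only a `thread_local` pointer, which is trivially initialized, and construct the object lazily.

```cpp
// Illustrative sketch only -- not the committed dmlc-core patch.
// Older nvcc releases are known to miscompile a function-local
// `thread_local` object that requires dynamic initialization; a
// thread_local *pointer* is zero-initialized (trivial), so building
// the object lazily on first use sidesteps the broken code path.
template <typename T>
T* GetThreadLocal() {
  static thread_local T* ptr = nullptr;  // trivial init: safe on old nvcc
  if (ptr == nullptr) {
    ptr = new T();  // dmlc-core's ThreadLocalStore also registers a deleter
  }
  return ptr;
}
```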
Worth a PR? |
I think it would be user-friendly to spare users obscure compilation errors if we can. In other words, I think it would be best to open a PR.
I agree, it would be worth opening a PR to dmlc-core. Thanks @DickJC123!
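As a complementary sketch (my own assumption, not something proposed in this thread): nvcc defines the version macros `__CUDACC_VER_MAJOR__`, `__CUDACC_VER_MINOR__`, and `__CUDACC_VER_BUILD__`, so a header could reject the known-bad compiler with a readable message instead of an obscure error. The 10.1.168 cutoff is taken from @DickJC123's comment above.

```cpp
// Hedged sketch: fail fast on the nvcc from the original CUDA 10.1
// release (builds before 10.1 Update 1, i.e. build < 168).
#if defined(__CUDACC__) && __CUDACC_VER_MAJOR__ == 10 && \
    __CUDACC_VER_MINOR__ == 1 && __CUDACC_VER_BUILD__ < 168
#error "nvcc from CUDA 10.1 before Update 1 (10.1.168) miscompiles thread_local; please upgrade the CUDA toolkit."
#endif
```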
Description
To Reproduce
Building with the following config:
rev: ef56334...
./dev_menu.py build
Environment
We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
http://ix.io/1ZL6