
Compilation fails in master Cuda 10.1.105 GCC 7.4 Ubuntu 18.04 #16612

Open
larroy opened this issue Oct 24, 2019 · 16 comments

Comments

@larroy
Contributor

larroy commented Oct 24, 2019

Description


[74/524] Building NVCC (Device) object CMakeFiles/cuda_compile_1.dir/src/operator/contrib/cuda_compile_1_generated_bounding_box.cu.o
FAILED: CMakeFiles/cuda_compile_1.dir/src/operator/contrib/cuda_compile_1_generated_bounding_box.cu.o
cd /home/piotr/mxnet_master/build/CMakeFiles/cuda_compile_1.dir/src/operator/contrib && /usr/local/bin/cmake -E make_directory /home/piotr/mxnet_master/build/CMakeFiles/cuda_compile_1.dir/src/operator/contrib/. && /usr/local/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Debug -D generated_file:STRING=/home/piotr/mxnet_master/build/CMakeFiles/cuda_compile_1.dir/src/operator/contrib/./cuda_compile_1_generated_bounding_box.cu.o -D generated_cubin_file:STRING=/home/piotr/mxnet_master/build/CMakeFiles/cuda_compile_1.dir/src/operator/contrib/./cuda_compile_1_generated_bounding_box.cu.o.cubin.txt -P /home/piotr/mxnet_master/build/CMakeFiles/cuda_compile_1.dir/src/operator/contrib/cuda_compile_1_generated_bounding_box.cu.o.Debug.cmake
/home/piotr/mxnet_master/include/dmlc/./thread_local.h: In instantiation of ‘static T* dmlc::ThreadLocalStore<T>::Get() [with T = std::unordered_set<std::__cxx11::basic_string<char> >]’:
/home/piotr/mxnet_master/src/operator/contrib/./../../common/utils.h:461:28:   required from here
/home/piotr/mxnet_master/include/dmlc/./thread_local.h:46:15: error: cannot call member function ‘void dmlc::ThreadLocalStore<T>::RegisterDelete(T*) [with T = std::unordered_set<std::__cxx11::basic_string<char> >]’ without object
       Singleton()->RegisterDelete(ptr);
       ~~~~~~~~^~~~~
CMake Error at cuda_compile_1_generated_bounding_box.cu.o.Debug.cmake:279 (message):
  Error generating file
  /home/piotr/mxnet_master/build/CMakeFiles/cuda_compile_1.dir/src/operator/contrib/./cuda_compile_1_generated_bounding_box.cu.o
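
For context, here is a simplified sketch of the ThreadLocalStore pattern that nvcc appears to trip over when it is instantiated from a .cu file. This is not copied from dmlc-core; only the names that appear in the error message are real, the rest of the structure is assumed.

#include <mutex>
#include <vector>

// Simplified stand-in for dmlc::ThreadLocalStore<T>.
template <typename T>
class ThreadLocalStore {
 public:
  static T* Get() {
    static thread_local T* ptr = nullptr;
    if (ptr == nullptr) {
      ptr = new T();
      // nvcc from CUDA 10.1.105 reportedly rejects this call with
      // "cannot call member function ... without object".
      Singleton()->RegisterDelete(ptr);
    }
    return ptr;
  }

 private:
  static ThreadLocalStore<T>* Singleton() {
    static ThreadLocalStore<T> inst;
    return &inst;
  }
  void RegisterDelete(T* ptr) {
    std::lock_guard<std::mutex> lock(mutex_);
    data_.push_back(ptr);
  }
  ~ThreadLocalStore() {
    for (T* p : data_) delete p;
  }
  std::mutex mutex_;
  std::vector<T*> data_;
};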

To Reproduce

Building with the following config:

rev: ef56334...

./dev_menu.py build

--- # CMake configuration
USE_CUDA: "ON" # Build with CUDA support
USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
USE_NCCL: "ON" # Use NVidia NCCL with CUDA
USE_OPENCV: "ON" # Build with OpenCV support
USE_OPENMP: "PLATFORM" # Build with Openmp support
USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
USE_LAPACK: "ON" # Build with lapack support
USE_MKL_IF_AVAILABLE: "OFF" # Use MKL if found
USE_MKLML_MKL: "OFF" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_MKLDNN: "OFF" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
USE_JEMALLOC: "ON" # Build with Jemalloc support
USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
USE_CPP_PACKAGE: "OFF" # Build C++ Package
USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
USE_GPROF: "OFF" # Compile with gprof (profiling) flag
USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
INSTALL_EXAMPLES: "OFF" # Install the example source files.
USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
USE_TENSORRT: "OFF" # Enable inference optimization with TensorRT.
USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
CMAKE_BUILD_TYPE: "Debug"
CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
CMAKE_C_COMPILER_LAUNCHER: "ccache"
CMAKE_CXX_COMPILER_LAUNCHER: "ccache"

Steps to reproduce

(Paste the commands you ran that produced the error.)

What have you tried to solve it?

Environment

We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:

http://ix.io/1ZL6

@larroy larroy added the Bug label Oct 24, 2019
@larroy larroy changed the title from "Compilation fails in master" to "Compilation fails in master Cuda 10.1 GCC 7.4 Ubuntu 18.04" on Oct 24, 2019
@larroy
Contributor Author

larroy commented Oct 24, 2019

Looks like it could be the commit after 91bb398

@anirudh2290

@ChaiBapchya
Contributor

ChaiBapchya commented Oct 24, 2019

Was able to build it successfully for ef56334 with the following build flags:

$ python -c "from mxnet.runtime import feature_list; print(feature_list())"

[ ✔ CUDA, ✔ CUDNN, ✖ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ BLAS_APPLE, ✖ LAPACK, ✖ MKLDNN, ✖ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, ✖ CXX14, ✔ INT64_TENSOR_SIZE, ✔ SIGNAL_HANDLER, ✔ DEBUG, ✖ TVM_OP]

Something off because of NCCL?

@anirudh2290
Member

It's probably because of the gcc version not supporting the __thread construct. Looking into this.
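
For reference, __thread is the GCC/Clang extension keyword for thread-local storage that dmlc-core may use on GCC; a minimal, hypothetical illustration (not taken from the codebase):

// GCC/Clang extension keyword, predating C++11:
static __thread int per_thread_counter = 0;
// Standard C++11 equivalent:
static thread_local int portable_counter = 0;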

@anirudh2290
Member

anirudh2290 commented Oct 25, 2019

My earlier theory related to __thread was wrong. I am not able to reproduce it with:

mkdir build && cd build && cmake -DVERBOSE=1 -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_OPENMP=ON -DCMAKE_BUILD_TYPE=Debug -DUSE_DIST_KVSTORE=0 -DUSE_OPENCV=0 -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.1 -DCUDNN_ROOT=/usr/local/cuda-10.1 -DUSE_MKLDNN=0 -DUSE_MKL_IF_AVAILABLE=0 -DUSE_MKLML_MKL=0 -DUSE_ASAN=0 -GNinja -DUSE_OPERATOR_TUNING=1 -DUSE_CPP_PACKAGE=ON -DCUDA_ARCH_NAME=Auto -DUSE_INT64_TENSOR_SIZE=OFF -DUSE_TENSORRT=OFF -DUSE_NCCL=ON ..
ninja -v

Since CI passed without issues and my local build also passed with g++ 7.4 (Ubuntu 18.04, CUDA 10.1), I suspect some issue with your setup. Can you omit ccache and run the build directly? Did you do the submodule update?

@larroy
Contributor Author

larroy commented Oct 25, 2019

Could be related to nvcc:

piotr@54-198-120-41:0:~/mxnet_master ((ef5633448...))+$ /usr/local/cuda-10.1/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105

My environment is here:

https://github.com/larroy/ec2_launch_scripts

Tried gcc 6 and it happened as well.

Also tried without ccache. The instance is pretty clean.

@anirudh2290 anirudh2290 added Build and removed Bug labels Oct 25, 2019
@anirudh2290
Member

I have removed the "Bug" label for now. This requires more evidence to be classified as an "MXNet" bug. I have requested an AMI from @larroy offline to reproduce the issue on the specific version of nvcc. So far I have tried to build with CUDA 10.0 and CUDA 10.1 on Ubuntu 18.04 with g++ 7.4 and haven't been able to reproduce it. Also, this issue wasn't reproduced on our supported environments checked by CI.

@larroy
Contributor Author

larroy commented Oct 30, 2019

Hi @anirudh2290 you can repro the environment in AWS EC2 with the following:

https://github.com/larroy/ec2_launch_scripts

Just execute launch.py; the whole environment is scripted and comes from NVidia. Let me know if you have any issues.

Would be great if we could fix this issue.

@anirudh2290
Member

Hi @larroy, I currently don't have the time to debug custom scripts and custom environments. You provided the CUDA version, gcc version, and Ubuntu version. I tried a cmake build with this configuration and haven't been able to reproduce the issue. Also, the issue has not been reproduced in the CI builds. With the current evidence, I strongly suspect the issue is specific to something in your environment.

Having said that, I can continue my work even if #16526 is reverted, though it may mean slightly more work for frontend developers building on top of #16654. So, if you can convince a committer about this revert, I won't block it. Also, if this is going to be reverted, a CI stage should be added in the future that would make #16526 fail the build.

@larroy
Contributor Author

larroy commented Nov 1, 2019

Thanks for your help @anirudh2290. I think this could be a bug in the NVCC that comes with Cuda 10.1.105, as it seems to work with 10.1.243.

@hubutui

hubutui commented Nov 4, 2019

I got a similar issue with ArchLinux, cuda 10.1.243, gcc 8.3.0, opencv 4.1.2. Here is my build log.

mxnet-buildlog.txt

@anirudh2290
Member

@hubutui Looks like your issue is unrelated. I don't see an issue related to ThreadLocalStore in your log.

@DickJC123
Contributor

Yes, I believe this is a problem present in the original cuda 10.1 release (10.1.105), fixed by 10.1 Update 1 (10.1.168). Are you able to upgrade at least to this version, or are we looking for a work-around for 10.1.105?

@larroy
Contributor Author

larroy commented Nov 4, 2019

I was able to upgrade and the problem went away with the updated CUDA.

@larroy larroy changed the title from "Compilation fails in master Cuda 10.1 GCC 7.4 Ubuntu 18.04" to "Compilation fails in master Cuda 10.1.105 GCC 7.4 Ubuntu 18.04" on Nov 4, 2019
@DickJC123
Contributor

And FYI, if you feel it's worth trying to correct this for MXNet users on the original cuda 10.1, the fix to the problematic line in dmlc-core is:

      // nvcc fails to compile 'Singleton()->' on first cuda 10.1 release, fixed with update 1.
      (*Singleton()).RegisterDelete(ptr);

Worth a PR?
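
For reference, continuing the simplified sketch of ThreadLocalStore from the issue description above, Get() in dmlc-core's include/dmlc/thread_local.h with the workaround applied would read roughly as follows (only the marked call is the actual change; the surrounding lines are paraphrased):

  static T* Get() {
    static thread_local T* ptr = nullptr;
    if (ptr == nullptr) {
      ptr = new T();
      // was: Singleton()->RegisterDelete(ptr);
      // nvcc fails to compile 'Singleton()->' on first cuda 10.1 release, fixed with update 1.
      (*Singleton()).RegisterDelete(ptr);
    }
    return ptr;
  }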

@larroy
Contributor Author

larroy commented Nov 5, 2019

I think it would be user-friendly to spare users obscure compilation errors if we can avoid them. In other words, I think it would be best to open a PR.

@anirudh2290
Member

I agree, it would be worth opening a PR to dmlc-core. Thanks @DickJC123 !
