This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

CMake GPU build broken by #16654 #17545

Closed
apeforest opened this issue Feb 7, 2020 · 3 comments · Fixed by #17552
Labels: Bug, CMake (CMake related bugs/issues/improvements), Doc

Comments

@apeforest
Contributor

Description

The CMake build instructions for Ubuntu do not work on an Ubuntu instance running the DLAMI.

Error Message

FAILED: : && /usr/bin/c++ -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -std=c++11 -fopenmp tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/engine_shutdown_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/omp_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/thread_local_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/threaded_engine_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/kvstore/gpu_topology_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/libinfo_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/activation_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/batchnorm_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/coreop_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/dropout_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/fully_conn_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/krprod_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/mkldnn_operator_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/mkldnn_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/runner/core_op_runner_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/slice_channel_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/tune/operator_tune_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/storage/storage_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/test_main.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/thread_safety/thread_safety_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cmake_device_link.o -o tests/mxnet_unit_tests -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64/stubs -Wl,-rpath,/usr/local/cuda/lib64:/usr/local/lib:/home/ubuntu/src/incubator-mxnet/build/3rdparty/openmp/runtime/src lib/libgtest.a -Wl,--whole-archive libmxnet.a -Wl,--no-whole-archive 3rdparty/dmlc-core/libdmlc.a 3rdparty/mkldnn/src/libdnnl.a 
/usr/local/lib/libopenblas.so /usr/lib/x86_64-linux-gnu/librt.so /usr/local/lib/libopencv_highgui.so.4.0.0 3rdparty/openmp/runtime/src/libomp.so -lpthread -llapack /usr/local/cuda/lib64/libcudnn.so /usr/local/lib/libopencv_videoio.so.4.0.0 /usr/local/lib/libopencv_imgcodecs.so.4.0.0 /usr/local/lib/libopencv_imgproc.so.4.0.0 /usr/local/lib/libopencv_core.so.4.0.0 -lpthread /usr/local/cuda/lib64/libcudart.so /usr/lib/x86_64-linux-gnu/libcublas.so /usr/local/cuda/lib64/libcufft.so /usr/local/cuda/lib64/libcusolver.so /usr/local/cuda/lib64/libcurand.so /usr/local/cuda/lib64/libnvToolsExt.so -ldl -lrt -lcudadevrt -lcudart_static -lrt -lpthread -ldl && :
libmxnet.a(cached_op.cc.o): In function `mxnet::CachedOp::CachedOpState::CachedOpState(mxnet::Context const&, nnvm::Graph const&, nnvm::Graph const&, bool)':
cached_op.cc:(.text._ZN5mxnet8CachedOp13CachedOpStateC2ERKNS_7ContextERKN4nnvm5GraphES8_b[_ZN5mxnet8CachedOp13CachedOpStateC5ERKNS_7ContextERKN4nnvm5GraphES8_b]+0x6e9): undefined reference to `mxnet::exec::FusePointwiseForward(nnvm::Graph&&)'
cached_op.cc:(.text._ZN5mxnet8CachedOp13CachedOpStateC2ERKNS_7ContextERKN4nnvm5GraphES8_b[_ZN5mxnet8CachedOp13CachedOpStateC5ERKNS_7ContextERKN4nnvm5GraphES8_b]+0x7c5): undefined reference to `mxnet::exec::FusePointwiseBackward(nnvm::Graph&&)'
collect2: error: ld returned 1 exit status
[587/587] Linking CXX shared library libmxnet.so
ninja: build stopped: subcommand failed.
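
The link command above is long enough that the two missing symbols are easy to overlook. A minimal sketch for pulling them out of a saved build log follows; the function name `undefined_refs` and the log file name `build.log` are made up for illustration, and the pattern assumes the GNU ld message format shown in this report.

```shell
# Hypothetical helper: print each distinct symbol named in an
# "undefined reference to ..." linker message read from stdin.
undefined_refs() {
  # Drop everything up to the message text, strip the optional opening
  # quote and the closing quote, then remove duplicate symbols.
  sed -n "s/.*undefined reference to [\`']*\(.*\)'\$/\1/p" | sort -u
}

# Example use (assumes the build output was saved first):
#   cmake --build . --parallel 8 2>&1 | tee build.log
#   undefined_refs < build.log
```

For the log in this report, this would be expected to list the `FusePointwiseForward` and `FusePointwiseBackward` symbols.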

To Reproduce

Follow exactly the steps on https://mxnet.apache.org/get_started/ubuntu_setup.html

Steps to reproduce

git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet
cd mxnet
cp config/config.cmake config.cmake

rm -rf build
mkdir -p build && cd build
cmake -GNinja -C ../config.cmake ..
cmake --build . --parallel 8

Environment

----------Python Info----------
Version      : 3.6.6
Compiler     : GCC 7.2.0
Build        : ('default', 'Jun 28 2018 17:14:51')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.3.1
Directory    : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.6.0
Directory    : /home/ubuntu/src/mxnet/python/mxnet
Num GPUs     : 8
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.4.0-1100-aws-x86_64-with-debian-stretch-sid
system       : Linux
node         : ip-172-31-20-50
release      : 4.4.0-1100-aws
version      : #111-Ubuntu SMP Wed Dec 4 12:20:15 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               1267.785
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4600.11
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-15,32-47
NUMA node1 CPU(s):     16-31,48-63
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq monitor est ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt ida
@apeforest apeforest added Bug CMake CMake related bugs/issues/improvements Doc labels Feb 7, 2020
@leezu
Contributor

leezu commented Feb 7, 2020

b1e4911 is the first bad commit
commit b1e4911
Author: Anirudh Subramanian
Date: Sat Feb 1 09:36:59 2020 -0800

@leezu leezu assigned anirudh2290 and unassigned leezu Feb 7, 2020
@leezu leezu changed the title CMake build instruction on ubuntu not working CMake GPU build broken by #16654 Feb 7, 2020
@leezu
Contributor

leezu commented Feb 8, 2020

Root cause seems to be the change in preprocessor logic. Compare

b1e4911#diff-ca9cfe7afd877e3f6a1601e9f9894ea1R233

b1e4911#diff-b7c9df82199fd093bf767c1635607088L170

Another thing exposed by this PR is that config.cmake should set ENABLE_CUDA_RTC to ON instead of OFF. It was set to OFF by default based on a misunderstanding on my side that this feature is not commonly used anymore.
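
To check which value a local config.cmake currently carries, something like the following sketch may help. The helper name `rtc_setting` is hypothetical, and it assumes the option appears on a line starting with `set(ENABLE_CUDA_RTC`, as the comment above suggests; the actual file may be formatted differently.

```shell
# Hypothetical helper: print the value (e.g. ON or OFF) that a CMake
# config file assigns to ENABLE_CUDA_RTC. Prints nothing if the file
# has no matching set(...) line.
rtc_setting() {
  sed -n 's/^[[:space:]]*set(ENABLE_CUDA_RTC[[:space:]][[:space:]]*\([A-Za-z0-9_]*\).*/\1/p' "$1"
}

# Example use from the build directory created in the steps above:
#   rtc_setting ../config.cmake
# A one-off override without editing the file (assumption: a -D given
# after -C takes precedence over the preloaded value):
#   cmake -GNinja -C ../config.cmake -DENABLE_CUDA_RTC=ON ..
```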

@anirudh2290
Member

Thanks @leezu. We also need another test on the CI that builds with ENABLE_CUDA_RTC OFF.
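
A rough sketch of what such a check could look like locally, building once per ENABLE_CUDA_RTC value in separate build trees. This is not the project's CI configuration: the function names, the `build_rtc_*` directory names and the `DRY_RUN` switch are all made up for illustration, and the script assumes it is run from the repository root next to the config.cmake from the reproduction steps, with CMake 3.13 or newer for `-S`/`-B`.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Configure and build once with ENABLE_CUDA_RTC=ON and once with OFF,
# stopping at the first failing step.
build_rtc_matrix() {
  for rtc in ON OFF; do
    dir="build_rtc_${rtc}"
    run cmake -GNinja -C config.cmake "-DENABLE_CUDA_RTC=${rtc}" -S . -B "${dir}" || return 1
    run cmake --build "${dir}" --parallel 8 || return 1
  done
}
```

Running `DRY_RUN=1 build_rtc_matrix` only prints the four commands, which is a cheap way to review them before starting two full builds.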
