Seg fault during build #308

adamjstewart · 2020-03-22T20:38:53Z

I'm trying to install NCCL 2.5.7-1 on a Cray CNL5 cluster with K20s (yes, I know they're old) using CUDA 9.1.85 and GCC 6.3.0. I'm seeing the following error at build time:

$ make -j16 CUDA_HOME=/opt/nvidia/cudatoolkit9.1/9.1.85_3.10-1.0502.df1cc54.3.1
...
make[2]: Leaving directory `/mnt/c/scratch/sciteam/stewart1/spack-stage-nccl-2.5.7-1-5qjqktpsti2z4tzljpaemj2wlnosbusx/spack-src/src/collectives/device'
Linking    libnccl.so.2.5.7                    > /mnt/c/scratch/sciteam/stewart1/spack-stage-nccl-2.5.7-1-5qjqktpsti2z4tzljpaemj2wlnosbusx/spack-src/build/lib/libnccl.so.2.5.7
make: *** [src.build] Segmentation fault (core dumped)

How can I go about debugging this? I don't know much about C/CUDA, I'm just trying to install NCCL for use with PyTorch.
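
One possible starting point, sketched under assumptions: the `make: *** [src.build] Segmentation fault` line suggests that the command make ran for the `src.build` target was itself killed by a signal, so it can help to confirm which command dies and with which signal. The helper names below (`signal_from_status`, `run_and_report`) are hypothetical and not part of NCCL. They rely on the common shell convention that an exit status above 128 means "killed by signal (status - 128)", which is a convention rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of NCCL or its build system.

# Print the signal number implied by an exit status, assuming the usual
# shell convention (status = 128 + signal). Prints nothing for status <= 128.
signal_from_status() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    fi
}

# Run a command and report whether it looks like it was killed by a signal.
# Returns the command's own exit status.
run_and_report() {
    status=0
    "$@" || status=$?
    sig=$(signal_from_status "$status")
    if [ -n "$sig" ]; then
        # kill -l translates a signal number into its name.
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exit status $status"
    fi
    return "$status"
}
```

For example, `run_and_report make -j1 CUDA_HOME=<path>` repeats the build serially, so the last command echoed before the crash is easier to identify than with `-j16`.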

adamjstewart · 2020-03-22T22:22:50Z

I also tried NCCL 2.4.6-1, 2.4.8-1, and 2.5.6-1 and they all failed at the same stage of the build with the same seg fault.

adamjstewart · 2020-03-23T01:05:19Z

I also tried GCC 5.3.0, with the same result.

sjeaugey · 2020-03-23T15:52:04Z

I'm not sure CUDA 9.1 supported such recent compilers. You can check the documentation to see which compiler versions are supported for each host distribution:

https://docs.nvidia.com/cuda/archive/9.1/cuda-installation-guide-linux/index.html#system-requirements
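
A minimal sketch of that check, assuming the maximum supported GCC major version has been read from the guide above and is passed in by hand. The function names `major_version` and `is_supported` are hypothetical, no limit is hard-coded, and comparing only major versions is a coarse approximation of the guide's table.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Print the major component of a dotted version string such as "6.3.0".
major_version() {
    printf '%s\n' "${1%%.*}"
}

# Succeed if the compiler version's major number is <= the given maximum.
# $1: compiler version (e.g. output of `gcc -dumpversion`)
# $2: maximum supported major version, taken from the CUDA installation guide
is_supported() {
    [ "$(major_version "$1")" -le "$2" ]
}

# Show the toolchain actually on PATH, if present.
if command -v gcc >/dev/null 2>&1; then
    echo "host gcc: $(gcc -dumpversion)"
fi
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
fi
```

With a limit `N` taken from the guide, `is_supported "$(gcc -dumpversion)" N && echo ok` reports whether the host compiler on `PATH` is within that limit.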

adamjstewart · 2020-03-23T15:59:34Z

Thanks for the suggestion. I'm on a Cray CNL5 cluster, but the underlying distro looks like SLES 11:

$ cat /etc/SuSE-release 
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3

The system compiler is pretty ancient:

$ gcc --version
gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973]
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Unfortunately I need to use GCC 4.8+ to build NumPy, so let me try a newer 4.X release and see if that works.

adamjstewart · 2020-03-23T17:58:33Z

GCC 4.9.3 has the same seg fault...
