Seg fault during build #308

Open
adamjstewart opened this issue Mar 22, 2020 · 5 comments

@adamjstewart

adamjstewart commented Mar 22, 2020

I'm trying to install NCCL 2.5.7-1 on a Cray CNL5 cluster with K20s (yes, I know they're old) using CUDA 9.1.85 and GCC 6.3.0. I'm seeing the following error at build time:

$ make -j16 CUDA_HOME=/opt/nvidia/cudatoolkit9.1/9.1.85_3.10-1.0502.df1cc54.3.1
...
make[2]: Leaving directory `/mnt/c/scratch/sciteam/stewart1/spack-stage-nccl-2.5.7-1-5qjqktpsti2z4tzljpaemj2wlnosbusx/spack-src/src/collectives/device'
Linking    libnccl.so.2.5.7                    > /mnt/c/scratch/sciteam/stewart1/spack-stage-nccl-2.5.7-1-5qjqktpsti2z4tzljpaemj2wlnosbusx/spack-src/build/lib/libnccl.so.2.5.7
make: *** [src.build] Segmentation fault (core dumped)

How can I go about debugging this? I don't know much about C/CUDA; I'm just trying to install NCCL for use with PyTorch.
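
A possible first step, offered only as a minimal sketch: save the build output to a file and pull out each line where make reports a failed recipe, together with the last line printed before it. That shows which target's command crashed and what the build was doing at the time. The function name `find_make_failures` and the idea of a saved log are assumptions for illustration; nothing here is part of NCCL or its build system.

```python
import re

# Matches GNU make failure lines of the form seen above, e.g.
#   make: *** [src.build] Segmentation fault (core dumped)
# and the nested form "make[2]: *** [target] ...".
MAKE_ERROR = re.compile(
    r"^make(?:\[\d+\])?: \*\*\* \[(?P<target>[^\]]+)\] (?P<message>.+)$"
)


def find_make_failures(log_text):
    """Return (target, message, previous_line) for each make failure line.

    previous_line is the last non-empty, non-failure line seen before the
    failure, which is usually the step that was running when it crashed.
    """
    failures = []
    previous = ""
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = MAKE_ERROR.match(line)
        if match:
            failures.append((match.group("target"), match.group("message"), previous))
        elif line:
            previous = line
    return failures
```

Applied to the output above, this would report the target `src.build`, the message `Segmentation fault (core dumped)`, and the `Linking libnccl.so.2.5.7` line as the step that preceded the crash. Rebuilding without `-j16` before capturing the log keeps the preceding line from being interleaved with output from other jobs.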

@adamjstewart
Author

I also tried NCCL 2.4.6-1, 2.4.8-1, and 2.5.6-1 and they all failed at the same stage of the build with the same seg fault.

@adamjstewart
Author

I also tried GCC 5.3.0, with the same result.

@sjeaugey
Member

sjeaugey commented Mar 23, 2020

I'm not sure CUDA 9.1 supports compilers that recent. Take a look at the documentation to see which compiler versions are supported for your host distribution:

https://docs.nvidia.com/cuda/archive/9.1/cuda-installation-guide-linux/index.html#system-requirements

@adamjstewart
Author

Thanks for the suggestion. I'm on a Cray CNL5 cluster, but the underlying distro looks like SLES 11:

$ cat /etc/SuSE-release 
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3

The system compiler is pretty ancient:

$ gcc --version
gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973]
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Unfortunately, I need GCC 4.8+ to build NumPy, so let me try a newer 4.x release and see if that works.
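
Since several compilers are being tried against two constraints (a lower bound from NumPy and an upper bound from the CUDA documentation), a small check like the following could help keep track of which candidates are worth building with. It is a rough sketch: the helper names are made up for illustration, and the upper bound must be read from the CUDA 9.1 installation guide linked above. The value used in the usage note is a placeholder, not a verified limit.

```python
import re


def parse_gcc_version(version_output):
    """Extract (major, minor, patch) from the first line of `gcc --version`."""
    first_line = version_output.splitlines()[0]
    # Drop the parenthesised vendor string, e.g. "(SUSE Linux)", so that any
    # version number inside it is not mistaken for the compiler version.
    without_vendor = re.sub(r"\([^)]*\)", "", first_line)
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", without_vendor)
    if match is None:
        raise ValueError("no x.y.z version found in: " + first_line)
    return tuple(int(part) for part in match.groups())


def gcc_in_range(version, minimum, maximum):
    """True if minimum <= version <= maximum, comparing tuples element-wise."""
    return minimum <= version <= maximum
```

For example, `parse_gcc_version` applied to the system compiler's banner above yields `(4, 3, 4)`, and `gcc_in_range((4, 3, 4), (4, 8, 0), placeholder_max)` is false for any upper bound because the version is below the 4.8 minimum that NumPy needs.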

@adamjstewart
Author

GCC 4.9.3 has the same seg fault...

2 participants