[BUG] Segmentation fault on import cudf without CUDA #11941

Closed
weiji14 opened this issue Oct 18, 2022 · 7 comments
Labels
bug Something isn't working

Comments

weiji14 commented Oct 18, 2022

Describe the bug

Related to #11366. On a machine without NVIDIA GPUs, import cudf results in a RuntimeError followed by a segmentation fault. For context, I have some code intended to be cross-compatible between CPU and GPU. It used to work with cudf=21.10, but recent versions (e.g. cudf=22.10) result in an unrecoverable segfault:

try:
    import cudf as xpd
except ImportError:
    import pandas as xpd

Steps/Code to reproduce bug

On a computer without an NVIDIA GPU, run these installation steps:

mamba create --name cudfenv -c rapidsai cudf=22.10.00 python=3.8
mamba activate cudfenv
python

Then in a Python console, run:

import cudf

This results in:

Traceback (most recent call last):
  File "cuda/_cuda/ccuda.pyx", line 3671, in cuda._cuda.ccuda._cuInit
  File "cuda/_cuda/ccuda.pyx", line 435, in cuda._cuda.ccuda.cuPythonInit
RuntimeError: Failed to dlopen libcuda.so
Exception ignored in: 'cuda._lib.ccudart.utils.cudaPythonGlobal.lazyInitGlobal'
Traceback (most recent call last):
  File "cuda/_cuda/ccuda.pyx", line 3671, in cuda._cuda.ccuda._cuInit
  File "cuda/_cuda/ccuda.pyx", line 435, in cuda._cuda.ccuda.cuPythonInit
RuntimeError: Failed to dlopen libcuda.so
Segmentation fault (core dumped)
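
The traceback above reports that libcuda.so could not be dlopen'ed. As a minimal sketch (not something cudf itself provides), the same condition can be checked from Python with the standard ctypes module before attempting the import. The function name cuda_driver_loadable and the candidate library names are assumptions chosen for illustration. A successful load only shows that the driver library is present, not that a usable GPU exists.

```python
import ctypes


def cuda_driver_loadable(candidates=("libcuda.so.1", "libcuda.so")) -> bool:
    """Return True if any candidate shared library can be loaded.

    Hypothetical helper: the candidate names are assumptions based on the
    library names that appear in the error messages in this thread.
    """
    for name in candidates:
        try:
            # ctypes.CDLL raises OSError when the library cannot be opened.
            ctypes.CDLL(name)
        except OSError:
            continue
        return True
    return False


if __name__ == "__main__":
    print("CUDA driver library loadable:", cuda_driver_loadable())
```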

Expected behavior

Ideally, import cudf would just result in an ImportError (or some other error) without a segmentation fault.
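
Because a segmentation fault terminates the interpreter, a try/except ImportError block cannot catch it. One possible stopgap, sketched here under assumptions, is to probe the import in a child process first and fall back only if the child exits cleanly. The helper names imports_cleanly and load_dataframe_library are hypothetical and are not part of cudf or pandas.

```python
import importlib
import subprocess
import sys


def imports_cleanly(module_name: str, timeout: float = 120.0) -> bool:
    """Return True if `import module_name` exits with status 0 in a child process."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", f"import {module_name}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # A crash (e.g. a segfault) or an uncaught exception gives a non-zero code.
    return result.returncode == 0


def load_dataframe_library(preferred: str = "cudf", fallback: str = "pandas"):
    """Import `preferred` only if a child process could import it; else `fallback`."""
    name = preferred if imports_cleanly(preferred) else fallback
    return importlib.import_module(name)
```

Usage would then look like xpd = load_dataframe_library(). The cost is one extra interpreter start-up, and the probe assumes the child process sees the same environment as the parent.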

Environment overview (please complete the following information)

  • Environment location: Cloud (AWS), but same on local laptop
  • Method of cuDF install: conda

Environment details
 ***git***
 Not inside a git repository
 
 ***OS Information***
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=22.04
 DISTRIB_CODENAME=jammy
 DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"
 PRETTY_NAME="Ubuntu 22.04.1 LTS"
 NAME="Ubuntu"
 VERSION_ID="22.04"
 VERSION="22.04.1 LTS (Jammy Jellyfish)"
 VERSION_CODENAME=jammy
 ID=ubuntu
 ID_LIKE=debian
 HOME_URL="https://www.ubuntu.com/"
 SUPPORT_URL="https://help.ubuntu.com/"
 BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
 PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
 UBUNTU_CODENAME=jammy
 Linux jupyter-weiji14 4.14.177-139.253.amzn2.x86_64 #1 SMP Wed Apr 29 09:56:20 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
 
 ***GPU Information***

./deepicedrain/print_env.sh: line 26: nvidia-smi: command not found

 ***CPU***
 Architecture:                    x86_64
 CPU op-mode(s):                  32-bit, 64-bit
 Address sizes:                   46 bits physical, 48 bits virtual
 Byte Order:                      Little Endian
 CPU(s):                          8
 On-line CPU(s) list:             0-7
 Vendor ID:                       GenuineIntel
 Model name:                      Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
 CPU family:                      6
 Model:                           85
 Thread(s) per core:              2
 Core(s) per socket:              4
 Socket(s):                       1
 Stepping:                        7
 BogoMIPS:                        4999.98
 Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
 Hypervisor vendor:               KVM
 Virtualization type:             full
 L1d cache:                       128 KiB (4 instances)
 L1i cache:                       128 KiB (4 instances)
 L2 cache:                        4 MiB (4 instances)
 L3 cache:                        35.8 MiB (1 instance)
 NUMA node(s):                    1
 NUMA node0 CPU(s):               0-7
 Vulnerability Itlb multihit:     KVM: Vulnerable
 Vulnerability L1tf:              Mitigation; PTE Inversion
 Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
 Vulnerability Meltdown:          Mitigation; PTI
 Vulnerability Spec store bypass: Vulnerable
 Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
 Vulnerability Spectre v2:        Mitigation; Full generic retpoline, STIBP disabled, RSB filling
 Vulnerability Tsx async abort:   Not affected
 
 ***CMake***
 
 ***g++***
 
 ***nvcc***
 
 ***Python***
 /srv/conda/envs/cudfenv/bin/python
 Python 3.8.13
 
 ***Environment Variables***
 PATH                            : /srv/conda/envs/cudfenv/bin:/srv/conda/condabin:/srv/conda/envs/notebook/bin:/srv/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
 LD_LIBRARY_PATH                 :
 NUMBAPRO_NVVM                   :
 NUMBAPRO_LIBDEVICE              :
 CONDA_PREFIX                    : /srv/conda/envs/cudfenv
 PYTHON_PATH                     :
 
 ***conda packages***
 /srv/conda/condabin/conda
 # packages in environment at /srv/conda/envs/cudfenv:
 #
 # Name                    Version                   Build  Channel
 _libgcc_mutex             0.1                 conda_forge    conda-forge
 _openmp_mutex             4.5                       2_gnu    conda-forge
 arrow-cpp                 9.0.0           py38he270906_2_cpu    conda-forge
 aws-c-cal                 0.5.11               h95a6274_0    conda-forge
 aws-c-common              0.6.2                h7f98852_0    conda-forge
 aws-c-event-stream        0.2.7               h3541f99_13    conda-forge
 aws-c-io                  0.10.5               hfb6a706_0    conda-forge
 aws-checksums             0.1.11               ha31a3da_7    conda-forge
 aws-sdk-cpp               1.8.186              hb4091e7_3    conda-forge
 bzip2                     1.0.8                h7f98852_4    conda-forge
 c-ares                    1.18.1               h7f98852_0    conda-forge
 ca-certificates           2022.9.24            ha878542_0    conda-forge
 cachetools                5.2.0              pyhd8ed1ab_0    conda-forge
 cubinlinker               0.2.0            py38h7144610_1    rapidsai
 cuda-python               11.7.0           py38hfa26641_0    conda-forge
 cudatoolkit               11.7.0              hd8887f6_10    conda-forge
 cudf                      22.10.00        cuda_11_py38_g8ffe375d85_0    rapidsai
 cupy                      11.2.0           py38h405e1b6_0    conda-forge
 dlpack                    0.5                  h9c3ff4c_0    conda-forge
 fastavro                  1.6.1            py38h0a891b7_0    conda-forge
 fastrlock                 0.8              py38hfa26641_2    conda-forge
 fsspec                    2022.8.2           pyhd8ed1ab_0    conda-forge
 gflags                    2.2.2             he1b5a44_1004    conda-forge
 glog                      0.6.0                h6f12383_0    conda-forge
 grpc-cpp                  1.47.1               hbad87ad_6    conda-forge
 importlib-metadata        4.11.4           py38h578d9bd_0    conda-forge
 keyutils                  1.6.1                h166bdaf_0    conda-forge
 krb5                      1.19.3               h3790be6_0    conda-forge
 ld_impl_linux-64          2.39                 hc81fddc_0    conda-forge
 libabseil                 20220623.0      cxx17_h48a1fff_4    conda-forge
 libblas                   3.9.0           16_linux64_openblas    conda-forge
 libbrotlicommon           1.0.9                h166bdaf_7    conda-forge
 libbrotlidec              1.0.9                h166bdaf_7    conda-forge
 libbrotlienc              1.0.9                h166bdaf_7    conda-forge
 libcblas                  3.9.0           16_linux64_openblas    conda-forge
 libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
 libcudf                   22.10.00        cuda11_g8ffe375d85_0    rapidsai
 libcurl                   7.85.0               h7bff187_0    conda-forge
 libedit                   3.1.20191231         he28a2e2_2    conda-forge
 libev                     4.33                 h516909a_1    conda-forge
 libevent                  2.1.10               h9b69904_4    conda-forge
 libffi                    3.4.2                h7f98852_5    conda-forge
 libgcc-ng                 12.2.0              h65d4601_18    conda-forge
 libgfortran-ng            12.2.0              h69a702a_18    conda-forge
 libgfortran5              12.2.0              h337968e_18    conda-forge
 libgomp                   12.2.0              h65d4601_18    conda-forge
 libgoogle-cloud           2.1.0                h9ebe8e8_2    conda-forge
 liblapack                 3.9.0           16_linux64_openblas    conda-forge
 libllvm11                 11.1.0               he0ac6c6_4    conda-forge
 libnghttp2                1.47.0               hdcd2b5c_1    conda-forge
 libnsl                    2.0.0                h7f98852_0    conda-forge
 libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
 libprotobuf               3.20.1               h6239696_4    conda-forge
 librmm                    22.10.00        cuda11_g9d5a8c37_0    rapidsai
 libsqlite                 3.39.4               h753d276_0    conda-forge
 libssh2                   1.10.0               haa6b8db_3    conda-forge
 libstdcxx-ng              12.2.0              h46fd767_18    conda-forge
 libthrift                 0.16.0               h491838f_2    conda-forge
 libutf8proc               2.7.0                h7f98852_0    conda-forge
 libuuid                   2.32.1            h7f98852_1000    conda-forge
 libzlib                   1.2.13               h166bdaf_4    conda-forge
 llvmlite                  0.39.1           py38h38d86a4_0    conda-forge
 lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
 ncurses                   6.3                  h27087fc_1    conda-forge
 numba                     0.56.3           py38h9a4aae9_0    conda-forge
 numpy                     1.23.4           py38h7042d01_0    conda-forge
 nvtx                      0.2.3            py38h497a2fe_1    conda-forge
 openssl                   1.1.1q               h166bdaf_0    conda-forge
 orc                       1.7.6                h6c59b99_0    conda-forge
 packaging                 21.3               pyhd8ed1ab_0    conda-forge
 pandas                    1.5.0            py38h8f669ce_0    conda-forge
 parquet-cpp               1.5.1                         2    conda-forge
 pip                       22.3               pyhd8ed1ab_0    conda-forge
 protobuf                  3.20.1           py38hfa26641_0    conda-forge
 ptxcompiler               0.6.1            py38h7525318_0    conda-forge
 pyarrow                   9.0.0           py38h097c49a_2_cpu    conda-forge
 pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
 python                    3.8.13          h582c2e5_0_cpython    conda-forge
 python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
 python_abi                3.8                      2_cp38    conda-forge
 pytz                      2022.4             pyhd8ed1ab_0    conda-forge
 re2                       2022.06.01           h27087fc_0    conda-forge
 readline                  8.1.2                h0f457ee_0    conda-forge
 rmm                       22.10.00        cuda11_py38_g9d5a8c37_0    rapidsai
 s2n                       1.0.10               h9b69904_0    conda-forge
 setuptools                65.5.0             pyhd8ed1ab_0    conda-forge
 six                       1.16.0             pyh6c4a22f_0    conda-forge
 snappy                    1.1.9                hbd366e4_1    conda-forge
 spdlog                    1.8.5                h4bd325d_1    conda-forge
 sqlite                    3.39.4               h4ff8645_0    conda-forge
 tk                        8.6.12               h27826a3_0    conda-forge
 typing_extensions         4.4.0              pyha770c72_0    conda-forge
 wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
 xz                        5.2.6                h166bdaf_0    conda-forge
 zipp                      3.9.0              pyhd8ed1ab_0    conda-forge
 zlib                      1.2.13               h166bdaf_4    conda-forge
 zstd                      1.5.2                h6239696_4    conda-forge

Additional context

Xref weiji14/deepicedrain@21e0e99

#11941 (comment)

shwina (Contributor) commented Oct 18, 2022

Yikes! Investigating.

weiji14 (Author) commented Oct 18, 2022

Thanks @shwina! If it helps, I think the bug was introduced somewhere between 22.04 and 22.06? I tried mamba create --name cudfenv -c rapidsai cudf=22.04.01 python=3.8 and import cudf gave just a TypeError (which is not ideal, but better than a segfault):

>>> import cudf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/srv/conda/envs/cudfenv/lib/python3.8/site-packages/cudf/__init__.py", line 5, in <module>
    validate_setup()
  File "/srv/conda/envs/cudfenv/lib/python3.8/site-packages/cudf/utils/gpu_utils.py", line 20, in validate_setup
    from rmm._cuda.gpu import (
  File "/srv/conda/envs/cudfenv/lib/python3.8/site-packages/rmm/__init__.py", line 16, in <module>
    from rmm import mr
  File "/srv/conda/envs/cudfenv/lib/python3.8/site-packages/rmm/mr.py", line 14, in <module>
    from rmm._lib.memory_resource import (
  File "/srv/conda/envs/cudfenv/lib/python3.8/site-packages/rmm/_lib/__init__.py", line 15, in <module>
    from .device_buffer import DeviceBuffer
  File "device_buffer.pyx", line 1, in init rmm._lib.device_buffer
TypeError: C function cuda.ccudart.cudaStreamSynchronize has wrong signature (expected __pyx_t_4cuda_7ccudart_cudaError_t (__pyx_t_4cuda_7ccudart_cudaStream_t), got cudaError_t (cudaStream_t))

whereas cudf=22.06 to cudf=22.10 gave the same RuntimeError + Segmentation fault mentioned in #11941 (comment)

For older versions (cudf=22.02), an ImportError is raised correctly. This is with mamba create --name cudfenv -c rapidsai cudf=22.02 python=3.8:

>>> import cudf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/srv/conda/envs/cudfenv/lib/python3.8/site-packages/cudf/__init__.py", line 5, in <module>
    validate_setup()
  File "/srv/conda/envs/cudfenv/lib/python3.8/site-packages/cudf/utils/gpu_utils.py", line 20, in validate_setup
    from rmm._cuda.gpu import (
  File "/srv/conda/envs/cudfenv/lib/python3.8/site-packages/rmm/__init__.py", line 16, in <module>
    from rmm import mr
  File "/srv/conda/envs/cudfenv/lib/python3.8/site-packages/rmm/mr.py", line 14, in <module>
    from rmm._lib.memory_resource import (
  File "/srv/conda/envs/cudfenv/lib/python3.8/site-packages/rmm/_lib/__init__.py", line 15, in <module>
    from .device_buffer import DeviceBuffer
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
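
When comparing versions as above, it can help to tell an ordinary Python exception apart from a crash without losing the interactive session. The following is a rough sketch that runs a snippet in a child interpreter and inspects its exit status. The function name classify_run is hypothetical. It relies on the documented subprocess behaviour that, on POSIX, a negative return code -N means the child was terminated by signal N.

```python
import signal
import subprocess
import sys


def classify_run(code: str, python: str = sys.executable) -> str:
    """Run `code` in a child interpreter and describe how it exited."""
    result = subprocess.run(
        [python, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    rc = result.returncode
    if rc == 0:
        return "ok"
    if rc < 0:
        # POSIX only: the child was killed by signal number -rc.
        try:
            return f"killed by {signal.Signals(-rc).name}"
        except ValueError:
            return f"killed by signal {-rc}"
    # Positive codes include uncaught Python exceptions such as ImportError.
    return f"exited with status {rc}"


if __name__ == "__main__":
    print(classify_run("import cudf"))
```

Passing the python executable of each test environment as the second argument would let one script check several installed versions in turn.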

shwina (Contributor) commented Oct 18, 2022

Thanks, @weiji14 -- could I ask you to check if force installing cuda-python=11.7.1 resolves the issue for you? If so, we may need to relax our pinning to allow 11.7.1:

 mamba install --no-deps --clobber -c conda-forge cuda-python=11.7.1 

shwina (Contributor) commented Oct 18, 2022

Oh hang on, that will probably break you in other ways. With cuda-python=11.7.1, doing import cudf gives:

TypeError: C function cuda.ccudart.cudaStreamSynchronize has wrong signature (expected __pyx_t_4cuda_7ccudart_cudaError_t (__pyx_t_4cuda_7ccudart_cudaStream_t), got cudaError_t (cudaStream_t))

I'll report back here when I have a proper solution...

weiji14 (Author) commented Oct 18, 2022

Got this:

Encountered problems while solving:
  - package cudf-22.10.00-cuda_11_py38_g8ffe375d85_0 requires cuda-python >=11.5,<11.7.1, but none of the providers can be installed

So yeah, probably need to do this properly 🙂
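
The solver message follows from the pin cuda-python >=11.5,<11.7.1: the upper bound is exclusive, so 11.7.0 is allowed and 11.7.1 is not. A simplified sketch of that range check is below. It compares plain dotted integers only and is not how conda or mamba actually match version specs. The names parse_version and satisfies_pin are hypothetical.

```python
def parse_version(text: str, width: int = 3) -> tuple:
    """Turn a dotted version such as '11.7.1' into a zero-padded integer tuple."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def satisfies_pin(version: str, lower: str = "11.5", upper: str = "11.7.1") -> bool:
    """Check lower <= version < upper (inclusive lower bound, exclusive upper bound)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    for candidate in ("11.7.0", "11.7.1"):
        print(candidate, satisfies_pin(candidate))
```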

GregoryKimball added the "0 - Waiting on Author" label and removed the "Needs Triage" label on Oct 21, 2022
ajschmidt8 pushed a commit to rapidsai/rmm that referenced this issue Oct 26, 2022
This should resolve a segfault we are seeing with `cuda-python=11.7.0` (rapidsai/cudf#11941).

Authors:
   - Ashwin Srinath (https://github.com/shwina)
   - AJ Schmidt (https://github.com/ajschmidt8)
   - Bradley Dice (https://github.com/bdice)

Approvers:
   - GALI PREM SAGAR (https://github.com/galipremsagar)
   - Bradley Dice (https://github.com/bdice)
   - Mark Harris (https://github.com/harrism)
   - AJ Schmidt (https://github.com/ajschmidt8)
raydouglass pushed a commit that referenced this issue Nov 3, 2022
This should resolve a segfault we are seeing with `cuda-python=11.7.0` (#11941).

Authors:
   - Ashwin Srinath (https://github.com/shwina)
   - Bradley Dice (https://github.com/bdice)
   - GALI PREM SAGAR (https://github.com/galipremsagar)
   - Jordan Jacobelli (https://github.com/Ethyling)

Approvers:
   - GALI PREM SAGAR (https://github.com/galipremsagar)
   - Bradley Dice (https://github.com/bdice)
raydouglass pushed a commit to rapidsai/cugraph that referenced this issue Nov 4, 2022
This should resolve a segfault we are seeing with `cuda-python=11.7.0` (rapidsai/cudf#11941).

Authors:
   - Ashwin Srinath (https://github.com/shwina)
   - Bradley Dice (https://github.com/bdice)
   - Ray Douglass (https://github.com/raydouglass)

Approvers:
   - Bradley Dice (https://github.com/bdice)
   - Brad Rees (https://github.com/BradReesWork)
   - Ray Douglass (https://github.com/raydouglass)
   - GALI PREM SAGAR (https://github.com/galipremsagar)
raydouglass pushed a commit to rapidsai/cuml that referenced this issue Nov 4, 2022
This should resolve a segfault we are seeing with `cuda-python=11.7.0` (rapidsai/cudf#11941).

Authors:
   - Ashwin Srinath (https://github.com/shwina)
   - Bradley Dice (https://github.com/bdice)

Approvers:
   - GALI PREM SAGAR (https://github.com/galipremsagar)
   - Bradley Dice (https://github.com/bdice)
   - Dante Gama Dessavre (https://github.com/dantegd)
   - Ray Douglass (https://github.com/raydouglass)
galipremsagar removed the "0 - Waiting on Author" label on Nov 7, 2022

galipremsagar (Contributor) commented

@weiji14 We've released 22.10.01, which removes the segfaults.

conda create -n rapids-22.10 -c rapidsai -c conda-forge -c nvidia  \
    cudf=22.10 python=3.9 cudatoolkit=11.5

Let us know if you still have any issues.

galipremsagar (Contributor) commented

Closing this issue as the segfaults are fixed. Please feel free to re-open if this resurfaces.

jakirkham pushed a commit to jakirkham/cuml that referenced this issue Feb 27, 2023
This should resolve a segfault we are seeing with `cuda-python=11.7.0` (rapidsai/cudf#11941).

Authors:
   - Ashwin Srinath (https://github.com/shwina)
   - Bradley Dice (https://github.com/bdice)

Approvers:
   - GALI PREM SAGAR (https://github.com/galipremsagar)
   - Bradley Dice (https://github.com/bdice)
   - Dante Gama Dessavre (https://github.com/dantegd)
   - Ray Douglass (https://github.com/raydouglass)