python seg fault with GDAL>=3.6.0 after opening many rasters in threads #8497

Closed
emlys opened this issue Sep 29, 2023 · 6 comments
@emlys
Contributor

emlys commented Sep 29, 2023

Expected behavior and actual behavior.

I expected the following python script to run successfully, but it exits with a seg fault.

Steps to reproduce the problem.

This error was happening in our test suite when I tried to upgrade from GDAL 3.5 (GitHub Actions run here). I narrowed it down to this weird, minimal example. Even the unused numpy import is necessary to reproduce it:

# test.py
import threading
import subprocess

import numpy
import requests
from osgeo import gdal

gdal.GetDriverByName('GTiff').Create('test.tif', 1, 1)

for _ in range(164):  # the seg fault does not happen with fewer than 164 threads
    thread = threading.Thread(
        target=gdal.OpenEx,
        args=('test.tif',))
    thread.start()
    thread.join()

requests.get('https://example.com')
subprocess.run(['pwd'])

$ python -X dev test.py
/Users/emily/mambaforge/envs/test/lib/python3.11/site-packages/osgeo/gdal.py:287: FutureWarning: Neither gdal.UseExceptions() nor gdal.DontUseExceptions() has been explicitly called. In GDAL 4.0, exceptions will be enabled by default.
  warnings.warn(
Fatal Python error: Segmentation fault

Current thread 0x00007ff855fa7680 (most recent call first):
  File "/Users/emily/mambaforge/envs/test/lib/python3.11/subprocess.py", line 1883 in _execute_child
  File "/Users/emily/mambaforge/envs/test/lib/python3.11/subprocess.py", line 1026 in __init__
  File "/Users/emily/mambaforge/envs/test/lib/python3.11/subprocess.py", line 548 in run
  File "/Users/emily/invest/test.py", line 18 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, _brotli, osgeo._gdal, osgeo._gdalconst, osgeo._ogr, osgeo._osr (total: 18)
Segmentation fault: 11
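
For anyone who wants to check the repro from a CI job without parsing the output, one option is to run the script in a child interpreter and look at how it exited. The sketch below is only an illustration and was not part of the original report; `describe_exit` and `run_script` are hypothetical helper names. It relies on two documented behaviours: `python -X dev` enables `faulthandler` (which is what prints the "Fatal Python error" traceback above), and on POSIX `subprocess` reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative value -N means the child was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exit code {returncode}"


def run_script(path):
    """Run a script with `python -X dev` in a child process and describe the result."""
    completed = subprocess.run([sys.executable, "-X", "dev", path])
    return describe_exit(completed.returncode)
```

Calling `run_script("test.py")` on an affected setup would be expected to report a kill by SIGSEGV, while an unaffected one should report "ok".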

Operating system

macOS Ventura 13.2.1

GDAL version and provenance

GDAL>=3.6.0 from conda-forge

@rouault
Member

rouault commented Sep 29, 2023

I can't reproduce this with GDAL master or 3.7.x builds on a Linux machine with 12 vCPUs,
nor with gdal 3.7.0 py311h6122507_0 + numpy 1.24.2 py311h8e6699e_0 from conda-forge.

@emlys
Contributor Author

emlys commented Sep 29, 2023

I put the minimal example on GitHub Actions; it appears to be specific to macOS:
https://github.com/emlys/demo-gdal-issue/actions/runs/6358178953/job/17270179439

gdal 3.7.2 py311hc436b80_4 and numpy 1.26.0 py311hc44ba51_0 from conda-forge

@rouault
Member

rouault commented Sep 30, 2023

I don't have a Mac, so I'm unable to tackle this directly. @emlys, if you have a Mac, can you reproduce this with a manual build of GDAL master? If so, and if the issue doesn't reproduce with GDAL 3.5, you could attempt a "git bisect" session to try to spot the offending commit. (Although I have absolutely no clue why importing numpy would matter at all for your test scenario.)

@emlys
Contributor Author

emlys commented Oct 3, 2023

@rouault I was able to build locally on my Mac (Ventura 13.2.1), but I'm getting this error on import:

>>> from osgeo import gdal
Traceback (most recent call last):
  File "/Users/emily/mambaforge/envs/testgdal/lib/python3.12/site-packages/osgeo/__init__.py", line 30, in swig_import_helper
    return importlib.import_module(mname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/emily/mambaforge/envs/testgdal/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1381, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 915, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 813, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1288, in create_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
ImportError: dlopen(/Users/emily/mambaforge/envs/testgdal/lib/python3.12/site-packages/osgeo/_gdal.cpython-312-darwin.so, 0x0002): Library not loaded: @rpath/libgdal.33.dylib
  Referenced from: <1687DA78-C86A-3410-9CBD-D864DEAC8D2E> /Users/emily/gdal/build/swig/python/osgeo/_gdal.cpython-312-darwin.so
  Reason: tried: '/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/Users/emily/mambaforge/envs/testgdal/bin/../lib/libgdal.33.dylib' (no such file), '/Users/emily/mambaforge/envs/testgdal/bin/../lib/libgdal.33.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS@rpath/libgdal.33.dylib' (no such file), '/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/emily/mambaforge/envs/testgdal/lib/libgdal.33.dylib' (no such file), '/Users/emily/mambaforge/envs/testgdal/bin/../lib/libgdal.33.dylib' (no such file), '/Users/emily/mambaforge/envs/testgdal/bin/../lib/libgdal.33.dylib' (no such file), '/usr/local/lib/libgdal.33.dylib' (no such file), '/usr/lib/libgdal.33.dylib' (no such file, not in dyld cache)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/emily/mambaforge/envs/testgdal/lib/python3.12/site-packages/osgeo/__init__.py", line 35, in <module>
    _gdal = swig_import_helper()
            ^^^^^^^^^^^^^^^^^^^^
  File "/Users/emily/mambaforge/envs/testgdal/lib/python3.12/site-packages/osgeo/__init__.py", line 32, in swig_import_helper
    return importlib.import_module('_gdal')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/emily/mambaforge/envs/testgdal/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named '_gdal'

I noticed this line:

# FIXME: remove -DBUILD_PYTHON_BINDINGS=OFF. Python tests fail with "ModuleNotFoundError: No module named '_gdal'" with macos-12

Is it possibly the same issue? If so, do you know of any workaround to build GDAL on macOS 12+ so that I can try the git bisect?

@rouault
Member

rouault commented Oct 4, 2023

> Is it possibly the same issue

Not sure. I'm a bit confused by "/Users/emily/mambaforge/envs/testgdal/lib/python3.12/site-packages/osgeo/__init__.py", which I'd assume to be the Python bindings from conda-forge, not your own build. You should probably uninstall gdal from conda-forge to avoid any unintended mix.
You can source ../scripts/setdevenv.sh (under a Bash shell) from the build directory (assuming it is a subdir of the source dir) to set environment variables GDAL_DATA, PATH, LD_LIBRARY_PATH/DYLD_LIBRARY_PATH and PYTHONPATH to point to the paths of your build.
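
A quick way to see which copy of the bindings Python would pick up, before and after sourcing the script, is to ask the import system where `osgeo` resolves. This is a minimal sketch, not something from GDAL's documentation; `locate_package` is a hypothetical helper name. It uses `importlib.util.find_spec`, which for a top-level name locates the package without importing it, so it works even when `_gdal` itself fails to load.

```python
import importlib.util


def locate_package(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # Nothing by that name is reachable through sys.path / PYTHONPATH.
        return None
    # For a regular package this is the path of its __init__.py.
    return spec.origin
```

If `locate_package("osgeo")` returns a path under `site-packages` rather than under the build directory, the conda-forge bindings are still the ones being found.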

@emlys
Contributor Author

emlys commented Oct 6, 2023

Thanks, I was missing the environment variables. I ran git bisect and it identified the bad commit as 4c32466.
Here is my process:

# gdal_bisect_script.sh
rm -r build/*
cd build
# changed these options to get around some build errors
cmake -DSPATIALITE_VERSION_STRING=5.1.0 -DGDAL_ENABLE_DRIVER_HDF5=OFF -DOGR_ENABLE_DRIVER_FLATGEOBUF=OFF -DOGR_ENABLE_DRIVER_SQLITE=OFF ..
cmake --build .
. ../scripts/setdevenv.sh
python -c "from osgeo import gdal; print(gdal.__version__)"

trap "exit 1" ERR # convert exit code from 139 to 1, for git bisect run
python -X dev ~/test.py

# commands used to set up and run the bisect
git clone https://github.com/OSGeo/gdal.git
cd gdal
mkdir build
conda create -n gdal_env --yes numpy requests
conda activate gdal_env
git bisect start
git bisect good v3.5.0
git bisect bad v3.6.0
git bisect run ../gdal_bisect_script.sh

The issue happens with a build of master as well.
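
For reference, `git bisect run` treats exit code 0 as good, 1 to 127 (except 125) as bad, 125 as "skip this commit", and anything else as a reason to abort, which is why the script above converts 139 to 1. One possible refinement, sketched below in Python under my own assumptions (it is not what was actually run), is to report commits that fail to build as 125 so they are skipped instead of being judged by the test. `bisect_exit_code` and `bisect_step` are hypothetical names, and the build and test commands are passed in by the caller.

```python
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def bisect_exit_code(build_rc, test_rc):
    """Map build and test return codes to an exit code for `git bisect run`."""
    if build_rc != 0:
        # The commit cannot be tested at all, so ask git to skip it.
        return SKIP
    if test_rc == 0:
        return GOOD
    # Any failure, including death by signal (negative on POSIX), counts as bad.
    return BAD


def bisect_step(build_cmd, test_cmd):
    """Run the build, then the test if the build succeeded, and classify the commit."""
    build_rc = subprocess.run(build_cmd).returncode
    test_rc = subprocess.run(test_cmd).returncode if build_rc == 0 else None
    return bisect_exit_code(build_rc, test_rc)
```

A wrapper script would end with something like `sys.exit(bisect_step(build_cmd, test_cmd))`, where both commands are lists such as `["cmake", "--build", "."]`.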

@rouault rouault self-assigned this Oct 6, 2023
rouault added a commit that referenced this issue Oct 19, 2023
ogr_proj_p.cpp: disable pthread_atfork() optimization on MacOS (fixes #8497)
rouault added a commit that referenced this issue Oct 20, 2023
[Backport release/3.7] ogr_proj_p.cpp: disable pthread_atfork() optimization on MacOS (fixes #8497)
rouault added a commit to rouault/gdal that referenced this issue Dec 4, 2023
fixes OSGeo#8497)"

This reverts commit 84717b5.

This is no longer needed since commit OSGeo@5238ac8
cf comment OSGeo#8909 (comment)
rouault added a commit that referenced this issue Dec 6, 2023
fixes #8497)"

This reverts commit 84717b5.

This is no longer needed since commit 5238ac8
cf comment #8909 (comment)
ralphraul pushed a commit to 1SpatialGroupLtd/gdal that referenced this issue Mar 11, 2024
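
The commit titles above refer to a `pthread_atfork()` optimization in `ogr_proj_p.cpp`, and the crash in the original report occurred inside `subprocess`, which forks a child process. For readers unfamiliar with fork handlers, here is a rough Python analogue using `os.register_at_fork` (Unix only). It is not GDAL's code and makes no claim about what GDAL does internally; `HandleCache` is a hypothetical stand-in for any per-process state that a library may want to reset in a forked child.

```python
import os


class HandleCache:
    """Hypothetical per-process cache, standing in for any native resource."""

    def __init__(self):
        self.owner_pid = os.getpid()
        self.handles = {}

    def reset_after_fork(self):
        # Drop state inherited from the parent; the child rebuilds it lazily.
        self.owner_pid = os.getpid()
        self.handles.clear()


cache = HandleCache()

# Unix only: run the reset in the child right after every fork().
if hasattr(os, "register_at_fork"):
    os.register_at_fork(after_in_child=cache.reset_after_fork)
```

With the handler registered, a child created by `os.fork()` starts with an empty cache while the parent's entries are left untouched.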