Crash under GDALGetProjectionRef #2744

Closed
Algunenano opened this issue Jul 3, 2020 · 8 comments

@Algunenano
Contributor

Expected behavior and actual behavior.

While preparing newer PostGIS test images, I'm seeing a crash in its regress tests in a call to GDALGetProjectionRef.

You can see some Travis logs under: https://travis-ci.org/github/Algunenano/postgis/builds/702420642

It crashes both with:

  • GDAL release/3.1 branch, PROJ 7.1 branch. Example build with backtrace at the end.
  • GDAL master, PROJ master. Example build with backtrace at the end.

Extract from the callstack:

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: postgres postgis_reg'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  pj_obj_create (ctx=0x0, objIn=...) at iso19111/c_api.cpp:203
203	iso19111/c_api.cpp: No such file or directory.
Thread 1 (Thread 0x7f18956da740 (LWP 14548)):
#0  pj_obj_create (ctx=0x0, objIn=...) at iso19111/c_api.cpp:203
        coordop = <optimized out>
        __FUNCTION__ = "pj_obj_create"
        pj = 0x55a6b1dbd710
#1  0x00007f188b212941 in proj_create_ellipsoidal_2D_cs (ctx=0x0, type=type@entry=PJ_ELLPS2D_LATITUDE_LONGITUDE, unit_name=unit_name@entry=0x55a6b1ddb250 "degree", unit_conv_factor=unit_conv_factor@entry=0.017453292519943299) at /usr/include/c++/9/bits/shared_ptr_base.h:756
        __FUNCTION__ = "proj_create_ellipsoidal_2D_cs"
#2  0x00007f188b8f7f4f in OGRSpatialReference::SetGeogCS (this=this@entry=0x7ffc31358390, pszGeogName=0x55a6b1e05e90 "NAD83", pszDatumName=0x55a6b1ed7d70 "North_American_Datum_1983", pszSpheroidName=0x55a6b1d8a230 "GRS 1980", dfSemiMajor=dfSemiMajor@entry=6378137, dfInvFlattening=298.25722210100417, pszPMName=0x55a6b1d89310 "Greenwich", dfPMOffset=0, pszAngularUnits=0x55a6b1ddb250 "degree", dfConvertToRadians=0.017453292519943299) at ogrspatialreference.cpp:3016
        cs = <optimized out>
        obj = <optimized out>
#3  0x00007f188bb69d05 in GTIFGetOGISDefnAsOSR (hGTIF=hGTIF@entry=0x55a6b1db93b0, psDefn=psDefn@entry=0x55a6b1dd2a10) at gt_wkt_srs.cpp:781
        oSRS = {_vptr.OGRSpatialReference = 0x7f188c6d6698 <vtable for OGRSpatialReference+16>, d = std::unique_ptr<OGRSpatialReference::Private> = {get() = 0x55a6b1dc9410}}
        projContext = 0x55a6b20f9130
        pszLinearUnits = <optimized out>
        bLinearUnitsMarkedCorrect = 0
        linearUnitIsSet = 0
        verticalCSType = 0
        verticalDatum = 0
        verticalUnits = 0
        pszGeogName = 0x55a6b1e05e90 "NAD83"
        pszDatumName = 0x55a6b1ed7d70 "North_American_Datum_1983"
        pszPMName = 0x55a6b1d89310 "Greenwich"
        pszSpheroidName = 0x55a6b1d8a230 "GRS 1980"
        pszAngularUnits = 0x55a6b1ddb250 "degree"
        szGCSName = '\000' <repeats 511 times>
        dfSemiMajor = 6378137
        dfInvFlattening = 298.25722210100417
        bGeog3DCRS = <optimized out>
        bSetDatumEllipsoid = <optimized out>
        tmp = 12597
        bGotFromEPSG = <optimized out>
        bNeedManualVertCS = <optimized out>
        citation = "`\214\065\061\374\177\000\000&\302u\225\030\177\000\000\000\000\000\000\000\000\000\000\320\210\065\061\374\177\000\000\001\200\255\373\000\000\000\000`\214\065\061\374\177\000\000`\214\065\061\374\177\000\000`\214\065\061\374\177\000\000`\214\065\061\374\177\000\000b\214\065\061\374\177\000\000'\215\065\061\374\177\000\000`\214\065\061\374\177\000\000'\215\065\061\374\177", '\000' <repeats 42 times>, "p\210\065\061\000\000\000\000h\345\256\260\246U\000\000\000\000\323\261\246U\000\000\000\000\000\000\000\000\000\000 \224\065\061\374\177\000\000\340\210\065\061\374\177\000\000p\273\330\261\246U\000\000\000"...
#4  0x00007f188bb13a23 in GTiffDataset::LookForProjection (this=0x55a6b1df76f0) at geotiff.cpp:12758
        hSRS = <optimized out>
        psGTIFDefn = 0x55a6b1dd2a10
        hGTIF = 0x55a6b1db93b0
        hGTIF = <optimized out>
        psGTIFDefn = <optimized out>
        hSRS = <optimized out>
        pszVertUnit = <optimized out>
        versions = <optimized out>
        pszDefaultReportCompdCS = <optimized out>
#5  GTiffDataset::LookForProjection (this=0x55a6b1df76f0) at geotiff.cpp:12728
        hGTIF = <optimized out>
        psGTIFDefn = <optimized out>
        hSRS = <optimized out>
        pszVertUnit = <optimized out>
        versions = <optimized out>
        pszDefaultReportCompdCS = <optimized out>
#6  0x00007f188bb1e493 in GTiffDataset::GetSpatialRef (this=0x55a6b1df76f0) at geotiff.cpp:18552
No locals.
#7  GTiffDataset::GetSpatialRef (this=0x55a6b1df76f0) at geotiff.cpp:18546
No locals.
#8  0x00007f188be221ab in GDALDataset::GetProjectionRef (this=0x55a6b1df76f0) at gdaldataset.cpp:851
No locals.
#9  0x00007f188be222fa in GDALGetProjectionRef (hDS=<optimized out>) at gdal_priv.h:634
No locals.
#10 0x00007f188c7898da in rt_util_gdal_sr_auth_info (hds=hds@entry=0x55a6b1df76f0, authname=authname@entry=0x7ffc313592b0, authcode=authcode@entry=0x7ffc313592b8) at rt_util.c:281
        srs = 0x0

Last known good master build (not super recent I know): GDAL: GDAL 3.2.0dev-60a090a-dirty, released 2020/05/08

Current master build, which is crashing: GDAL: GDAL 3.2.0dev-69b0c4e-dirty, released 2020/06/25

I've managed to reproduce the issue locally so I'm trying to bisect the issue to pinpoint the specific commit, but it might take some time (there are ~500 commits between the 2 of them, and building gdal is not particularly fast :D). Any pointers as to what to build or test might save me tons of time.

Steps to reproduce the problem.

In an already-built PostGIS build tree, run the following from ./raster/test/regress:

perl ../../../regress/run_test.pl --raster rt_fromgdalraster

Operating system

Linux. The Travis builds use debian:unstable-slim, and my local machine runs Arch Linux.

GDAL version and provenance

Both 3.1 and master seem to be affected.
Not seeing the issue with GDAL 3.0 + PROJ master (7.1.0.r13.g42b9c119a) or PROJ 6.3.1 (tag).

@Algunenano
Contributor Author

Algunenano commented Jul 3, 2020

I've reduced the range to:

  • v3.1.0RC1.r513.g750c76e4d0: OK
  • v3.1.0RC1.r514.g1e29a15259: OK
  • v3.1.0RC1.r515.ge873aa230b: KO

So it seems that the issue was introduced by e873aa2 to fix #2691. It was backported to release/3.1 as dfe2a31

I'm building master + revert of e873aa2 to confirm if that fixes it.
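
As an aside on why bisecting roughly 500 commits is tractable even with slow builds: a binary search needs only about log2(500) builds. The following is a minimal sketch of that idea in Python, not the procedure actually used in this thread. The revision numbers, the `find_first_bad` helper and the position of the first bad revision are all hypothetical, and it assumes that every revision after the first bad one is also bad.

```python
import math


def find_first_bad(revisions, is_bad):
    """Binary-search for the first bad revision.

    Assumes revisions[0] is known good, revisions[-1] is known bad,
    and that every revision after the first bad one is also bad.
    Returns the first bad revision and the number of builds tested.
    """
    lo, hi = 0, len(revisions) - 1  # lo is always good, hi is always bad
    builds = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        builds += 1  # each probe stands for one build plus one test run
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi], builds


# Hypothetical history: 501 revisions, the crash first appears at number 437.
revisions = list(range(501))
culprit, builds = find_first_bad(revisions, lambda rev: rev >= 437)
print(culprit, builds, math.ceil(math.log2(500)))
```

For a range of 500 commits the number of builds never exceeds 9, which is what `git bisect` automates on a real repository.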

@Algunenano
Contributor Author

Algunenano commented Jul 3, 2020

I can confirm that current HEAD + revert of e873aa2 fixes the crash.

In my local environment the backtrace is not exactly the same, but it looks pretty similar. Here it is in case it helps:

#0  pj_obj_create (ctx=0x0, objIn=...) at iso19111/c_api.cpp:203
        coordop = <optimized out>
        __FUNCTION__ = "pj_obj_create"
        pj = 0x5614c1a0d6d0
#1  0x00007f0ef3182f9d in proj_create_ellipsoidal_2D_cs (ctx=0x0, type=<optimized out>, unit_name=<optimized out>, unit_conv_factor=<optimized out>) at /usr/include/c++/10.1.0/bits/shared_ptr_base.h:1198
        __FUNCTION__ = "proj_create_ellipsoidal_2D_cs"
#2  0x00007f0ef390e1c8 in std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int> (__result=<optimized out>, __last=<optimized out>, __first=0x5614c1cd82c0) at /usr/include/c++/10.1.0/bits/stl_algobase.h:560
        _Num = -731718
        _Num = <optimized out>
#3  std::__copy_move_a2<false, int*, int*> (__result=<optimized out>, __last=<optimized out>, __first=0x5614c1cd82c0) at /usr/include/c++/10.1.0/bits/stl_algobase.h:472
No locals.
#4  std::__copy_move_a1<false, int*, int*> (__result=<optimized out>, __last=<optimized out>, __first=0x5614c1cd82c0) at /usr/include/c++/10.1.0/bits/stl_algobase.h:506
No locals.
#5  std::__copy_move_a<false, int*, int*> (__result=<optimized out>, __last=<optimized out>, __first=0x5614c1cd82c0) at /usr/include/c++/10.1.0/bits/stl_algobase.h:513
No locals.
#6  std::copy<int*, int*> (__result=<optimized out>, __last=<optimized out>, __first=0x5614c1cd82c0) at /usr/include/c++/10.1.0/bits/stl_algobase.h:569
No locals.
#7  std::__uninitialized_copy<true>::__uninit_copy<int*, int*> (__result=<optimized out>, __last=<optimized out>, __first=0x5614c1cd82c0) at /usr/include/c++/10.1.0/bits/stl_uninitialized.h:109
No locals.
#8  std::uninitialized_copy<int*, int*> (__result=<optimized out>, __last=<optimized out>, __first=0x5614c1cd82c0) at /usr/include/c++/10.1.0/bits/stl_uninitialized.h:150
        __assignable = true
        __assignable = <optimized out>
#9  std::__uninitialized_copy_a<int*, int*, int> (__result=<optimized out>, __last=<optimized out>, __first=0x5614c1cd82c0) at /usr/include/c++/10.1.0/bits/stl_uninitialized.h:325
No locals.
#10 std::vector<int, std::allocator<int> >::operator= (__x=..., this=<optimized out>) at /usr/include/c++/10.1.0/bits/vector.tcc:245
        __xlen = 0
#11 OGRSpatialReference::Clone (this=0x5614c1acb3d0) at ogrspatialreference.cpp:1233
        poNewRef = 0x7ffde0722660
#12 0x00007f0ef3be0981 in GTIFGetOGISDefnAsOSR (hGTIF=0x5614c1a21290, psDefn=0x5614c19af100) at gt_wkt_srs.cpp:707
        oSRS = {_vptr.OGRSpatialReference = 0x7f0ef4ad4320 <vtable for OGRSpatialReference+16>, d = std::unique_ptr<OGRSpatialReference::Private> = {get() = 0x5614c1a07650}}
        pszLinearUnits = <optimized out>
        bLinearUnitsMarkedCorrect = 0
        linearUnitIsSet = 0
        verticalCSType = 0
        verticalDatum = 0
        verticalUnits = 0
        pszGeogName = 0x5614c1d1f1d0 "NAD83"
        pszDatumName = 0x5614c1d1b130 "North_American_Datum_1983"
        pszPMName = 0x5614c1d26ae0 "Greenwich"
        pszSpheroidName = 0x5614c1d18380 "GRS 1980"
        pszAngularUnits = 0x5614c1acb3d0 "degree"
        szGCSName = '\000' <repeats 511 times>
        dfSemiMajor = 0
        dfInvFlattening = 298.25722210100417
        bGeog3DCRS = <optimized out>
        bSetDatumEllipsoid = <optimized out>
        tmp = 65535
        bGotFromEPSG = <optimized out>
        bNeedManualVertCS = <optimized out>
        citation = "\347\060r\340\375\177\000\000\000I5\201\265\004q\251 0r\340\375\177\000\000\002\000\000\000\000\000\000\000\307\000\000\000\000\000\000\000p+r\340\375\177\000\000 .r\340\375\177\000\000\360,r\340\375\177\000\000 0r\340\375\177\000\000zj\376\004\021\177\000\000\320,r\340\375\177\000\000`,r\340\375\177\000\000\001\200\255\373\024V\000\000 0r\340\375\177\000\000 0r\340\375\177\000\000 0r\340\375\177\000\000 0r\340\375\177\000\000\"0r\340\375\177\000\000\347\060r\340\375\177\000\000 0r\340\375\177\000\000\347\060r\340\375\177", '\000' <repeats 42 times>...
#13 0x00007f0ef3b7f50c in GTiffDataset::IdentifyAuthorizedGeoreferencingSources (this=0x5614c1d1b1c0) at geotiff.cpp:14585
        osGeorefSources = {<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >> = "", <No data fields>}
        papszTokens = 0x5614c1a21290
#14 0x00007f0ef3b89b2d in GTiffRasterBand::SetNoDataValue (this=0x7ffde0723680, dfNoData=<optimized out>) at geotiff.cpp:5866
No locals.
#15 0x00007f0ef3fd9bbe in GDALGetRasterXSize (hDataset=<optimized out>) at gdaldataset.cpp:698
No locals.
#16 0x00007f0ef4bd9a2b in rt_util_gdal_sr_auth_info (hds=0x0, authname=0x7ffde0723678, authcode=0x7ffde0723670) at rt_util.c:281
        srs = 0x0
#17 0x00007f0ef4bef875 in rt_raster_from_gdal_dataset (ds=0x5614c1d1b1c0) at rt_raster.c:2232
        gt = {-168, 0.083000000000000004, 0, 85, 0, -0.083000000000000004}
        rast = 0x5614c1d1ce90
        authname = 0x0
        i = 0
        numBands = 0
        height = <optimized out>
        width = <optimized out>
        authcode = 0x0
        hasnodata = 0
        ptlen = 0
        pt = PT_END
        gdpixtype = GDT_Unknown
        gdband = 0x0
        ptr = 0x0
        valueslen = 0
        values = 0x0
        cplerr = <optimized out>
        nodataval = <optimized out>
        idx = <optimized out>
        band = <optimized out>
        nXBlockSize = <optimized out>
        nYBlockSize = <optimized out>
        nXBlocks = <optimized out>
        nYBlocks = <optimized out>
        iYBlock = <optimized out>
        nXValid = <optimized out>
        y = <optimized out>
        x = <optimized out>
        nYValid = <optimized out>
        iXBlock = <optimized out>
        iY = <optimized out>

@rouault
Member

rouault commented Jul 4, 2020

  • Can you reproduce this with gdalinfo on a GeoTIFF file ?
  • What is your compiler and operating system version? I wonder whether there is an issue with TLS (thread-local storage) support in the crashing environments. From the stack trace, it looks like the creation of the PROJ context, normally done in OSRPJContextHolder::init() through the OSRPJContextHolder::OSRPJContextHolder() constructor, is not being called.

Actually, PROJ is not supposed to crash here, since a NULL context should normally be interpreted as the default context, so there is a bug on the PROJ side too. Still, it is good that the crash exposes the NULL context: using the NULL context is not thread-safe, so the issue on the GDAL side must be better understood and addressed.
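
To illustrate the PROJ-side point, here is a minimal Python model of an API entry point that treats a missing context as the default one instead of dereferencing it. This is not PROJ code: `Context`, `DEFAULT_CONTEXT`, `resolve_context` and `create_object` are hypothetical names invented for this sketch.

```python
class Context:
    """Stand-in for a library context object (hypothetical, not PROJ's PJ_CONTEXT)."""

    def __init__(self, name):
        self.name = name
        self.objects_created = 0


# One process-wide default context, shared by every caller that passes None.
DEFAULT_CONTEXT = Context("default")


def resolve_context(ctx):
    """Map a missing (None) context to the shared default instead of failing."""
    return DEFAULT_CONTEXT if ctx is None else ctx


def create_object(ctx, payload):
    """Create an object, resolving the context before touching any of its fields."""
    ctx = resolve_context(ctx)
    ctx.objects_created += 1
    return {"context": ctx.name, "payload": payload}


obj = create_object(None, "ellipsoidal 2D CS")
print(obj["context"])  # prints "default"
```

The fallback avoids the crash, but every caller that passes no context ends up mutating the same shared object, which is why falling back to the default context is not thread-safe and the caller should still supply its own per-thread context.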

rouault added a commit to rouault/PROJ that referenced this issue Jul 4, 2020
Found when investigating OSGeo/gdal#2744
but the root cause of the GDAL issue is different.
@Algunenano
Contributor Author

Algunenano commented Jul 4, 2020

Can you reproduce this with gdalinfo on a GeoTIFF file ?

Apparently not. I've extracted the TIFF from the test file (attached, compressed, as img.zip) and gdalinfo is OK with it:

Driver: GTiff/GeoTIFF
Files: img.tiff
Size is 10, 10
Coordinate System is:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101004,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]
Data axis to CRS axis mapping: 2,1
Origin = (-168.000000000000000,85.000000000000000)
Pixel Size = (0.083000000000000,-0.083000000000000)
Metadata:
  AREA_OR_POINT=Area
  TIFFTAG_DATETIME=2012:03:02 09:59:31
  TIFFTAG_RESOLUTIONUNIT=2 (pixels/inch)
  TIFFTAG_SOFTWARE=Adobe Photoshop CS Windows
  TIFFTAG_XRESOLUTION=96
  TIFFTAG_YRESOLUTION=96
Image Structure Metadata:
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (-168.0000000,  85.0000000) (168d 0' 0.00"W, 85d 0' 0.00"N)
Lower Left  (-168.0000000,  84.1700000) (168d 0' 0.00"W, 84d10'12.00"N)
Upper Right (-167.1700000,  85.0000000) (167d10'12.00"W, 85d 0' 0.00"N)
Lower Right (-167.1700000,  84.1700000) (167d10'12.00"W, 84d10'12.00"N)
Center      (-167.5850000,  84.5850000) (167d35' 6.00"W, 84d35' 6.00"N)
Band 1 Block=10x10 Type=Byte, ColorInterp=Red
Band 2 Block=10x10 Type=Byte, ColorInterp=Green
Band 3 Block=10x10 Type=Byte, ColorInterp=Blue

What is your compiler & operating system version ?

I've only tested 2 different environments, both Linux, and both break:

  • Archlinux with GCC 10.1
  • Debian Unstable with GCC 9.3.1.

@rouault
Member

rouault commented Jul 4, 2020

and do you manage to reproduce with command line raster2pgsql ?

@Algunenano
Contributor Author

and do you manage to reproduce with command line raster2pgsql ?

That seems fine:

$ ./raster/loader/raster2pgsql ~/issues/raster_crash/img.tiff 
Processing 1/1: /home/raul/issues/raster_crash/img.tiff
BEGIN;
CREATE TABLE "img" ("rid" serial PRIMARY KEY,"rast" raster);
INSERT INTO "img" ("rastraster);
END;

This is the standalone SQL query that reproduces the issue:

SELECT ST_FromGDALRaster(E'\\x49492a0008000000150000010300010000000a00000001010300010000000a00000002010300030000001a01000003010300010000000100000006010300010000000200000011010400010000001502000015010300010000000300000016010300010000000a00000017010400010000002c0100001a010500010000000a0100001b01050001000000120100001c0103000100000001000000280103000100000002000000310102001b0000003a0100003201020014000000260100005301030003000000200100000e830c00030000005601000082840c00060000006e010000af870300240000009e010000b0870c0005000000e6010000b1870200070000000e0200000000000060000000010000006000000001000000080008000800010001000100323031323a30333a30322030393a35393a33310041646f62652050686f746f73686f702043532057696e646f77730000736891ed7c3fb53f736891ed7c3fb53f000000000000000000000000000000000000000000000000000000000000000000000000000065c000000000004055400000000000000000010001000000080000040000010002000104000001000100000800000100ad100108b187060000000608000001008e230908b087010001000b08b087010000000e08b08703000200a8f9eb941da4724000000040a65458410000000000000000000000000000000000000000000000004e414438337cbytea) AS rast;

rouault added a commit to rouault/gdal that referenced this issue Jul 4, 2020
Fix a bug affecting PostGIS raster (OSGeo#2744), and most likely GMT
(https://lists.osgeo.org/pipermail/gdal-dev/2020-July/052381.html)

If OSRCleanup() was called, deinit() was called on the
current OSRPJContextHolder object, but due to OSGeo@e873aa2
that removed init(), the PROJ context was left at nullptr if
OSRGetProjTLSContext() was called afterwards. This could result
in a crash in PROJ due to a bug in it, for example if OGRSpatialReference::Clone()
was called, or more subtly in potential thread issues if OSRCleanup()
were called in several threads using the OSR API (due to the default nullptr
PROJ context being used concurrently).
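
The sequence described in the commit message can be modelled with a small per-thread holder. This is a minimal Python sketch of the failure mode, not the GDAL source and not necessarily how #2746 fixes it: `ContextHolder`, `FakeProjContext`, `cleanup` and the two getters are hypothetical stand-ins for OSRPJContextHolder, the PROJ context, OSRCleanup() and OSRGetProjTLSContext().

```python
import threading


class FakeProjContext:
    """Placeholder for a per-thread PROJ context (hypothetical)."""


class ContextHolder:
    """Owns one context; created lazily once per thread."""

    def __init__(self):
        self.context = None
        self.init()  # the context is created when the holder is constructed

    def init(self):
        # Idempotent: only creates a context if none exists.
        if self.context is None:
            self.context = FakeProjContext()

    def deinit(self):
        self.context = None


_tls = threading.local()


def _holder():
    # Each thread gets its own holder the first time it asks for one.
    if not hasattr(_tls, "holder"):
        _tls.holder = ContextHolder()
    return _tls.holder


def get_context_without_reinit():
    # Models the regression: after cleanup() this returns None.
    return _holder().context


def get_context_with_reinit():
    # Models a getter that re-creates the context on demand.
    holder = _holder()
    holder.init()
    return holder.context


def cleanup():
    _holder().deinit()


get_context_with_reinit()
cleanup()
print(get_context_without_reinit() is None)   # True
print(get_context_with_reinit() is not None)  # True
```

In this model the holder outlives `cleanup()`, so a getter that only relies on the constructor having run hands back `None` afterwards, which matches the `ctx=0x0` seen in both backtraces.
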
@rouault
Copy link
Member

rouault commented Jul 4, 2020

Reproduced, understood and fix submitted in #2746

rouault added a commit to OSGeo/PROJ that referenced this issue Jul 4, 2020
Found when investigating OSGeo/gdal#2744
but the root cause of the GDAL issue is different.
@rouault rouault closed this as completed in 4cc60c5 Jul 4, 2020
@rouault rouault added this to the 3.1.2 milestone Jul 4, 2020
rouault added a commit that referenced this issue Jul 4, 2020
Fix a bug affecting PostGIS raster (#2744), and most likely GMT
(https://lists.osgeo.org/pipermail/gdal-dev/2020-July/052381.html)

If OSRCleanup() was called, deinit() was called on the
current OSRPJContextHolder object, but due to e873aa2
that removed init(), the PROJ context was left at nullptr if
OSRGetProjTLSContext() was called afterwards. This could result
in a crash in PROJ due to a bug in it, for example if OGRSpatialReference::Clone()
was called, or more subtly in potential thread issues if OSRCleanup()
were called in several threads using the OSR API (due to the default nullptr
PROJ context being used concurrently).
@Algunenano
Contributor Author

The CI is happy now. Thanks a lot!
