gdalbuildvrt: error on non-integer '-sd' #9672

Closed
mdsumner opened this issue Apr 15, 2024 · 11 comments · Fixed by #9683
@mdsumner
Contributor

mdsumner commented Apr 15, 2024

Feature description

Specifying '-sd' as the name of the subdataset is a mistake (it should be an integer), and quietly produces no output file.

gdalbuildvrt  -sd sst sst.vrt /vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc /vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810902.nc

ls *.vrt
#ls: cannot access '*.vrt': No such file or directory

This is confusing and should raise an error saying that 'sd' needs to be a number (1-based).

It's also problematic because the order of the subdatasets (even within a dataset like this) is not guaranteed; an 'sd_name' option would be a more robust input (as with 'vrt://').

New features

  • make '-sd' non-integer input an error
  • add 'sd_name' to the utility
  • consider 'sd_n' and 'sd_name_n' to match input-rasters
  • error on general case of no raster found (i.e. no VRT generated)
  • clarify doc for "-sd" as "n" being not name, here and elsewhere, see further notes below
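Until the utility itself rejects such input, a calling script can check the value first. This is only a minimal sketch of that check, not how gdalbuildvrt validates its arguments; the function name `is_valid_sd` is made up for illustration.

```shell
#!/bin/sh
# Succeed only if $1 is a positive integer (1, 2, 3, ...), which is what
# '-sd' expects. The name is_valid_sd is hypothetical, not part of GDAL.
is_valid_sd() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    [ "$1" -ge 1 ]                 # subdataset numbers start from 1
}

# Example: a subdataset name such as "sst" is rejected before any call is made.
sd="sst"
if is_valid_sd "$sd"; then
    echo "ok: -sd $sd"
else
    echo "error: -sd expects an integer starting from 1, got '$sd'"
fi
```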
@mdsumner mdsumner changed the title gdalbuildvrt: warn on non-integer '-sd' gdalbuildvrt: error on non-integer '-sd' Apr 15, 2024
@jratike80
Collaborator

jratike80 commented Apr 16, 2024

I do not see "name" in the documentation https://gdal.org/programs/gdalbuildvrt.html and, according to the documentation, using the full subdataset name is supported.

-sd <subdataset>
If the input dataset contains several subdatasets use a subdataset with the specified number (starting from 1). This is an alternative of giving the full subdataset name as an input.

It would be good to check that the value of -sd is an integer. Related development has already been done in #9445.

I guess that -sd was added as a convenience option so that the user (or developer?) can avoid writing a long subdataset name. With many inputs, I wonder whether the same -sd value is applied to all of them. The whole parameter feels suboptimal for mosaicking subdatasets because the user can give it only once, but they might want to mosaic "dataset1-subdataset1" + "dataset2-subdataset2". Should we deny the use of -sd if there are many inputs and require a list of full names instead? And should we warn users in the documentation that the order of subdatasets may change and that using -sd in scripts is unsafe?

@jratike80
Collaborator

I know almost nothing about NetCDF, but I would expect that gdalinfo would list the names of the subdatasets if they exist, but these commands do not find subdatasets:

gdalinfo /vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc

gdalinfo  /vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810902.nc --debug on

Running your original command prints these messages for me:

Warning 1: gdalbuildvrt does not support ungeoreferenced image. Skipping /vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc

Warning 1: gdalbuildvrt does not support ungeoreferenced image. Skipping /vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810902.nc

If all inputs are skipped, then it is expected that no VRT is produced.
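A script that depends on the VRT can guard against this case by checking that the output file exists after the run. The sketch below makes no assumption about gdalbuildvrt's exit status when all inputs are skipped; the helper name `run_and_check` and the file name `inputs.nc` are placeholders, and the check only makes sense for a local output path.

```shell
#!/bin/sh
# Run a command that is supposed to create a file, then fail loudly if the
# file is missing, even when the command itself exited successfully.
# Usage: run_and_check <expected_output> <command> [args...]
# (run_and_check is a hypothetical helper, not a GDAL utility.)
run_and_check() {
    out="$1"
    shift
    "$@" || return 1                       # propagate a real failure
    if [ ! -e "$out" ]; then
        echo "error: '$out' was not created" >&2
        return 1
    fi
}

# Hypothetical usage (inputs.nc is a placeholder file name):
# run_and_check out.vrt gdalbuildvrt -sd 4 out.vrt inputs.nc
```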

@mdsumner
Contributor Author

mdsumner commented Apr 16, 2024

oh wait, I missed that ambiguity - "full name is supported", I read that as full name == "DRIVER:{dsn}:SUBDATASET" as input to the utility i.e. '<input_raster> [<input_raster>]...' not as input to the "-sd" arg (I'm only being thorough).

and that's right, "-sd" is relative to input rasters, not explicit (when submitting this I considered the possibility of "-sd_n" individual to each subdataset, but that even more strongly suggests the need for 'sd_name' and maybe 'sd_name_n' for completeness).

gdalbuildvrt  -sd "NETCDF:/vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc:sst" out.vrt
## ERROR as expected
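For the "full name as input" route with several files, the names can be generated rather than typed. A small sketch, assuming every input exposes the same variable name and that the quoted `NETCDF:"<file>":<variable>` form is accepted as an input raster; `subdataset_names`, `a.nc` and `b.nc` are placeholders.

```shell
#!/bin/sh
# Print one full subdataset name per input file, in the
# NETCDF:"<file>":<variable> form.
# Usage: subdataset_names <variable> <file> [<file>...]
# (subdataset_names is a hypothetical helper, not a GDAL utility.)
subdataset_names() {
    var="$1"
    shift
    for f in "$@"; do
        printf 'NETCDF:"%s":%s\n' "$f" "$var"
    done
}

# Prints two lines, one per placeholder input file.
subdataset_names sst a.nc b.nc

# Untested idea: write the list to a file and pass it with -input_file_list
# subdataset_names sst a.nc b.nc > list.txt
# gdalbuildvrt -input_file_list list.txt out.vrt
```

This selects the variable by name for each input, so it does not depend on the subdataset order within any file.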

@mdsumner
Contributor Author

mdsumner commented Apr 16, 2024

> I know almost nothing about NetCDF, but I would expect that gdalinfo would list the names of the subdatasets if they exist, but these commands do not find subdatasets:
>
> gdalinfo /vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc
>
> gdalinfo  /vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810902.nc --debug on

Then your build must not be suitable for reading such datasets. What output do you get? I still think there was enough information in my original post. If 'gdalbuildvrt' doesn't find any input rasters, then silently doing nothing seems like a bad outcome (though as ever maybe I'm missing something).

There's nothing special about NetCDF, subdataset-wise.

@jratike80
Collaborator

This is what I get, just one layer, no subdatasets:

gdalinfo /vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc

Driver: HDF5/Hierarchical Data Format Release 5
Files: /vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc
Size is 512, 512
Metadata:
  anom_add_offset=0
  anom_long_name=Daily sea surface temperature anomalies
  anom_scale_factor=0.0099999998
  anom_units=Celsius
  anom_valid_max=1200
  anom_valid_min=-1200
  anom__FillValue=-999
  anom__Netcdf4Dimid=0
  cdm_data_type=Grid
  comment=Data was converted from NetCDF-3 to NetCDF-4 format with metadata updates in November 2017.
  Conventions=CF-1.6, ACDD-1.3
  [email protected]
  creator_url=https://www.ncei.noaa.gov/
  date_created=2020-05-08T19:05:13Z
  date_modified=2020-05-08T19:05:13Z
  err_add_offset=0
  err_long_name=Estimated error standard deviation of analysed_sst
  err_scale_factor=0.0099999998
  err_units=Celsius
  err_valid_max=1000
  err_valid_min=0
  err__FillValue=-999
  err__Netcdf4Dimid=0
  geospatial_lat_max=90
  geospatial_lat_min=-90
  geospatial_lat_resolution=0.25
  geospatial_lat_units=degrees_north
  geospatial_lon_max=360
  geospatial_lon_min=0
  geospatial_lon_resolution=0.25
  geospatial_lon_units=degrees_east
  history=Final file created using preliminary as first guess, and 3 days of AVHRR data. Preliminary uses only 1 day of AVHRR data.
  ice_add_offset=0
  ice_long_name=Sea ice concentration
  ice_scale_factor=0.0099999998
  ice_units=%
  ice_valid_max=100
  ice_valid_min=0
  ice__FillValue=-999
  ice__Netcdf4Dimid=0
  id=oisst-avhrr-v02r01.19810901.nc
  institution=NOAA/National Centers for Environmental Information
  instrument=Earth Remote Sensing Instruments > Passive Remote Sensing > Spectrometers/Radiometers > Imaging Spectrometers/Radiometers > AVHRR > Advanced Very High Resolution Radiometer
  instrument_vocabulary=Global Change Master Directory (GCMD) Instrument Keywords
  keywords=Earth Science > Oceans > Ocean Temperature > Sea Surface Temperature
  keywords_vocabulary=Global Change Master Directory (GCMD) Earth Science Keywords
  lat_CLASS=DIMENSION_SCALE
  lat_grids=Uniform grid from -89.875 to 89.875 by 0.25
  lat_long_name=Latitude
  lat_NAME=lat
  lat_REFERENCE_LIST=
  lat_units=degrees_north
  lat__Netcdf4Dimid=2
  lon_CLASS=DIMENSION_SCALE
  lon_grids=Uniform grid from 0.125 to 359.875 by 0.25
  lon_long_name=Longitude
  lon_NAME=lon
  lon_REFERENCE_LIST=
  lon_units=degrees_east
  lon__Netcdf4Dimid=3
  metadata_link=https://doi.org/10.25921/RE9P-PT57
  naming_authority=gov.noaa.ncei
  ncei_template_version=NCEI_NetCDF_Grid_Template_v2.0
  platform=Ships, buoys, Argo floats, MetOp-A, MetOp-B
  platform_vocabulary=Global Change Master Directory (GCMD) Platform Keywords
  processing_level=NOAA Level 4
  product_version=Version v02r01
  references=Reynolds, et al.(2007) Daily High-Resolution-Blended Analyses for Sea Surface Temperature (available at https://doi.org/10.1175/2007JCLI1824.1). Banzon, et al.(2016) A long-term record of blended satellite and in situ sea-surface temperature for climate monitoring, modeling and environmental studies (available at https://doi.org/10.5194/essd-8-165-2016). Huang et al. (2020) Improvements of the Daily Optimum Interpolation Sea Surface Temperature (DOISST) Version v02r01, submitted.Climatology is based on 1971-2000 OI.v2 SST. Satellite data: Pathfinder AVHRR SST and Navy AVHRR SST. Ice data: NCEP Ice and GSFC Ice.
  sensor=Thermometer, AVHRR
  source=ICOADS, NCEP_GTS, GSFC_ICE, NCEP_ICE, Pathfinder_AVHRR, Navy_AVHRR
  sst_add_offset=0
  sst_long_name=Daily sea surface temperature
  sst_scale_factor=0.0099999998
  sst_units=Celsius
  sst_valid_max=4500
  sst_valid_min=-300
  sst__FillValue=-999
  sst__Netcdf4Dimid=0
  standard_name_vocabulary=CF Standard Name Table (v40, 25 January 2017)
  summary=NOAAs 1/4-degree Daily Optimum Interpolation Sea Surface Temperature (OISST) (sometimes referred to as Reynolds SST, which however also refers to earlier products at different resolution), currently available as version v02r01, is created by interpolating and extrapolating SST observations from different sources, resulting in a smoothed complete field. The sources of data are satellite (AVHRR) and in situ platforms (i.e., ships and buoys), and the specific datasets employed may change over time. At the marginal ice zone, sea ice concentrations are used to generate proxy SSTs.  A preliminary version of this file is produced in near-real time (1-day latency), and then replaced with a final version after 2 weeks. Note that this is the AVHRR-ONLY DOISST, available from Oct 1981, but there is a companion DOISST product that includes microwave satellite data, available from June 2002
  time_CLASS=DIMENSION_SCALE
  time_coverage_end=1981-09-01T23:59:59Z
  time_coverage_start=1981-09-01T00:00:00Z
  time_long_name=Center time of the day
  time_NAME=time
  time_REFERENCE_LIST=
  time_units=days since 1978-01-01 12:00:00
  time__Netcdf4Dimid=0
  title=NOAA/NCEI 1/4 Degree Daily Optimum Interpolation Sea Surface Temperature (OISST) Analysis, Version 2.1 - Final
  zlev_actual_range=0, 0
  zlev_CLASS=DIMENSION_SCALE
  zlev_long_name=Sea surface height
  zlev_NAME=zlev
  zlev_positive=down
  zlev_REFERENCE_LIST=
  zlev_units=meters
  zlev__Netcdf4Dimid=1
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)

@mdsumner
Contributor Author

ah fair enough, I didn't think of that as a situation, on your system you might try

gdalinfo /vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198109/oisst-avhrr-v02r01.19810901.nc -if NetCDF

(I didn't realize that HDF5 didn't present the same way, NetCDF >=4 is a kind of container defined around HDF5 that behaves like NetCDF<4 as needed ...)

@mdsumner
Contributor Author

at any rate, happy to be assigned to this as stated above in OP

@jratike80
Collaborator

I had to download the dataset because I am on Windows and "Opening a /vsi file with the netCDF driver requires Linux userfaultfd to be available"

From the local file gdalinfo with the NetCDF driver finds

Subdatasets:
  SUBDATASET_1_NAME=NETCDF:"oisst-avhrr-v02r01.19810901.nc":anom
  SUBDATASET_1_DESC=[1x1x720x1440] anom (16-bit integer)
  SUBDATASET_2_NAME=NETCDF:"oisst-avhrr-v02r01.19810901.nc":err
  SUBDATASET_2_DESC=[1x1x720x1440] err (16-bit integer)
  SUBDATASET_3_NAME=NETCDF:"oisst-avhrr-v02r01.19810901.nc":ice
  SUBDATASET_3_DESC=[1x1x720x1440] ice (16-bit integer)
  SUBDATASET_4_NAME=NETCDF:"oisst-avhrr-v02r01.19810901.nc":sst
  SUBDATASET_4_DESC=[1x1x720x1440] sst (16-bit integer)

Now there are two options. Either use -sd with the subdataset number:

gdalbuildvrt -sd 4 out.vrt oisst-avhrr-v02r01.19810901.nc

Or run the command without -sd, giving the full subdataset name as the input to gdalbuildvrt:

gdalbuildvrt out.vrt NETCDF:"oisst-avhrr-v02r01.19810901.nc":sst
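If a script still needs the number for -sd, it can be looked up by name from the gdalinfo listing instead of being hard-coded, which sidesteps the ordering concern raised earlier in the thread. This is a rough sketch that relies only on the `SUBDATASET_n_NAME=...:<variable>` line format shown above; the function name `sd_index` is made up.

```shell
#!/bin/sh
# Read gdalinfo output on stdin and print the number n of the first
# SUBDATASET_n_NAME entry whose name ends in ":<variable>".
# Exits non-zero if no such subdataset is listed.
# Usage: gdalinfo file.nc | sd_index <variable>
# (sd_index is a hypothetical helper, not a GDAL utility.)
sd_index() {
    awk -v var="$1" '
        /SUBDATASET_[0-9]+_NAME=/ {
            line = $0
            sub(/[ \t\r]+$/, "", line)            # drop trailing whitespace
            sub(/^[ \t]*SUBDATASET_/, "", line)   # e.g. 4_NAME=NETCDF:...:sst
            n = line
            sub(/_NAME=.*/, "", n)                # keep the number only
            name = line
            sub(/^.*:/, "", name)                 # text after the last colon
            if (name == var) { print n; found = 1; exit }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example with lines copied from the listing in this comment:
sample='  SUBDATASET_3_NAME=NETCDF:"oisst-avhrr-v02r01.19810901.nc":ice
  SUBDATASET_3_DESC=[1x1x720x1440] ice (16-bit integer)
  SUBDATASET_4_NAME=NETCDF:"oisst-avhrr-v02r01.19810901.nc":sst'
printf '%s\n' "$sample" | sd_index sst    # prints 4
```

The result could then be passed along, for example `gdalbuildvrt -sd "$(gdalinfo file.nc | sd_index sst)" out.vrt file.nc`, where `file.nc` is a placeholder.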

@jratike80
Collaborator

I can see that gdalinfo defines -sd differently in this part of documentation https://gdal.org/programs/gdalinfo.html#cmdoption-gdalinfo-sd (but not in the Synopsis section).

-sd <n>
If the input dataset contains several subdatasets read and display a subdataset with specified n number (starting from 1). This is an alternative of giving the full subdataset name.

Maybe it would be better to use -sd <n> instead of -sd <subdataset> everywhere. And clarify "This is an alternative of giving the full subdataset name as input to the utility"

@mdsumner
Contributor Author

Agreed

rouault added a commit to rouault/gdal that referenced this issue Apr 16, 2024
which makes sure that -sd value is an integer (fixes OSGeo#9672)
@rouault rouault self-assigned this Apr 16, 2024
rouault added a commit to rouault/gdal that referenced this issue Apr 17, 2024
which makes sure that -sd value is an integer (fixes OSGeo#9672)
@mdsumner
Contributor Author

Awesome ty🙏
