Add GDAL Raster Tile Index (GTI) driver, and associated gdaltindex improvements #8983

Merged
merged 11 commits into from
Jan 23, 2024

rouault
Member

@rouault rouault commented Dec 20, 2023

Closes #8861

  • Cf https://github.com/rouault/gdal/blob/vrttileindex/doc/source/drivers/raster/vrtti.rst for the whole description of the new driver. Summary is :
    The VRTTI driver handles catalogs with a large number of raster files (called "tiles" in the rest of this document, even if a regular tiling is not required by the driver) and builds a virtual mosaic from them. Each tile may be in any GDAL supported raster format, and may be a file stored on a regular filesystem or on any GDAL supported virtual filesystem (for raster drivers that support such files).
    This driver offers functionality similar to the VRT driver, with the following main differences:

    • The tiles are listed as features of any GDAL supported vector format. Use of formats with efficient spatial filtering is recommended, such as GeoPackage, FlatGeoBuf or PostGIS. The VRTTI driver can thus use a larger number of tiles than the VRT driver (hundreds of thousands or more), provided the underlying vector format is efficient.
    • The tiles may have different SRS. The VRTTI driver is capable of on-the-fly reprojection.
    • The VRTTI driver offers control on the order in which tiles are composited, when they overlap (z-order)
    • The VRTTI driver honours the mask/alpha band when compositing together overlapping tiles.
    • Contrary to the VRT driver, the VRTTI driver does not allow altering characteristics of referenced tiles, such as their georeferencing, nodata value, etc. If such behavior is desired, the tiles must, for example, be wrapped individually in a VRT file before being referenced in the VRTTI index.
  • gdaltindex is now available as a C callable function, GDALTileIndex(), and from Python as gdal.TileIndex()

  • Cf https://github.com/rouault/gdal/blob/vrttileindex/doc/source/programs/gdaltindex.rst for the gdaltindex enhancements. New options are -overwrite, -vrtti_filename, -tr, -te, -ot, -bandcount, -nodata, -colorinterp, -mask, -mo, -recursive, -filename_filter, -min_pixel_size, -max_pixel_size, -fetch_md

  • gdaladdo: make --partial-refresh-from-source-timestamp work on VRTTI datasets
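
As a rough illustration of the z-order and mask points in the summary above, here is a minimal Python sketch of compositing overlapping tiles. It is not the driver's implementation: the 1-D pixel rows, the `composite` helper, the use of `None` for masked pixels, and the assumption that tiles with a higher z-order value are drawn last are all illustrative choices.

```python
from typing import List, Optional, Sequence, Tuple

# A tile is (z_order, pixels). None marks a pixel that the tile's
# mask/alpha band flags as invalid (illustrative convention).
Tile = Tuple[int, Sequence[Optional[int]]]


def composite(tiles: Sequence[Tile], width: int, nodata: int = 0) -> List[int]:
    """Composite 1-D tiles; higher z_order is drawn last (assumption)."""
    out = [nodata] * width
    # sorted() is stable, so tiles with equal z_order keep their input order.
    for _, pixels in sorted(tiles, key=lambda t: t[0]):
        for i, value in enumerate(pixels[:width]):
            if value is not None:  # masked pixels never overwrite earlier tiles
                out[i] = value
    return out


tiles = [
    (2, [None, 20, 20, None]),  # on top, but masked at both ends
    (1, [10, 10, None, None]),  # underneath
]
result = composite(tiles, width=4)
print(result)
```

Where the top tile is masked, the pixel from the tile underneath shows through; where no tile has valid data, the nodata value remains.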

@rouault rouault added this to the 3.9.0 milestone Dec 20, 2023
@rouault rouault force-pushed the vrttileindex branch 3 times, most recently from e9b5448 to 198202d Compare December 20, 2023 16:44
@vincentsarago
Contributor

@rouault this looks super interesting. I wonder if using tiles won't be confusing? I wonder if assets or items will be better.

@jratike80
Collaborator

Geoserver folks use the name "granule". For those of us who use Mapserver and gdaltindex, "tiles" is not confusing at all. But maybe it is a matter of selecting who to confuse. Items sounds like OGC API and assets sound like STAC, much newer inventions than gdaltindex. But maybe they will last longer. Do you suggest renaming gdaltindex?

@coveralls
Collaborator

coveralls commented Dec 20, 2023

Coverage Status

coverage: 68.808% (+0.06%) from 68.752%
when pulling c624f2b on rouault:vrttileindex
into 527b897 on OSGeo:master.

@rouault rouault force-pushed the vrttileindex branch 3 times, most recently from 0b2a51c to 7fcddfe Compare December 21, 2023 00:10
@mdsumner
Contributor

this is awesome ... 🚀 (I've had several questions and ideas and found everything I need until this)

could -recursive be made to work for the filelist in a VRT? i.e. this would give two features in the .vrt.gpkg not just one

gdalbuildvrt /tmp/tiles.vrt gcore/data/vrtmisc16_tile1.tif gcore/data/vrtmisc16_tile2.tif

gdaltindex /tmp/tiles.vrt.gpkg /tmp/tiles.vrt -recursive

I don't think that's possible yet, without fleshing out the file list for gdaltindex or modifying the layer location field.

@rouault
Member Author

rouault commented Dec 21, 2023

could -recursive be made to work for the filelist in a VRT?

That could be surprising to do that by default. We'd likely need a -consider_vrt_as_file_list or something like that. But even with that, one should not forget that you can do very advanced tricks with the sources in a VRT. You can move things around, like inverting hemispheres, incorporate sources without proper geolocation, have different sources for each band, etc. So an arbitrary VRT cannot be converted to a VRTTI tile index.

@rouault rouault force-pushed the vrttileindex branch 3 times, most recently from 7a8f4e9 to 9ee0034 Compare December 21, 2023 14:19
@mdsumner
Contributor

could -recursive be made to work for the filelist in a VRT?

That could be surprising to do that by default. We'd likely need a -consider_vrt_as_file_list or something like that. But even with that, one should not forget that you can do very advanced tricks with the sources in a VRT. You can move things around, like inverting hemispheres, incorporate sources without proper geolocation, have different sources for each band, etc. So an arbitrary VRT cannot be converted to a VRTTI tile index.

ah indeed, thanks - perhaps --optfile could carry the special case with caveats for "only trivial vrt mosaics as a convenience", I'll explore

@rouault
Member Author

rouault commented Dec 21, 2023

--optfile could carry the special cas

--optfile is evaluated in a generic part of the code (and only for the binary itself, not the utility-as-a-function), and just adds the arguments of the files as if they had been put in the regular command line. I do believe a special flag would be required to mean "use GetFileList() on the VRT instead of the VRT itself", since I can imagine people could create tile indices with VRTs that would have special behaviour (scaling, etc), and you don't want to use just the sources of those VRTs, but the VRT with its specific behavior
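
To make the "as if they had been put in the regular command line" point concrete, here is a small Python sketch of that kind of expansion. It is only an approximation of the idea, not GDAL's parser: the `expand_optfile` helper is hypothetical, and the tokenizing rules (shell-like splitting, `#` comments) are assumptions.

```python
import os
import shlex
import tempfile
from typing import List


def expand_optfile(argv: List[str]) -> List[str]:
    """Splice the tokens of each --optfile file into argv, in place of the flag."""
    out: List[str] = []
    i = 0
    while i < len(argv):
        if argv[i] == "--optfile" and i + 1 < len(argv):
            with open(argv[i + 1], encoding="utf-8") as f:
                for line in f:
                    # Assumed convention: shell-like tokens, '#' starts a comment.
                    out.extend(shlex.split(line, comments=True))
            i += 2
        else:
            out.append(argv[i])
            i += 1
    return out


# Demo with a temporary option file holding two arguments.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("-recursive\n/tmp/tiles.vrt\n")
    optfile_path = f.name

expanded = expand_optfile(["gdaltindex", "/tmp/tiles.vrt.gpkg", "--optfile", optfile_path])
os.unlink(optfile_path)
print(expanded)
```

Because the expansion is purely textual, the utility sees nothing beyond the resulting arguments, which is why a dedicated flag would be needed to change how a VRT argument is interpreted.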

@mdsumner
Contributor

the situation I'm thinking of is a list of sources where it's not required to open them to get the extent (or footprint perhaps), stac and simple vrt mosaics being obvious ones. the opentopo vrts are good examples.

I can't see how other mods to sources could be encapsulated, unless this becomes a feature-table way of expressing VRT generally (which I think is interesting).

@rouault
Member Author

rouault commented Jan 8, 2024

Besides @mdsumner, has any observer here given it a try?

@sgillies
Contributor

sgillies commented Jan 8, 2024

@rouault at the risk of bike shedding... how about not using "VRT" in the name of this driver so that this new one and the old VRT can be isolated in searches?

@rouault
Member Author

rouault commented Jan 8, 2024

how about not using "VRT" in the name of this driver so that this new one and the old VRT can be isolated in searches?

Just TI then ? Or maybe TILEIDX ?

@rcoup
Member

rcoup commented Jan 8, 2024

how about not using "VRT" in the name of this driver so that this new one and the old VRT can be isolated in searches?

Just TI then ? Or maybe TILEIDX ?

"TI" won't really be google-able.

Just throwing out ideas...

"VRI" for "Virtual Raster Index" although it's a bit of a pointless name since it's an actual index, not a virtual index.

"GRI" for "GDAL Raster Index"? Though there's already a .gri file in geospatial. "GTI" for "GDAL Tile Index"?

"ORI" for "Open Raster Index" or "OTI" for "Open Tile Index" could work?

@rcoup
Member

rcoup commented Jan 8, 2024

"VRI" for "Virtual Raster Index" although it's a bit of a pointless name since it's an actual index, not a virtual index.

Though I guess it references another datasource, so maybe a virtual index isn't such a terrible description.

@rouault
Member Author

rouault commented Jan 8, 2024

GTI could work for me (although one could argue that it is a MapServer tile index :-))

@rouault rouault changed the title Add Virtual Raster Tile Index (VRTTI) driver, and associated gdaltindex improvements Add GDAL Raster Tile Index (GTI) driver, and associated gdaltindex improvements Jan 9, 2024
@rouault
Member Author

rouault commented Jan 9, 2024

Driver renamed to GTI

@sgillies
Contributor

sgillies commented Jan 9, 2024

Works for me. Back when I started in open source I used to dream of owning a Golf GTI one day 😆

@mdsumner
Contributor

the one thing I still have is there's no way to easily create a very large index without opening every file - which is unusably slow. The way I do it is to instantiate the file with gdaltindex from one of the sources, then update that vector layer with the bbox footprint geometry and location source via a vector layer append. I still haven't experimented with mixed crs, so each footprint really could be a dense polygon outline

I do that with R here FWIW (still using the original driver name): https://github.com/mdsumner/cog-example/blob/main/data-raw/opentopo_vrt.R (I could do that with python, or C++, or write more UI to work from a table but it's extra work for anyone wanting to try this out, with a slightly odd mix of vector and raster).

Ideally I'd like to create the vector source and then translate that to raster GTI, so the geoms could be trivial bbox polygons, or actual footprints in a source-foreign crs of the vector layer. (I had hoped to contribute to at least the fast derivation from a VRT file by now but it's not been possible).
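
A minimal Python sketch of the "trivial bbox polygons" idea: building index records from extents that are already known, without opening any raster. Everything here is illustrative: the `bbox_to_wkt` and `index_rows` helpers, the example URL, and the use of a `location` field name are assumptions, and writing the rows into a vector layer is left out.

```python
from typing import Dict, Iterable, List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox_to_wkt(bbox: BBox) -> str:
    """Return a closed rectangular POLYGON in WKT for the given bounding box."""
    xmin, ymin, xmax, ymax = bbox
    ring = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]
    # repr() keeps full float precision in the coordinate text.
    coords = ", ".join(f"{x!r} {y!r}" for x, y in ring)
    return f"POLYGON (({coords}))"


def index_rows(sources: Iterable[Tuple[str, BBox]]) -> List[Dict[str, str]]:
    """One record per source: its path (assumed 'location' field) and footprint."""
    return [{"location": path, "wkt": bbox_to_wkt(bbox)} for path, bbox in sources]


# Hypothetical source with a known one-degree extent.
rows = index_rows([("/vsicurl/https://example.com/n00e002.tif", (2.0, 0.0, 3.0, 1.0))])
print(rows[0]["wkt"])
```

The records could then be appended to an existing tile index layer with any vector tooling, which is the step described above as a vector layer append.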

Collaborator

@elpaso elpaso left a comment

Amazing work!

I have just left some minor comments.

@tbonfort
Member

tbonfort commented Jan 17, 2024

There seems to be an issue with how the resolution is handled. My test case is to mosaic a dem, each tile being a one-degree-square tif file. The tif files have varying resolutions depending on the latitude, e.g. at the equator Pixel Size = (0.000222222222222,-0.000222222222222), and at the poles Pixel Size = (0.002222222222222,-0.000222222222222) (note the factor 10 on x)

When I run: gdaltindex -f FlatGeoBuf -ot Float32 -tr 0.000222222222222 -0.000222222222222 -gti_filename wdem.fgb.gti -lyr_name wdem test.gti.fgb $(<file.lst), the resulting GTI has a Pixel Size = (0.002222222222222,-0.000222222222222) .

Extracting a square from the resulting GTI results in the incorrect resolution being used:
gdal_translate -projwin 2.2 2.2 2.3 2.1 test.gti.fgb fromgti.tif =>

gdalinfo fromgti.tif 
Driver: GTiff/GeoTIFF
Files: fromgti.tif
Size is 45, 450
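
The 45 x 450 size above is what one would expect if the window is divided by the pixel size reported for the GTI rather than by the requested -tr. A quick Python check, assuming the output size is simply the window extent divided by the resolution and rounded (the `raster_size` helper is hypothetical, not a GDAL function):

```python
from typing import Tuple


def raster_size(ulx: float, uly: float, lrx: float, lry: float,
                xres: float, yres: float) -> Tuple[int, int]:
    """Width and height in pixels of a -projwin style window at a given resolution."""
    width = round(abs(lrx - ulx) / abs(xres))
    height = round(abs(uly - lry) / abs(yres))
    return width, height


# Window from the gdal_translate call: -projwin 2.2 2.2 2.3 2.1
observed = raster_size(2.2, 2.2, 2.3, 2.1, 0.002222222222222, 0.000222222222222)
requested = raster_size(2.2, 2.2, 2.3, 2.1, 0.000222222222222, 0.000222222222222)
print(observed, requested)
```

With the x resolution ten times coarser than requested, the 0.1 degree window is 45 pixels wide instead of 450.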

@rouault
Member Author

rouault commented Jan 17, 2024

gdaltindex -f FlatGeoBuf -ot Float32 -tr 0.000222222222222 -0.000222222222222

you should use a positive value for -tr values

@rouault
Member Author

rouault commented Jan 17, 2024

you should use a positive value for -tr values

gdaltindex modified to take the absolute value.
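
A minimal Python sketch of that normalization, assuming it amounts to taking the absolute value of both -tr components. The `normalize_tr` helper is hypothetical and is not gdaltindex's actual code; rejecting a zero resolution is an extra assumption of this sketch.

```python
from typing import Tuple


def normalize_tr(xres: float, yres: float) -> Tuple[float, float]:
    """Return the target resolution with both components made positive."""
    if xres == 0 or yres == 0:
        # Assumption of this sketch: a zero resolution is never meaningful.
        raise ValueError("resolution components must be non-zero")
    return abs(xres), abs(yres)


# Values from the command line reported above.
tr = normalize_tr(0.000222222222222, -0.000222222222222)
print(tr)
```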

FlatGeoBuf format with a ``.gti.fgb`` extension, meeting the minimum requirements
for a GTI compatible tile index, detailed later.

For example: ``tileindex.gti.gpkg``
Contributor

@rouault I'm 👎 on this. A new kind of format hint is more confusing than helpful.

Member Author

A new kind of format hint is more confusing than helpful.

Why ? How would the driver be able to quickly identify (that is without opening the dataset) that a GeoPackage file is actually meant to be used with the GTI driver ?

Member Author

Using the "GTI:" prefix might work well for command line usages, but the aim is also that users can for example drag&drop such files in QGIS if needed, and thus it needs to look like a regular file, hence the need for a hint in the filename

Member

@sgillies would GTI:tileindex.gpkg be more acceptable? I kind of agree that it could be misleading that a file could be opened with an unexpected driver based solely on its name (one could argue that the .gti part of .gti.gpkg is not the extension)

Contributor

@rouault GeoPackage can already be vector or raster. How does QGIS currently determine which driver to use if you drag and drop? Is peeking in the file really not allowed? It's pretty normal behavior for applications like QGIS, yes?

Is "gti" in the filename a signal to QGIS? And then it uses that info to determine whether to flag GDAL_OF_RASTER or GDAL_OF_VECTOR to GDALOpenEx()? GDAL doesn't have a generic opener that dispatches to one or the other yet, does it? I thought you had to know the data model before you call.

Generally, I believe that more and more special cases cause a degradation of user experience. And increase the complication for projects like Rasterio, that's for sure. From my perspective it would be great if we could avoid new special cases.

Member Author

@rouault rouault Jan 17, 2024

How does QGIS currently determine which driver to use if you drag and drop?

It largely delegates to GDAL. One point I didn't investigate on the QGIS side is whether it would prefer the raster driver (GTI) or the vector driver (GPKG); if using the QGIS browser, it might propose both. So a bit of tuning may be needed on the QGIS side to make it prefer the raster, unless the user explicitly does "open vector layer"

Is peeking in the file really not allowed?

generally we avoid doing complex queries as much as possible for the GDALDriver::Identify() method. We might look at a signature in the first bytes when there's one, but here that's not possible. For a SQLite DB, that would require sqlite3_open() (which must be avoided as much as possible because of SQLite locking issues) and some non trivial queries

Is "gti" in the filename a signal to QGIS?

no, it is a signal for the OGR GPKG driver not to recognize such a file as a raster when GDALOpen() in raster mode is attempted on that file.
Cf the following snippet in the Identify() method of the GPKG driver:

    if ((poOpenInfo->nOpenFlags & GDAL_OF_RASTER) != 0 &&
        ENDS_WITH_CI(poOpenInfo->pszFilename, ".gti.gpkg"))
    {
        // Handled by GTI driver
        return FALSE;
    }
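
For readers less familiar with the C++ side, the same check can be sketched in Python. This mirrors only the snippet quoted above and is not a GDAL API: the `gpkg_declines` helper is hypothetical, and the flag constants are written out with the values they have in gdal.h so that the example runs on its own.

```python
# Open-mode flags, with the values used in gdal.h.
GDAL_OF_RASTER = 0x02
GDAL_OF_VECTOR = 0x04


def gpkg_declines(filename: str, open_flags: int) -> bool:
    """True when the GPKG driver should step aside and let GTI handle the file."""
    wants_raster = (open_flags & GDAL_OF_RASTER) != 0
    # Case-insensitive suffix test, like ENDS_WITH_CI in the C++ snippet.
    return wants_raster and filename.lower().endswith(".gti.gpkg")


print(gpkg_declines("tileindex.gti.gpkg", GDAL_OF_RASTER))  # raster open
print(gpkg_declines("tileindex.gti.gpkg", GDAL_OF_VECTOR))  # vector open
```

The decision depends only on the file name and the open flags, so no SQLite connection is needed at identification time.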

And increase the complication for projects like Rasterio, that's for sure.

Rasterio shouldn't have to care about that at all. The end user provides some opaque string "foo.gti.gpkg" to the Rasterio open() method, which just delegates that to GDALOpen() without having to know more about it. Or am I missing something?

Member Author

I've added an extra commit so that the regular GPKG driver is able to open a GeoPackage tiled raster even if its filename ends with .gti.gpkg, so there should not be any functional regression due to the introduction of the GTI driver

Successfully merging this pull request may close these issues.

Using directly a enhanced tile index as a raster VRT ?
9 participants