Grids Extension #688

abhijeetkumarga · 2019-12-02T01:59:44Z

Related Issue(s): #

Proposed Changes:

Add a Grids extension so that a single Item can have assets with different CRS-referenced projections. The data is encoded similarly to GDAL's GetGeoTransform or the Rasterio Transform.
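For readers unfamiliar with the two encodings named above, here is a minimal sketch of how six affine coefficients map a pixel (column, row) to CRS coordinates. GDAL's GetGeoTransform returns the coefficients in the order (c, a, b, f, d, e), while Rasterio's Affine uses (a, b, c, d, e, f). The function names and sample values are illustrative assumptions and are not part of the proposed extension.

```python
def pixel_to_crs(transform, col, row):
    """Map pixel (col, row) to CRS (x, y).

    `transform` uses the Rasterio coefficient order (a, b, c, d, e, f).
    """
    a, b, c, d, e, f = transform
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


def gdal_to_rasterio_order(gt):
    """Reorder a GDAL geotransform (c, a, b, f, d, e) into (a, b, c, d, e, f)."""
    c, a, b, f, d, e = gt
    return [a, b, c, d, e, f]


# Hypothetical 10 m north-up grid with its top-left corner at (600000, 5000000).
gdal_gt = (600000.0, 10.0, 0.0, 5000000.0, 0.0, -10.0)
transform = gdal_to_rasterio_order(gdal_gt)
top_left = pixel_to_crs(transform, 0, 0)
one_down_right = pixel_to_crs(transform, 1, 1)
```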

PR Checklist:

This PR has no breaking changes.
I have added my changes to the CHANGELOG or a CHANGELOG entry is not required.
API only: I have run npm run generate-all to update the generated OpenAPI files.

cholmes · 2020-04-02T13:16:39Z

Hey @abhijeetkumarga - apologies for the very slow response on this PR, but I personally only just finally understood why this is useful for some of my use cases, when @matthewhanson explained the thinking behind it.

He said that it's mostly used to be able to populate a data cube without opening every single file to get its pixel information. At Planet we provide fields in our data API to enable GDAL VRTs and ESRI virtual catalogs. In our case it's offset_x and offset_y plus rows and columns, plus EPSG code. But our team finds that the shape + transform construct meets our needs and is a nice general solution, so we could potentially align on it as we move to a STAC-compliant metadata search API.

I think the only tweak we might recommend is making it so you can just have one grid and have it be the default by default. We don't usually have different CRSs or resolutions at a per-asset level; the grid is almost always the same for all assets.

I'm not sure much needs to change with the details, but I think it'd be really good to describe the use cases a bit more fully in the extension readme, and perhaps to have a simpler example of just using it to describe a single asset. I'd be happy to help out on this.

alexgleith · 2020-04-02T23:59:00Z

Hey @cholmes, when you say default by default do you think that we should handle there not being a grid against an asset and pick up the default grid then?

We could add some info on how we use the grids in the Open Data Cube world in the readme, is that the kind of use case you're thinking?

I can pull together a full example based on @matthewhanson's Sentinel-2 0.9 STAC and add grid info, perhaps. (Although S-2 has three grids, so it's not all that simple, actually.)

cholmes · 2020-04-03T22:05:37Z

@alexgleith - reading back on what I wrote I definitely wasn't so clear on communication, and I think my thinking on it wasn't so clear either.

So the main thing I'm thinking about is how to make this general for simple cases, to communicate the overall shape / transform in the case where all your assets would use the same one.

When I wrote it I was imagining that you could declare one grid, and then if you didn't explicitly name a grid in your asset then the client could assume that you wanted to apply that grid.

But thinking on it more I remembered that I really don't like json objects in properties, as they can't be read by typical GIS programs, as most GIS doesn't understand nested structures.

So what I'm thinking now is that for a 'simple' case (you don't have assets with different spatial resolutions) we have proj:shape and proj:transform as properties (riffing off the suggestion in #756). So Planet would just use those, replacing offset_x, offset_y, rows, columns.

Perhaps just getting those into proj and then keeping this as an extension makes sense - right now the extension is introducing a few different concepts. It could be cleaner if shape and transform are introduced in proj extension, and then this can just talk about the structure to use it on assets, reusing the same proj:shape and proj:transform names.

So I'm realizing the use case stuff I'd like to see is more at the shape and transform level - just the use case of populating a .vrt or a data cube without having to open the file. And then this can focus on the grids. But yes, more info on how you use grids in open data cube I think would be helpful.
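As a rough illustration of that use case, the sketch below derives an asset's bounding box in its native CRS from shape and transform alone, which is the kind of information a .vrt or data cube builder needs before touching the file. It assumes shape is ordered [rows, columns] and transform uses the Rasterio coefficient order (a, b, c, d, e, f); both are assumptions for this example, as are the function name and sample values.

```python
def grid_bounds(shape, transform):
    """Return (min_x, min_y, max_x, max_y) of a grid in its native CRS.

    Assumes shape == [rows, cols] and transform == [a, b, c, d, e, f].
    """
    rows, cols = shape
    a, b, c, d, e, f = transform
    # Outer corners of the pixel grid, expressed as (col, row).
    corners = [(0, 0), (cols, 0), (0, rows), (cols, rows)]
    xs = [a * col + b * row + c for col, row in corners]
    ys = [d * col + e * row + f for col, row in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical 1000 x 2000 pixel asset on a 10 m north-up grid.
bounds = grid_bounds([1000, 2000], [10.0, 0.0, 600000.0, 0.0, -10.0, 5000000.0])
```

Taking the minimum and maximum over all four corners keeps the result valid for rotated or south-up grids as well.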

If we do keep it as an extension I do think we should brainstorm a less generic name for it. Can be a longer name, and maybe keep 'grids' as the prefix.

matthewhanson · 2020-04-04T04:13:14Z

I really like the idea of having this with the proj extension, as thematically they very much go together.

I'm not a big fan of having the additional grids dictionary either, but I think it's actually more common to have more than 1 grid in an Item than not. All Landsat and Sentinel data has bands with different resolutions, as do all the Worldviews. Pan bands are common for a lot of satellites.

Is it really worth the saved bits to not have redundant information across assets?

We could just have "proj:shape" and "proj:transform" for all assets:

"proj:shape": [1000, 1000],
"proj:transform": [1, 1, 0, 1, 1, 0]

vs:

"proj:grid": "gridkey"

Then the grid dictionary has at the least:

"grids": {
    "gridkey": {
        "proj:shape": [1000, 1000],
        "proj:transform": [1, 1, 0, 1, 1, 0]
    }
}

For Sentinel, we repeat a 2-element array and a 6-element array a few times each. It doesn't seem like a whole lot of extra info, and it is simpler and more readable.
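To make the cost of the lookup-table variant concrete, here is a minimal sketch of the client-side indirection it requires. The function name, the error handling, and the example URL are assumptions for illustration; only the "proj:grid", "proj:shape", and "proj:transform" keys come from the snippets above.

```python
def grid_for_asset(grids, asset):
    """Return the grid an asset points to via "proj:grid", or None if it names none."""
    key = asset.get("proj:grid")
    if key is None:
        return None
    if key not in grids:
        # A pointer can be misspelled or refer to a grid that was never defined.
        raise KeyError(f"asset references unknown grid {key!r}")
    return grids[key]


grids = {
    "gridkey": {
        "proj:shape": [1000, 1000],
        "proj:transform": [1, 1, 0, 1, 1, 0],
    }
}
asset = {"href": "https://example.com/B04.tif", "proj:grid": "gridkey"}
grid = grid_for_asset(grids, asset)
```

With the inline variant, the same information is simply read from the asset itself and no pointer needs to be validated.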

alexgleith · 2020-04-06T02:44:24Z

I can see advantages both ways. The simplicity of having:

"proj:shape": [1000, 1000],
"proj:transform": [1, 1, 0, 1, 1, 0]

on all assets versus the "don't repeat yourself" of having a lookup table.

I don't feel strongly either way. I guess I lean slightly towards the lookup table, but ultimately, I'll defer to you two.

cholmes · 2020-04-06T16:48:54Z

> I'm not a big fan of having the additional grids dictionary either, but I think it's actually more common to have more than 1 grid in an Item than not. All Landsat and Sentinel data has bands with different resolutions, as do all the Worldviews. Pan bands are common for a lot of satellites.

I think that's true for 'non-derived satellite imagery', but I believe the goal of STAC is to have Items handle far more... A composite or derived index wouldn't have more than one grid. Harmonized data wouldn't have more than one grid. Aerial / drone imagery, video, etc.

> Is it really worth the saved bits to not have redundant information across assets?

At the level of an item? That could have 5+ assets that repeat the info? I'd say potentially yes. For every other construct we have in STAC, no. But Planet's catalog is 700 million+ records, and is accessed very often, so it is a lot of extra bits flying around.

I think I do like just having proj:shape and proj:transform. Then we don't have to shove a dictionary into properties.

I'd like to solve the ability to specify asset level defaults at the property level in a more generic way. Like it feels like we should have a mechanism / ability to say 'all assets have X as their asset property'. Like if I have 3 assets and they are all bands 2,3,4 then just say "eo:bands": [2,3,4] once. Like it's sorta a way to declare a 'default' for your assets. Another use case I heard recently is zoom levels - hints to a tile server about where a COG works well (overviews have been generated).

Perhaps it's too simple or puts too much burden on the client, but we could just say if you put shape / transform at the property level it applies to all assets, and if you put it at the asset level then it applies just to that asset. Though perhaps that opens up too much complexity - could any property be specified at the asset level?
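The client-side burden of that rule can be sketched in a few lines. This is only an illustration of the idea as described above, assuming a STAC-like Item dictionary with "properties" and "assets"; the function name, asset keys, and sample values are hypothetical.

```python
GRID_FIELDS = ("proj:shape", "proj:transform")


def resolve_grid(item, asset_key):
    """Return the grid fields for one asset.

    A value on the asset wins; otherwise the Item-level property is used.
    Fields defined at neither level come back as None.
    """
    properties = item.get("properties", {})
    asset = item["assets"][asset_key]
    return {field: asset.get(field, properties.get(field)) for field in GRID_FIELDS}


# Hypothetical Item: three assets share the Item-level grid, the pan band overrides it.
item = {
    "properties": {
        "proj:shape": [1000, 1000],
        "proj:transform": [30.0, 0.0, 600000.0, 0.0, -30.0, 5000000.0],
    },
    "assets": {
        "red": {"href": "red.tif"},
        "green": {"href": "green.tif"},
        "blue": {"href": "blue.tif"},
        "pan": {
            "href": "pan.tif",
            "proj:shape": [2000, 2000],
            "proj:transform": [15.0, 0.0, 600000.0, 0.0, -15.0, 5000000.0],
        },
    },
}
red_grid = resolve_grid(item, "red")
pan_grid = resolve_grid(item, "pan")
```

The fallback is applied per field, so an asset could override only one of the two values; whether that should be allowed is a design question this sketch does not settle.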

Kirill888 · 2020-04-07T05:25:29Z

I too think that this is better off being grouped in proj. Saves the trouble of coming up with the new name.

I think @cholmes's suggestion to have a global proj:{shape,transform} that applies to every asset unless overridden at the asset level is pretty clean and readable. It makes the simple case simple (and byte efficient) and the complex case possible (while still being byte efficient when only a few bands differ).

It does require extra complexity from code interpreting the structure, but so does having "name pointers", that might be misspelled or missing from the grids dictionary altogether. It would certainly work for us at open datacube.

Primary use case for open datacube to have this data "ahead of read time" is "Data Processing Planning". With this data available in the metadata, we can answer questions like “can we use native resolution/projection to load ALL the data for a given spatio-temporal query and how much RAM will this require”. This is helpful for both interactive data exploration by a data scientist in a Jupyter notebook as well as for large scale distributed processing, particularly when time axis reduction is performed.

Basically building a dask array using "native projection and resolution" without peeking into every file ahead of time, which can be very expensive for S3/http data.
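A back-of-the-envelope version of that planning question can be answered from metadata alone. The sketch below estimates the memory needed to load a stack of rasters at native resolution from their shapes; it is not Open Data Cube code, and the function name, the [rows, columns] shape order, and the assumption of a single band at 2 bytes per pixel (for example uint16) are all illustrative.

```python
from math import prod


def estimated_bytes(shapes, bytes_per_pixel=2):
    """Estimate the bytes needed to hold every raster in `shapes` in memory.

    `shapes` is an iterable of [rows, cols] pairs taken from metadata,
    so no file has to be opened to compute the total.
    """
    return sum(prod(shape) for shape in shapes) * bytes_per_pixel


# Hypothetical query result: 50 time steps of a 10980 x 10980 pixel band.
shapes = [[10980, 10980]] * 50
total_bytes = estimated_bytes(shapes)
total_gib = total_bytes / 2**30
```

The same shapes, together with the transforms, would be enough to declare the dimensions of a lazily loaded array before any pixel is read.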

alexgleith · 2020-04-17T04:29:21Z

This has been replaced by #780

matthewhanson · 2020-04-17T04:58:19Z

Closing in favor of #780 which makes the proposed "grids" extension part of the "proj" extension instead.

abhijeet and others added 19 commits November 22, 2019 01:26

first commit

300eacd

initial write-up

6dd0b2b

Update README.md

4a3460d

more changes

244ef2f

adding in a file to use to build up the example json

1e59ede

more additions

9e94442

ready to first socialising

a67cc8d

quick fixes

fd3b365

quick fixes

41f71fa

kirill-changes

9afae1b

alex changes

407c7c1

Update CHANGELOG.md to include the addition of the Grids Extension

2fe0281

sorry - moved the addition to 'extensions'

893a7f9

remove landsat8 file

250c59b

small changes

6d7ccea

small changes

57b8d54

some more changes

b5f76fe

Increment version

5a5aea0

Remove name

dbbe42f

matthewhanson mentioned this pull request Apr 2, 2020

"grids" extension #756

Closed

Merge branch 'dev' into gridsextension

1af2c4d

cholmes mentioned this pull request Apr 7, 2020

Extended Metadata Extension #757

Closed

matthewhanson mentioned this pull request Apr 9, 2020

Asset specific metadata #760

Closed

matthewhanson closed this Apr 17, 2020
