Grids Extension #688
Hey @abhijeetkumarga - apologies for the very slow response on this PR, but I personally only just understood why this is useful for some of my use cases, when @matthewhanson explained the thinking behind it. He said that it's mostly used to populate a data cube without opening every single file to get its pixel information. At Planet we provide fields in our data API to enable GDAL VRTs and ESRI virtual catalogs. In our case it's offset_x and offset_y plus rows and columns, plus EPSG code. But our team finds that the shape + transform construct meets our needs and is a nice general solution, so we could potentially align on it as we move to a STAC-compliant metadata search API. I think the only tweak we might recommend is making it so you can have just one grid and have it be the default - we don't usually have different CRSs or resolutions at a per-asset level - the grid is almost always the same for all assets. I'm not sure much needs to change in the details, but I think it'd be really good to describe the use cases a bit more fully in the extension readme, and perhaps to have a simpler example of just using it to describe a single asset. I'd be happy to help out on this.
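To make the shape + transform construct concrete, here is a minimal sketch of how a client could derive a raster's bounding box from metadata alone, without opening the file. It assumes the shape is `[rows, columns]` and the transform uses the rasterio-style affine ordering `(a, b, c, d, e, f)`; the thread does not confirm either ordering, and the function names and sample values are illustrative only.

```python
from typing import Sequence, Tuple


def pixel_to_world(transform: Sequence[float], col: float, row: float) -> Tuple[float, float]:
    """Map a pixel corner (col, row) to CRS coordinates.

    Assumes rasterio-style ordering: x = a*col + b*row + c, y = d*col + e*row + f.
    """
    a, b, c, d, e, f = transform[:6]
    return (a * col + b * row + c, d * col + e * row + f)


def grid_bounds(shape: Sequence[int], transform: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a grid with shape [rows, cols]."""
    rows, cols = shape
    corners = [
        pixel_to_world(transform, col, row)
        for col in (0, cols)
        for row in (0, rows)
    ]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# Illustrative values: a 10 m grid of 10980 x 10980 pixels in a UTM zone.
shape = [10980, 10980]
transform = [10.0, 0.0, 600000.0, 0.0, -10.0, 5900040.0]
print(grid_bounds(shape, transform))
```

Together with an EPSG code, these two arrays carry the same information as an offset / rows / columns scheme, which is why they could replace those fields.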
Hey @cholmes, when you say you'd like the use cases described more fully: we could add some info on how we use the grids in the Open Data Cube world in the readme. Is that the kind of use case you're thinking of? I can pull together a full example based on @matthewhanson's Sentinel-2 0.9 STAC and add grid info, perhaps. (Although S-2 has three grids, so it's not all that simple, actually.)
@alexgleith - reading back on what I wrote, I definitely wasn't very clear, and I think my thinking on it wasn't so clear either. So the main thing I'm thinking about is how to make this general for simple cases, to communicate the overall shape / transform in the case where all your assets would use the same one. When I wrote it I was imagining that you could declare one grid, and then if you didn't explicitly name a grid in your asset the client could assume that you wanted to apply that grid. But thinking on it more I remembered that I really don't like JSON objects in properties, as they can't be read by typical GIS programs, since most GIS software doesn't understand nested structures. So what I'm thinking now is that for a 'simple' case (you don't have assets with different spatial resolutions) we have proj:shape and proj:transform as properties (riffing off the suggestion in #756). So Planet would just use those, replacing offset_x, offset_y, rows, columns. Perhaps just getting those into proj and then keeping this as an extension makes sense - right now the extension is introducing a few different concepts. It could be cleaner if shape and transform are introduced in the proj extension, and then this can just talk about the structure to use them on assets, reusing the same proj:shape and proj:transform names. So I'm realizing the use case stuff I'd like to see is more at the shape and transform level - just the use case of populating a .vrt or a data cube without having to open the file. And then this can focus on the grids. But yes, more info on how you use grids in Open Data Cube would be helpful, I think. If we do keep it as an extension I do think we should brainstorm a less generic name for it. It can be a longer name, and maybe keep 'grids' as the prefix.
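As a rough illustration of the "populate a .vrt without opening the file" use case, the sketch below builds the georeferencing header of a GDAL VRT from a shape, a transform and an EPSG code. It assumes the shape is `[rows, columns]` and the transform is in rasterio-style `(a, b, c, d, e, f)` order, which has to be reordered into GDAL's `GetGeoTransform` layout. The helper names are hypothetical, and a usable VRT would also need `VRTRasterBand` entries pointing at the source files.

```python
import xml.etree.ElementTree as ET
from typing import Sequence, Tuple


def to_gdal_geotransform(transform: Sequence[float]) -> Tuple[float, ...]:
    """Reorder a rasterio-style affine (a, b, c, d, e, f) into GDAL order.

    GDAL's geotransform is (x_origin, pixel_width, row_rotation,
    y_origin, column_rotation, pixel_height), i.e. (c, a, b, f, d, e).
    """
    a, b, c, d, e, f = transform[:6]
    return (c, a, b, f, d, e)


def vrt_header(shape: Sequence[int], transform: Sequence[float], epsg: int) -> str:
    """Build a band-less VRT skeleton from metadata only (no file access)."""
    rows, cols = shape  # assumed order: [rows, columns]
    root = ET.Element("VRTDataset", rasterXSize=str(cols), rasterYSize=str(rows))
    ET.SubElement(root, "SRS").text = f"EPSG:{epsg}"
    ET.SubElement(root, "GeoTransform").text = ", ".join(
        repr(float(v)) for v in to_gdal_geotransform(transform)
    )
    return ET.tostring(root, encoding="unicode")


# Illustrative values only.
print(vrt_header([10980, 10980], [10.0, 0.0, 600000.0, 0.0, -10.0, 5900040.0], 32633))
```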
I really like the idea of having this with the proj extension, as thematically they very much go together. I'm not a big fan of having the additional grids dictionary either, but I think it's actually more common to have more than one grid in an Item than not. All Landsat and Sentinel data has bands with different resolutions, as do all the WorldViews. Pan bands are common for a lot of satellites. Is it really worth the saved bits to not have redundant information across assets? We could just have "proj:shape" and "proj:transform" for all assets:
vs:
Then the grid dictionary has at the least:
For Sentinel, we repeat a 2-element array and a 4-element array a few times each. It doesn't seem like a whole lot of extra info, and it is simpler and more readable.
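The original snippets for the two layouts did not survive in this copy of the thread, so the following is only a hypothetical sketch of what is being compared: per-asset `proj:shape` / `proj:transform` versus a named-grid lookup table. The keys `grids`, `grid`, `shape` and `transform` in the lookup variant, the asset names, and all values are assumptions for illustration, not the PR's actual schema.

```python
# Layout 1: every asset repeats its own shape and transform.
per_asset_item = {
    "properties": {},
    "assets": {
        "B04": {"href": "B04.tif", "proj:shape": [10980, 10980],
                "proj:transform": [10.0, 0.0, 600000.0, 0.0, -10.0, 5900040.0]},
        "B05": {"href": "B05.tif", "proj:shape": [5490, 5490],
                "proj:transform": [20.0, 0.0, 600000.0, 0.0, -20.0, 5900040.0]},
    },
}

# Layout 2: assets point by name into a shared dictionary of grids.
lookup_item = {
    "properties": {
        "grids": {
            "10m": {"shape": [10980, 10980],
                    "transform": [10.0, 0.0, 600000.0, 0.0, -10.0, 5900040.0]},
            "20m": {"shape": [5490, 5490],
                    "transform": [20.0, 0.0, 600000.0, 0.0, -20.0, 5900040.0]},
        }
    },
    "assets": {
        "B04": {"href": "B04.tif", "grid": "10m"},
        "B05": {"href": "B05.tif", "grid": "20m"},
    },
}


def grid_for(item: dict, asset_key: str) -> tuple:
    """Return (shape, transform) for an asset under either layout."""
    asset = item["assets"][asset_key]
    if "proj:shape" in asset:
        return asset["proj:shape"], asset["proj:transform"]
    # A misspelled or missing grid name raises KeyError here.
    grid = item["properties"]["grids"][asset["grid"]]
    return grid["shape"], grid["transform"]


print(grid_for(per_asset_item, "B05") == grid_for(lookup_item, "B05"))
```

Both layouts resolve to the same information; the trade-off is repetition in the first versus an extra level of indirection in the second.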
I can see advantages both ways. The simplicity of having:
on all assets versus the "don't repeat yourself" of having a lookup table. I don't feel strongly either way. I guess I lean slightly towards the lookup table, but ultimately, I'll defer to you two.
I think that's true for 'non-derived satellite imagery', but I believe the goal of STAC is to have Items handle far more... A composite or derived index wouldn't have more than one grid. Harmonized data wouldn't have more than one grid. Aerial / drone imagery, video, etc.
At the level of an item? That could have 5+ assets that repeat the info? I'd say potentially yes. For every other construct we have in STAC, no. But Planet's catalog is 700 million+ records, and is accessed very often, so it is a lot of extra bits flying around. I think I do like just having proj:shape and proj:transform. Then we don't have to shove a dictionary into properties. I'd like to solve the ability to specify asset-level defaults at the property level in a more generic way. It feels like we should have a mechanism to say 'all assets have X as their asset property'. For example, if I have 3 assets and they are all bands 2,3,4 then just say "eo:bands": [2,3,4] once. It's sort of a way to declare a 'default' for your assets. Another use case I heard recently is zoom levels - hints to a tile server about where a COG works well (overviews have been generated). Perhaps it's too simple or puts too much burden on the client, but we could just say that if you put shape / transform at the property level it applies to all assets, and if you put it at the asset level then it applies just to that asset. Though perhaps that opens up too much complexity - could any property be specified at the asset level?
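The "property level applies to all assets, asset level applies just to that asset" rule could be sketched as a simple fallback lookup. This is a minimal illustration of the idea under discussion, not an agreed rule; the helper name `effective_value`, the asset names, and the sample values are hypothetical.

```python
from typing import Any, Optional


def effective_value(item: dict, asset_key: str, field: str) -> Optional[Any]:
    """Resolve a field for one asset: the asset-level value wins,
    otherwise fall back to the Item-level property (None if absent)."""
    asset = item["assets"][asset_key]
    if field in asset:
        return asset[field]
    return item.get("properties", {}).get(field)


# Illustrative Item: one default grid, with a pan band that overrides it.
item = {
    "properties": {
        "proj:shape": [5000, 5000],
        "proj:transform": [3.0, 0.0, 500000.0, 0.0, -3.0, 4100000.0],
    },
    "assets": {
        "visual": {"href": "visual.tif"},
        "pan": {
            "href": "pan.tif",
            "proj:shape": [20000, 20000],
            "proj:transform": [0.75, 0.0, 500000.0, 0.0, -0.75, 4100000.0],
        },
    },
}

print(effective_value(item, "visual", "proj:shape"))  # falls back to properties
print(effective_value(item, "pan", "proj:shape"))     # asset-level override
```

The burden on clients is one extra lookup per field, which is the complexity concern raised above.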
I too think that this is better off being grouped in the proj extension. I think @cholmes' suggestion to have global shape / transform values at the property level, with asset-level overrides, makes sense. It does require extra complexity from code interpreting the structure, but so does having "name pointers" that might be misspelled or missing from the grids dictionary altogether. It would certainly work for us at Open Data Cube. The primary use case for Open Data Cube to have this data "ahead of read time" is "Data Processing Planning". With this data available in the metadata, we can answer questions like “can we use native resolution/projection to load ALL the data for a given spatio-temporal query and how much RAM will this require”. This is helpful both for interactive data exploration by a data scientist in a Jupyter notebook and for large-scale distributed processing, particularly when time axis reduction is performed. Basically, it means building a dask array using "native projection and resolution" without peeking into every file ahead of time, which can be very expensive for S3/HTTP data.
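A minimal sketch of that kind of planning question, answered from metadata alone: do all matching rasters share one CRS and pixel size, and how many bytes would loading them at native resolution take? The record layout (`epsg`, `shape`, `transform`), the rasterio-style transform ordering, and the function name are assumptions for illustration; this is not Open Data Cube's actual API.

```python
from typing import Iterable, Tuple


def plan_native_load(grids: Iterable[dict], bytes_per_pixel: int) -> Tuple[bool, int]:
    """Return (same_grid, total_bytes) for a set of raster grids.

    Each grid is a dict with 'epsg', 'shape' ([rows, cols]) and
    'transform' (rasterio-style a, b, c, d, e, f). No file is opened.
    """
    grids = list(grids)
    # CRS plus absolute pixel width/height identifies the "native" grid.
    signatures = {
        (g["epsg"], abs(g["transform"][0]), abs(g["transform"][4])) for g in grids
    }
    total_bytes = sum(g["shape"][0] * g["shape"][1] * bytes_per_pixel for g in grids)
    return len(signatures) == 1, total_bytes


# Illustrative query result: 12 time steps of one 10 m band stored as uint16.
scene = {
    "epsg": 32633,
    "shape": [10980, 10980],
    "transform": [10.0, 0.0, 600000.0, 0.0, -10.0, 5900040.0],
}
same_grid, total_bytes = plan_native_load([scene] * 12, bytes_per_pixel=2)
print(same_grid, f"{total_bytes / 1e9:.2f} GB")
```

The same numbers could be used to pick chunk sizes for a lazily loaded array before any pixel data is read.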
This has been replaced by #780.
Closing in favor of #780, which makes the proposed "grids" extension part of the "proj" extension instead.
Proposed Changes:
GetGeoTransform
or the RasterioTransform