Some fields are not specific to landsat and should be in separate extensions #2

Open
m-mohr opened this issue Nov 7, 2023 · 4 comments

Comments

@m-mohr

m-mohr commented Nov 7, 2023

When discussing STAC extensions in the broader area of ARD recently (CEOS meeting, BiDS conference), it came up that some of the fields in this extension (and also the old extension) should not be in the landsat extension, but in other extensions that are more generic.

For example, landsat:cloud_cover_land is a generic concept that is not specific to Landsat.
Similarly, product_generated seems to be covered by created on the asset level.
Similar questions could be raised for correction, path, row and other fields.
For each field, there should be a good reason why it is Landsat-specific. Otherwise, it should be part of another extension.

We had the discussion that it would be appreciated if these fields could be generalized so that they can be reused and not end up also e.g. in a Sentinel extension. cc @philvarner

@m-mohr
Author

m-mohr commented Nov 7, 2023

PS: I feel the 2.0.0 release was a little fast; there was literally no time for reviews.

@matthewhanson
Member

@m-mohr Agree we should use existing extensions as appropriate, just posted this with regard to Sentinel-2 data
stac-extensions/eo#28

My thoughts on Landsat

  • scene_id: Every data provider will have specific identifiers that may not be the same as Item IDs. In fact, they may actually have multiple IDs associated with an Item (e.g. Sentinel-2 tile_id, product_id, datastrip_id). It is of value to have these field names be specific to the provider and use the same terms they use (e.g., granule, tile, product).
  • collection_category - is a rough measure of geometric accuracy (T1, T2, or real-time). Not sure where that would belong, if at all, and I'm not sure this is a concept worth capturing in an ARD spec.
  • collection_number - this is basically a version on the collection, which is typically captured in the collection name. I actually don't think it's even worth including in the STAC Items at all
  • wrs_type, wrs_path, and wrs_row represent the two different types of tiling schemes used by Landsat (WRS-1 and WRS-2) which are specific to Landsat and would never be used by other satellites. Landsat is captured in the grid extension, but it's also something that users commonly filter on.
  • cloud_cover_land is just the cloud cover, except over land, whereas eo:cloud_cover is presumably the coverage over the whole scene. While it seems useful, I've not seen it calculated elsewhere. See my Sentinel-2 issue
  • correction - this one does in fact seem to belong in the processing extension... except that processing levels aren't standard.
  • product_generated - I agree this is the same as created at the asset level. For Landsat, these dates would all be the same and we need a way to easily check the processing date of the entire scene to compare against another version of the scene due to reprocessing. It's definitely a general concept that would be nice to have captured elsewhere, but the shifting meaning of created for Items or Assets appears to make this difficult.
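As a rough illustration of the fields above that have an obvious generic counterpart, here is a minimal Python sketch. It is not an agreed-upon migration: the `landsat:`-prefixed property names, the `GENERIC_CANDIDATES` mapping, and the `generalize` helper are assumptions made for this example, and the `WRS{type}-{path}{row}` form of `grid:code` is how I recall the grid extension describing Landsat tiles, so it should be checked against that spec. Fields with no clear generic home (such as scene_id) are left untouched.

```python
from typing import Any

# Hypothetical mapping from Landsat-specific properties to the generic
# fields suggested in this thread (assumption, not a published migration).
GENERIC_CANDIDATES = {
    "landsat:correction": "processing:level",
    "landsat:product_generated": "created",
}


def generalize(properties: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the Item properties with generic fields filled in."""
    out = dict(properties)

    # Move a Landsat field to its generic counterpart unless the generic
    # field is already set, in which case both are kept as-is.
    for old, new in GENERIC_CANDIDATES.items():
        if old in out and new not in out:
            out[new] = out.pop(old)

    # Derive grid:code from the WRS fields; the originals are kept because
    # users commonly filter on path and row directly.
    wrs_type = out.get("landsat:wrs_type")
    path = out.get("landsat:wrs_path")
    row = out.get("landsat:wrs_row")
    if None not in (wrs_type, path, row) and "grid:code" not in out:
        out["grid:code"] = f"WRS{wrs_type}-{int(path):03d}{int(row):03d}"

    return out
```

Keeping the WRS fields alongside `grid:code` is a deliberate choice in this sketch, since dropping them would break existing path/row filters.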

@m-mohr
Author

m-mohr commented Nov 8, 2023

Thanks, some comments below:

scene_id: Records has the concept of externalIds, which could cater for this (as a STAC extension). So having a general concept for external IDs would be better than a proprietary field.

collection_category: What do T1 and T2 mean?

collection_number: Great, one proprietary field less.

cloud_cover_land: Will have a look at the EO issue...

correction: So until it's standardized you can use processing:level?

product_generated: That sounds like an implementation issue shifted into the metadata? If the time is available in assets, you can query it in principle.
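To make the last point concrete, here is a minimal Python sketch of deriving a scene-level processing date from asset-level `created` timestamps on a plain STAC Item dict. The helper names (`parse_ts`, `latest_asset_created`, `is_reprocessed`) are hypothetical, and the sketch assumes every relevant asset carries an RFC 3339 `created` value.

```python
from datetime import datetime
from typing import Any, Optional


def parse_ts(value: str) -> datetime:
    """Parse an RFC 3339 timestamp; a trailing 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_asset_created(item: dict[str, Any]) -> Optional[datetime]:
    """Return the newest asset-level `created` timestamp, or None if absent."""
    stamps = [
        parse_ts(asset["created"])
        for asset in item.get("assets", {}).values()
        if "created" in asset
    ]
    return max(stamps) if stamps else None


def is_reprocessed(old_item: dict[str, Any], new_item: dict[str, Any]) -> bool:
    """True if new_item's assets were created later than old_item's."""
    old = latest_asset_created(old_item)
    new = latest_asset_created(new_item)
    return old is not None and new is not None and new > old
```

This works client-side once the Items are fetched; whether a given API can filter or sort on asset-level fields server-side is a separate question.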

@philvarner
Contributor

I agree that this can and should be made better. The intention of this was to create a community spec matching the existing proprietary spec that's already in use by both Planetary Computer and Earth Search, with the addition of one field that I need to use for Earth Search work. We can always make these improvements and release as a v3.0.0 and as part of other to-be-created extensions.
