[WIP] Define best practices for defining collections when produced for specific stories #58

Open
abarciauskas-bgse opened this issue Mar 28, 2022 · 12 comments

@abarciauskas-bgse
Contributor

The question has arisen for datasets we are curating specifically to tell the EJ story of Hurricanes Ida and Maria: how do we want to organize collections that are produced for specific events (e.g. hurricanes)? The consensus so far is that we want to publish to generalizable collections as much as possible, rather than to collections scoped to a specific event.

What this means is that for the EJ story we will be creating and using the following collections:

  • reformatted planetscope data - reformatted to COG, segmented by zip code for Hurricanes Ida and Maria but could be extended to other spatial and temporal locations in the future
  • blue tarp detections from this planetscope collection, starting with Hurricanes Ida and Maria but these detections could be created for other localities in the future
  • HLS data - no special collection for these events (however for the configuration of stories, we should be able to specify temporal and spatial extents of this data)
  • nightlights data - we already have https://staging-stac.delta-backend.xyz/collections/nightlights-hd-monthly/items, however the Ida and Maria nightlights data are for specific dates before, during, and after the hurricanes. Does it make sense to put them in the same collection?

Interested in any additional thoughts or considerations from the team @anayeaye @slesaad @xhagrg @leothomas @danielfdsilva

abarciauskas-bgse self-assigned this Mar 28, 2022
@sharkinsspatial

@abarciauskas-bgse From the STAC perspective I would recommend following the consensus decision you referenced of using only generalizable collections. This avoids conflating collection definition with a specific story and seems more in spirit with the STAC collection specification.

This seems like a best practice but raises another issue: where do we store story-specific filters that select only the items in the general collection which are relevant to the story? I'd suggest that some type of filter key (containing a valid CQL query body) in the UI config described here might be one option. This has 3 advantages:

  1. The ranges and filter configuration for your story would be managed in a static file under version control so changes or new story releases could be driven by application CI.
  2. Using a filter means that as new data is added to your STAC API it is immediately reflected in the application without the need for loading data into a specific collection.
  3. Data in a generalized collection can participate in multiple stories and still seem semantically correct.
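
As a rough illustration of the filter-key idea above, a story config entry could carry a CQL2 JSON filter that the application merges into a STAC search request. This is only a sketch: the `story_config` layout, the `filter` key name, and the `build_search_body` helper are hypothetical, not the actual UI config schema. The date range is taken from the Maria stages discussed in this thread, with the times of day assumed. How a given STAC API evaluates the filter against items that only have start/end datetimes is implementation-specific.

```python
import json

# Hypothetical story config entry. Key names are assumptions for illustration,
# not the real UI config schema.
story_config = {
    "id": "hurricane-maria-nightlights",
    "collection": "nightlights-hd-monthly",
    # CQL2 JSON filter selecting items that overlap the Maria study period.
    "filter": {
        "op": "t_intersects",
        "args": [
            {"property": "datetime"},
            {"interval": ["2017-07-21T00:00:00Z", "2018-03-20T23:59:59Z"]},
        ],
    },
}


def build_search_body(config, limit=100):
    """Build a STAC API POST /search body from a story config entry."""
    return {
        "collections": [config["collection"]],
        "filter-lang": "cql2-json",
        "filter": config["filter"],
        "limit": limit,
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body(story_config), indent=2))
```

Because the filter lives in a static config file, a new story only needs a new config entry; no items are copied into a story-specific collection.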

@anayeaye
Contributor

@abarciauskas-bgse @sharkinsspatial @xhagrg the temporal cadence of the hurricane event nightlights data is a unique case:

  • We are using a nominal datetime for nightlights hd monthly
  • The hurricane event items use start/end datetimes
  • I don't think that pgstac would have issues searching a collection with both types of temporal reference because of the way that pgstac stores item data but other stac-api implementations might have trouble
  • I am not actually sure what value the nightlights pixels represent (daily average over a month/range?)
  • I am not certain that we shouldn't be using start/end dates for monthly data instead of nominal datetimes (although I think the stac spec best practice of a nominal datetime is suitable if we are not combining the event items with the existing nightlights-hd-monthly collection).

Nightlights events test metadata with start/end
Nightlights monthly hd with nominal datetime

@abarciauskas-bgse
Contributor Author

I sent Ranjay an email about the temporal nature of the BMHD monthly files. If he can verify that start_datetime and end_datetime can be used for the monthly data files in nightlights-hd-monthly, I think we should put the Hurricane files in that same collection. The downside is that the collection would have to be described as having a dashboard:time_density of "multi-day" instead of "month", because the temporal extent of the Hurricane files is greater than a month. @danielfdsilva @anayeaye will that be problematic for the dashboard?

  • Stage 0, baseline image before Hurricane Maria (average night lights July 21 to Sept. 19, 2017);
  • Stage 1, immediate damage assessment period following Hurricane Maria (average night lights Sept. 20 to Nov. 20, 2017);
  • Stage 2, ensuing relief efforts led by FEMA and the US military (average night lights Nov. 21, 2017 to Jan. 20, 2018);
  • Stage 3, recovery efforts led by PREPA and the US Army Corps of Engineers (average night lights Jan. 21 to Mar. 20, 2018).

I believe for Ida there is just a day before the hurricane (2021-08-09) and a day after the hurricane (2021-08-31), so I think a single datetime does make sense for those items.

@xhagrg

xhagrg commented Mar 30, 2022

@abarciauskas-bgse @anayeaye after reading the response from Ranjay, it looks like we can just use the start and end date time of the corresponding month? Do we move ahead with ingestion of these files in the same collection?

@abarciauskas-bgse
Contributor Author

Yes thanks @xhagrg for checking, I think we should consolidate in the nightlights-hd-monthly dataset and also add start_ and end_datetimes to the existing COGs for the month of each file. However we can create a new issue to do that.
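
For the follow-up issue, one way to derive start/end datetimes for the existing monthly COGs is to expand each file's nominal datetime to the bounds of its calendar month. The sketch below assumes the nominal datetime falls somewhere inside the month the file represents and that the bounds should be expressed in UTC; `month_bounds` and `to_rfc3339` are hypothetical helper names.

```python
import calendar
from datetime import datetime, timezone


def month_bounds(nominal):
    """Return (start, end) UTC datetimes covering the calendar month of `nominal`."""
    # monthrange returns (weekday of first day, number of days in month).
    last_day = calendar.monthrange(nominal.year, nominal.month)[1]
    start = datetime(nominal.year, nominal.month, 1, tzinfo=timezone.utc)
    end = datetime(nominal.year, nominal.month, last_day, 23, 59, 59, tzinfo=timezone.utc)
    return start, end


def to_rfc3339(dt):
    """Format a UTC datetime the way STAC item properties expect."""
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    start, end = month_bounds(datetime(2020, 2, 15, tzinfo=timezone.utc))
    print(to_rfc3339(start), to_rfc3339(end))
```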

@abarciauskas-bgse
Contributor Author

Also Ranjay shared these links as the product pages for the dataset: https://ladsweb.modaps.eosdis.nasa.gov/missions-and-measurements/products/VNP46A3/, https://ladsweb.modaps.eosdis.nasa.gov/missions-and-measurements/products/VNP46A4/

@anayeaye @sharkinsspatial is there a good place for these types of references in the STAC collection metadata? I'm checking with Ranjay but my guess is that those are the product pages for the source HDF5 used to generate the COGs we have.

@anayeaye
Contributor

@abarciauskas-bgse @sharkinsspatial @xhagrg RE: datetimes
I don't have a specific reference to back this up, but I think it is probably a good idea to choose either datetime OR start/end for all items in the collection. It's a bit of a stretch, but if a user is paging through a collection of items for a search, they should be able to expect datetime information on the same property for all items in the response. I don't think pgstac would have any trouble with the search on a collection with mixed datetime properties, but I'm also not sure how we would communicate this information to the end user. Mixed datetimes could also complicate using other stac-apis for these items in the future.
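
A quick pre-ingest check along these lines could flag a collection whose items mix the two temporal styles. This is a minimal sketch over plain item dictionaries; `temporal_style` and `collection_temporal_styles` are hypothetical helpers, and treating an item that has both a datetime and a start/end pair as a "range" item is an assumption.

```python
def temporal_style(item):
    """Classify an item as 'range', 'instant', or 'missing' from its properties."""
    props = item.get("properties", {})
    has_range = (
        props.get("start_datetime") is not None
        and props.get("end_datetime") is not None
    )
    if has_range:
        return "range"  # start/end takes precedence, even if datetime is also set
    if props.get("datetime") is not None:
        return "instant"
    return "missing"


def collection_temporal_styles(items):
    """Return the set of temporal styles used by a list of items."""
    return {temporal_style(item) for item in items}


if __name__ == "__main__":
    items = [
        {"properties": {"datetime": "2021-08-09T00:00:00Z"}},
        {"properties": {"datetime": None,
                        "start_datetime": "2017-09-20T00:00:00Z",
                        "end_datetime": "2017-11-20T23:59:59Z"}},
    ]
    styles = collection_temporal_styles(items)
    if len(styles) > 1:
        print("Mixed temporal styles:", sorted(styles))
```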

I also don't want to block ingest on the nightlights data--I think it will be fine either way because it is small enough to easily refactor or reingest if needed.

@anayeaye
Contributor

@abarciauskas-bgse Just refreshed and saw the Collection level metadata question above. I think these references would be good links to add to the document. This HLS delta collection has external links to metadata, maybe we could follow this pattern: https://dev-stac.delta-backend.xyz/collections/HLSS30.002

  "links": [
    <SNIP>
    {
      "rel": "external",
      "href": "https://cmr.earthdata.nasa.gov/search/concepts/C2021957295-LPCLOUD.html",
      "type": "text/html",
      "title": "NASA Common Metadata Repository Record for this Dataset"
    }
  ]
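
Following that pattern, the two product pages Ranjay shared could be appended to the nightlights collection document as `external` links. The sketch below works on a plain collection dictionary; `add_external_link` is a hypothetical helper and the link titles are placeholders, since it has not yet been confirmed what those pages describe.

```python
def add_external_link(collection, href, title):
    """Append an 'external' HTML link to a collection dict unless the href is already present."""
    links = collection.setdefault("links", [])
    if not any(link.get("href") == href for link in links):
        links.append(
            {"rel": "external", "href": href, "type": "text/html", "title": title}
        )
    return collection


if __name__ == "__main__":
    collection = {"id": "nightlights-hd-monthly", "links": []}
    base = "https://ladsweb.modaps.eosdis.nasa.gov/missions-and-measurements/products"
    # Titles are placeholders; adjust once the source product is confirmed.
    add_external_link(collection, f"{base}/VNP46A3/", "VNP46A3 product page")
    add_external_link(collection, f"{base}/VNP46A4/", "VNP46A4 product page")
    print(collection["links"])
```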

@abarciauskas-bgse
Contributor Author

@xhagrg per @anayeaye's comment about start_ and end_datetime, I think we will want to include start_ and end_datetime for Ida files. Sorry for the re-work. I see those files are already published to https://dev-stac.delta-backend.xyz/collections/BMHD/items

@xhagrg

xhagrg commented Mar 31, 2022

@abarciauskas-bgse I will be using the "nightlights-hd-monthly" collection, which already exists, and will be adding start_datetime and end_datetime to the properties. Do we retain the datetime field, or set it to none as done previously?

@abarciauskas-bgse
Contributor Author

datetime is a required field (see https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#datetime) but setting it to null is acceptable if start_datetime and end_datetime are specified
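
To make that rule concrete, here is a small sketch that builds the temporal part of an item's properties and rejects combinations the spec does not allow. `temporal_properties` is a hypothetical helper; the example dates come from Stage 1 of the Maria data described earlier in this thread, with the times of day assumed.

```python
import json


def temporal_properties(datetime_=None, start=None, end=None):
    """Build STAC temporal properties; datetime may be null only with a start/end pair."""
    if (start is None) != (end is None):
        raise ValueError("start_datetime and end_datetime must be given together")
    if datetime_ is None and start is None:
        raise ValueError("datetime may only be null when start/end datetimes are set")
    props = {"datetime": datetime_}  # key is always present, serialized as null if None
    if start is not None:
        props["start_datetime"] = start
        props["end_datetime"] = end
    return props


if __name__ == "__main__":
    # Maria Stage 1: Sept. 20 to Nov. 20, 2017 (times of day are assumed).
    props = temporal_properties(
        start="2017-09-20T00:00:00Z", end="2017-11-20T23:59:59Z"
    )
    print(json.dumps(props))
```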

@gadomski
Contributor

Moving this to veda-data as it's a good/useful conversation, and we're sunsetting this repo: https://github.com/NASA-IMPACT/veda-architecture/issues/322.

gadomski transferred this issue from NASA-IMPACT/veda-data-pipelines Sep 22, 2023
5 participants