Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add type field to Collection & Catalog #971

Merged
merged 29 commits into from
Mar 3, 2021
Merged

Add type field to Collection & Catalog #971

merged 29 commits into from
Mar 3, 2021

Conversation

cholmes
Copy link
Contributor

@cholmes cholmes commented Feb 5, 2021

Related Issue(s): #889

Proposed Changes:

  1. Added type field to catalog and collection spec
  2. Added best practice section on how to distinguish stac files

PR Checklist:

  • This PR is made against the dev branch (all proposed changes except releases should be against dev, not master).
  • This PR has no breaking changes.
  • I have added my changes to the CHANGELOG
  • This PR affects the STAC API spec, and I have opened issue/PR #XXX to track the change.

@cholmes cholmes added this to the 1.0.0-RC.1 milestone Feb 5, 2021
@cholmes
Copy link
Contributor Author

cholmes commented Feb 5, 2021

Ok, this is up for discussion, feedback welcome. I think it's in line with what we discussed.

  • re-used the type field from GeoJSON, so it aligns with Item and ItemCollection.
  • Set value to Catalog and Collection. The capital is a bit off in our spec, but aligns with GeoJSON, where we got the type field from.
  • Wasn't sure if we should prefix with STAC in some way, as FeatureCollection vs Collection might be confusing. But I don't love the options: STACCollection or StacCollection. Or we could not try to align with their capital camel case, and then it'd be stac_catalog and stac_collection.

Thoughts?

Copy link
Collaborator

@m-mohr m-mohr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs JSON Schema changes.

Also, the "How to Differentiate STAC Files" part could probably be a bit simplified.

I'll work on it.

@cholmes
Copy link
Contributor Author

cholmes commented Feb 5, 2021

Oh, yeah, apologies - I meant to mention that I'd really like your help on the schema changes. Thanks @m-mohr! And thanks for help on differentiating STAC files. Also our linter forced me to pick a 'language' for the code blocks, but I think you just wrote it in pseudo-code? If you've got better formatting ideas then go for it. Could just do it in python or javascript, and people will get the idea.

CHANGELOG.md Show resolved Hide resolved

Any tool that crawls a STAC implementation or encounters a STAC file in the wild needs a clear way to determine if it is an Item,
Collection, Catalog or [ItemCollection](https://github.com/radiantearth/stac-api-spec/tree/v1.0.0-beta.1/fragments/itemcollection)
(part of the [STAC API spec](https://github.com/radiantearth/stac-api-spec/tree/v1.0.0-beta.1). As of 1.0.0 this is done primarily with the `type` field, and secondarily in Items with `stac_version`, or optionally the `rel` of the link to it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this challenge is less pronounced in API settings than static catalog settings, since the OpenAPI spec for the API tells you what to expect at each endpoint (and also catalogs don't live anywhere in the API). That said if someone saves off an ItemCollection and then refers to it from a static catalog, I guess all bets are off, so I think I've talked myself into this inclusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing your thoughts, even if you talked yourself into it. And I agree with both points - some people may save the itemcollection out in the wild.

catalog-spec/catalog-spec.md Outdated Show resolved Hide resolved
@m-mohr
Copy link
Collaborator

m-mohr commented Feb 8, 2021

One question that came up during the JSON Schema update:

Collection inherits from Catalog. A Catalog has type set to Catalog. I can't simply override that in Collection if I inherit from Catalog. I could make it work as an array (type: ['Catalog'] and type: ['Catalog', 'Collection'] or let the schemas diverge, but I think I'd expect the catalog schema to validate my Collection as we claim Collection inherits from Catalog. So either we need to weaken the inheritance or change something in this PR, I think.

Opinions?

@cholmes
Copy link
Contributor Author

cholmes commented Feb 9, 2021

We discussed this on the call, and the consensus seemed to be that we should do something where the collection type includes the catalog in it, so that a client can check for 'catalog' with a regex and match both, or can check for the specific.

We should do a bit more research on options, but the one idea was to look to media types for what they do, with like geo+json. The obvious proposal based on what we have would be Catalog and Catalog+Collection. Though it's sorta half inspired from GeoJSON (with the casing) and half from media types. Open for other ideas.

@matthewhanson
Copy link
Collaborator

I'm fine with Catalog+Collection, but also think that just Collection by itself is OK since the spec clearly defines that Collection is a Catalog.

@m-mohr
Copy link
Collaborator

m-mohr commented Feb 9, 2021

Using just "Collection" you can't use the Catalog JSON Schema to validate the Collection, which should be possible as it states that a Collection is a Catalog.

@kylebarron
Copy link
Contributor

It seems then that the core issue here is that JSONSchema doesn't support true inheritance? In a fuller language like Python/JS, you could presumably subclass Catalog to overwrite the type key, but that would still fail if you tried to use a Catalog object to validate a collection.

@cholmes
Copy link
Contributor Author

cholmes commented Feb 9, 2021

And just for posterity, the favorite idea from the STAC call was Catalog¯\_(ツ)_/¯Collection

@m-mohr
Copy link
Collaborator

m-mohr commented Feb 9, 2021

@kylebarron Although Python and JS would probably allow that (as they can't restrict the values returned by a method), it's not strictly correct inheritance.
If you have a contract in a language that says getType() returns always a string Catalog and you subclass it and return something else, then you break the contract and diverge from what others would expect. You could say getType() just returns a string, but then of course you can't rely on Catalog. And if you define in Catalog that it can return Catalog and Collection, you are mixing thing and you'd need to know all subclasses upfront. If at some point we invent the SuperDuperCatalog that inherits from Catalog, you'd need the implementation of Catalog to allow that. That should be avoided at all costs in OOP. So this is not the fault of JSON Schema IMHO. In languages like Java you'd do instanceof or so, which allows to check whether any of the parent classes is a Catalog, you wouldn't implement such behavior in getType methods. Another example: If you have a method that returns an integer or float, you can't say in a subclass that you additionally allow a string. You can only be equally strict or more strict and for example allow integers only. Weakening contracts is a bad practice and that's what we'd do here with changing Catalog to Collection in the sub-"class".

That got longer than I planned it to be 😅

@jisantuc
Copy link
Contributor

I think there's a little bit of ambiguity around inheritance.

The STAC Collection specification is independent of STAC Items and STAC Catalogs.

(emphasis mine)

But also

It shares the same fields and therefore every Collection is also a valid Catalog - the JSON structure extends the core Catalog definition.

(admittedly there's much more backing up the second point, but the first is the third sentence in the collection readme)

One thing that I've never been clear about is whether it's important for every collection to be a valid catalog, or whether that's something incidental that's been carried over for a long time. Is that desire from OGC data type relationships? Or can we break it without going out on a limb?

@cholmes
Copy link
Contributor Author

cholmes commented Feb 10, 2021

One thing that I've never been clear about is whether it's important for every collection to be a valid catalog, or whether that's something incidental that's been carried over for a long time. Is that desire from OGC data type relationships? Or can we break it without going out on a limb?

It's definitely not from OGC data type relationships, and I don't think we need to worry about that. To me the important thing is to be able to crawl catalogs and collections in a hierarchy and be able to expect them to operate the same way. But I don't know that it's important that it truly is a valid catalog. Just that it has all the same link / field requirements. But I think I'm ok with changing how we talk about it to just say that they have the same core fields and clients can crawl them equally.

But it does make some intuitive sense to me to explain that a collection is a catalog with some extras. I do think the other option might just be to drop trying to add the type field. I was imagining we could just have a simple type and it'd work well, but now I'm less sure it's really worth the effort given the constraints.

@cholmes cholmes assigned jisantuc and unassigned cholmes Mar 2, 2021
we sure don't have a lot of example catalogs 🤔
best-practices.md Outdated Show resolved Hide resolved
best-practices.md Outdated Show resolved Hide resolved
best-practices.md Outdated Show resolved Hide resolved
best-practices.md Outdated Show resolved Hide resolved
catalog-spec/catalog-spec.md Outdated Show resolved Hide resolved
cholmes and others added 6 commits March 3, 2021 06:07
Co-authored-by: Matthias Mohr <[email protected]>
Co-authored-by: Matthias Mohr <[email protected]>
Co-authored-by: Matthias Mohr <[email protected]>
Co-authored-by: Matthias Mohr <[email protected]>
Co-authored-by: Matthias Mohr <[email protected]>
best-practices.md Outdated Show resolved Hide resolved
Co-authored-by: Phil Varner <[email protected]>
# Conflicts:
#	examples/extensions-collection/collection.json
overview.md Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants