Explicit STAC Types in JSON [was] Ability to identify STAC file types with tools like file #889

Closed
schwehr opened this issue Aug 26, 2020 · 46 comments
Labels
discussion needed prio: must-have required for release associated with
Milestone

Comments

@schwehr
Contributor

schwehr commented Aug 26, 2020

Probably not something the average user is going to care about, but in exploring STAC files on my local machine, I find it hard sometimes to know which type of STAC file I'm looking at. Or if I'm trying to find the catalog, collection, or item, it might be hard to know from the command line. e.g. from Earth Engine:

file catalog.json MODIS_[MN]*
catalog.json:                            JSON data
MODIS_MCD43A4_006_BAI.json:              HTML document, ASCII text, with very long lines
MODIS_MCD43A4_006_EVI.json:              HTML document, ASCII text, with very long lines
MODIS_MCD43A4_006_NDSI.json:             HTML document, ASCII text, with very long lines
MODIS_MCD43A4_006_NDVI.json:             ASCII text
MODIS_MCD43A4_006_NDWI.json:             HTML document, ASCII text, with very long lines
MODIS_MOD09GA_006_BAI.json:              HTML document, ASCII text, with very long lines
MODIS_MOD09GA_006_EVI.json:              HTML document, ASCII text, with very long lines
MODIS_MOD09GA_006_NDSI.json:             HTML document, ASCII text, with very long lines
MODIS_MOD09GA_006_NDVI.json:             ASCII text
MODIS_MOD09GA_006_NDWI.json:             HTML document, ASCII text, with very long lines
MODIS_MOD13A1.json:                      HTML document, ASCII text, with very long lines
MODIS_MOD44W_MOD44W_005_2000_02_24.json: JSON data
MODIS_MYD09GA_006_BAI.json:              HTML document, ASCII text, with very long lines
MODIS_MYD09GA_006_EVI.json:              HTML document, ASCII text, with very long lines
MODIS_MYD09GA_006_NDSI.json:             HTML document, ASCII text, with very long lines
MODIS_MYD09GA_006_NDVI.json:             ASCII text
MODIS_MYD09GA_006_NDWI.json:             HTML document, ASCII text, with very long lines
MODIS_MYD13A1.json:                      HTML document, ASCII text, with very long lines
MODIS_NTSG_MOD16A2_105.json:             JSON data

This is a bummer! It would be awesome if file was able to return STAC, the version, and one of {Catalog, Collection, Item}. There might be a better way, but this could possibly work for file:

  • Mandate stac_version be 1st
  • Mandate stac_type (or some similar name) be 2nd

That runs a bit counter to the idea of the general flexibility of json, which would also be a bummer. If the idea is loosened to just letting readers know the type (like if I can read the json in some other language that doesn't have a stac library), then adding something that gives the type might be useful.

Thoughts?

@schwehr
Contributor Author

schwehr commented Aug 26, 2020

@m-mohr Can you give a little more than just a thumbs down? The entire idea? unix file (which isn't the most flexible tool)? The specific suggestion?

If you were faced with a file of STAC json (possibly unsorted) and didn't have a STAC library, how would you distinguish the types?

Some use cases I'm thinking of: A user sends me some stac files to be added to Earth Engine. Or all stac found by crawling the internet. Or someone puts some stac files and data in a GCS bucket and we want to figure out what it is and what issues there might be.

@m-mohr
Collaborator

m-mohr commented Aug 26, 2020

Sorry for the thumbs down without a comment. I started with the reaction and started to comment, but got distracted and forgot it then.

There are a couple of suggestions in the best practices guide, which should make it a bit better. Does your catalog follow them? The only thing that I think could be improved there is to allow collection.json for collections instead of using catalog.json.
If those best practices are not followed, then other best practices are unlikely to be followed, too and I don't think we should enforce any naming scheme.
The second issue is that putting version numbers in the names might not be the best idea as your file names change for every STAC version then. That doesn't seem right and makes upgrading painful.

@schwehr
Contributor Author

schwehr commented Aug 26, 2020

Sorry, I will try to clarify. Hopefully this will be more obvious.

I was not thinking of file names at all, but the json contents. e.g. I might just be getting a string blob of json without any file as an RPC, from some database, or who knows where.

I will look more closely at best-practices.md, but those don't seem to be along the lines of what I was intending with this issue. Users are likely going to generate and pass around STAC files. The names of files may be wonky at best. Users are not likely to create a tree as described in Catalog Layout.

e.g. with Earth Engine, we have > 500 collections, so naming them collection.json means we suddenly need a tree structure with one file in each directory. Users pass us yaml entries right now that don't have context. When we switch to STAC, they won't be in place for a user contribution. They may sometimes be malformed or copied from other sources without modifying them to fit into our structure.

Does that make more sense? e.g. Since json doesn't allow shebangs (#!) or // magic my original solution was saying what if the json always started as:

{
  "stac_version": "1.0.0-beta.2",
  "stac_type": "Item",

or

{
  "stac_version": "1.0.0-beta.2",
  "stac_type": "Collection",

or

{
  "stac_version": "1.0.0-beta.2",
  "stac_type": "Catalog",

Then I could submit a regex to the unix file magic database that would pick out the values for stac_version and stac_type. Then file foo*.json might give something like:

foo1.json         STAC Catalog, version 1.0.0-beta.2
foo2.json         STAC Collection, version 1.0.0-beta.2
foo3.json         STAC Item, version 1.0.0-beta.2

And it would be trivial for any language with a JSON reader to lookup the stac_type field from the JSON and know what they were dealing with.
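
A minimal sketch of the kind of check a `file`-style tool could run, assuming the proposed (and purely hypothetical) rule that `stac_version` and `stac_type` are always the first two keys. The function names and the 256-byte window are illustrative assumptions, not part of any spec or of the real magic database format:

```python
import re

# Hypothetical: only works if "stac_version" and "stac_type" are the first
# two keys of the document, as proposed in the comment above.
HEAD_PATTERN = re.compile(
    r'^\s*\{\s*"stac_version"\s*:\s*"([^"]+)"\s*,\s*"stac_type"\s*:\s*"([^"]+)"'
)


def sniff_stac_head(head: str):
    """Return (stac_type, stac_version) from the start of a document, or None."""
    match = HEAD_PATTERN.match(head)
    if match is None:
        return None
    version, stac_type = match.groups()
    return stac_type, version


def sniff_stac_file(path: str, max_bytes: int = 256):
    """Look only at the first max_bytes of a file, as a magic check would."""
    with open(path, "rb") as f:
        head = f.read(max_bytes).decode("utf-8", errors="replace")
    return sniff_stac_head(head)
```

Because only the head of the file is read, the check stays cheap, but it returns `None` as soon as a writer emits the keys in any other order.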

@schwehr
Contributor Author

schwehr commented Aug 26, 2020

Note that the unix file command won't look very far into a file for the magic strings it is trying to match. If it did, it would get crazy slow.

@cholmes
Contributor

cholmes commented Aug 26, 2020

Interesting - I didn't know you could give hints like that to unix.

I'm definitely +1 on a recommendation (though probably not a requirement) to put stac_version at the top. I'd also be for recommending stac_type up top if we already had it, but I don't think we have that yet? Curious if there are other use cases for it. I do think it might help with validators, since I feel like I've pointed staclint at an item file and it tried to validate it as a catalog. So if there's a general use for having a stac type I'd be into it.

@m-mohr
Collaborator

m-mohr commented Aug 26, 2020

JSON Objects don't have an order, that's coming from the JSON spec. So as long as you don't write JSON files by hand, you can't really guarantee in which order the properties occur in the file or you need a custom JSON writer that somehow sorts by property name. Some writers may do that already, but that's nothing we can really ask for. For example, the Google Earth Engine catalogs (I'm reading 0.6.2) changed the order of the properties in the past quite often. So seems your writer can't guarantee an order.

For the validation, it would be cleaner to have a type field, at the moment it's really a mess with reporting as we need to validate against all schemas and guess what could be most appropriate. We could use type:Feature for Items already and just add type:Catalog and type:Collection or just say: no type=Catalog (and Collection is a catalog). I'm a bit hesitant to put another field (which must be required to be useful for validation and thus is breaking) in the 1.0 spec.

Users are not likely to create folders

Strange, most larger catalogs I've seen have folders?!

Conclusion: I don't think we can really do what you are asking for. I'd suggest using a library such as PySTAC.

@cholmes
Contributor

cholmes commented Aug 28, 2020

I'm a bit hesitant to put another field (which must be required to be useful for validation and thus is breaking) in the 1.0 spec.

Well, this is the beta period, so it's the last possible time we have to do something like this. I agree we should make sure it's really a win, and not ignore that it'll break things. But it's a pretty easy upgrade path. I like the idea of using type:Feature.

@m-mohr
Collaborator

m-mohr commented Aug 28, 2020

So for validation I don't see a big win. I'd say for validation one could rely on the availability of the type field in items and all other files would be validated against either collection or catalog. Of course it would be a bit more straightforward in validation to have type:Catalog and type:Collection, but I don't feel that it would be worth the breaking change. In the end, a catalog is a collection. What do @lossyrob and @jbants think?

What other advantages would it have?

@lossyrob
Collaborator

Having a stac_type field would make identification of the object type on deserialization a little less complicated in PySTAC, but not a great deal. Currently it uses this logic to check for properties that are required in one but not others of { Catalog, Collection, Item } in order to identify the object type. However this is a bit brittle in that if someone adds an extension that breaks the "in one and not the others" assumption, this logic will break. Checking a stac_type property would make this logic more resilient. Still, I'm not sure it's worth the additional field on every object.

@m-mohr
Collaborator

m-mohr commented Sep 1, 2020

This is outdated with regards to ItemCollections

Maybe we should come up with a best practice for how to detect the files. It seems that you are using a different mechanism than I'm using in the Node Validator and there's likely more out there. I feel it should be the same across the board.

Something like:

if type is defined
  if type is 'Feature' and stac_version is defined // stac_version in items is only available since 0.8, check for (stac_version or assets) to support pre-0.8 data
    => Item
  else if type is 'FeatureCollection' and stac_version is defined
    => ItemCollection
  else
    => Invalid (GeoJSON)
else if stac_version is defined // available since STAC 0.6 and I guess we don't support anything older?
  if extent is defined or license is defined // "or" makes more sense for validation, and would likely make more sense for PySTAC
    => Collection
  else
    => Catalog
else
  => Invalid (JSON)

Would that also work for PySTAC, @lossyrob ?
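
The pseudocode above can be sketched in Python roughly as follows. This is a direct transcription for illustration, not the code of the Node Validator or PySTAC; the function name and the returned labels are assumptions, and the pre-0.8 `assets` fallback mentioned in the comment is left out:

```python
def detect_stac_type(doc: dict) -> str:
    """Classify a parsed JSON object using the heuristic from the pseudocode."""
    if "type" in doc:
        # GeoJSON-style documents: only STAC if stac_version is present.
        if doc["type"] == "Feature" and "stac_version" in doc:
            return "Item"
        if doc["type"] == "FeatureCollection" and "stac_version" in doc:
            return "ItemCollection"
        return "Invalid (GeoJSON)"
    if "stac_version" in doc:
        # Collections are recognised by fields that Catalogs do not define.
        if "extent" in doc or "license" in doc:
            return "Collection"
        return "Catalog"
    return "Invalid (JSON)"
```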

m-mohr added a commit to stac-utils/stac-node-validator that referenced this issue Sep 8, 2020
@m-mohr
Collaborator

m-mohr commented Sep 8, 2020

I've implemented this in the Node Validator, seems to work so far.

@ycespb

ycespb commented Sep 20, 2020

Hi, what if "providers" occurs as a property? According to the spec this would mean that the type is a collection and not a catalog? It would be good to allow a provider for a catalog as well (without having to add extent information, which would make it a Collection object)...

@m-mohr
Collaborator

m-mohr commented Sep 20, 2020

Then it's likely that clients may not recognize it as it's undefined for catalogs. As it's the "spatio-temporal" asset catalog, collections expect related extents. Why would you not want to provide them? Catalogs are for grouping and the provider metadata should be in the Collection or Item.

@ycespb

ycespb commented Sep 21, 2020

That is not a problem. We can provide "extents" even though they may not be very meaningful. It looks to me that most "catalogs" will then actually become "collections", and very few "catalogs" (which are not also "Collections") will remain. Note that in DCAT, https://www.w3.org/TR/vocab-dcat-2/ the equivalent of provider can be attached to either a dcat:Catalog or dcat:Dataset and not only a dcat:Dataset.

@m-mohr
Collaborator

m-mohr commented Sep 21, 2020

Maybe better to be discussed somewhere else (not really related to the issue), but could you guide me through your use case? It seems there's something different from how we thought it would be. Feel free to contact me through Gitter...

@cholmes
Contributor

cholmes commented Dec 1, 2020

One thing we could do here is add itemType and set it to 'stac' in collections. This is a pattern in OGC API, and something we're considering adding at the API level - radiantearth/stac-api-spec#70 Could make sense to just set it in the core STAC collection spec.

@matthewhanson
Collaborator

Extent is required in collections so I'm not sure an 'itemType' field is needed, it just seems like a redundant field. On the other hand, if we want to add it to the API it makes sense here.

Wish it wasn't camelCase, I hate the fact that we keep on adding fields that don't follow the rest of the STAC convention. Guess we should have aligned to OGC API earlier on.

@m-mohr
Collaborator

m-mohr commented Dec 2, 2020

It doesn't add any benefit to the STAC spec itself, it's more an API thing. I think it would be okay if we say an API needs to add itemType if they implement the Features part of the API. It is only required there...

@cholmes
Contributor

cholmes commented Dec 2, 2020

Well, I was thinking it could be used to help tools / validators to distinguish between collections and catalogs. Since we'll likely use it on the API side anyways. But I'm also fine to just keep it as something API's add.

@m-mohr
Collaborator

m-mohr commented Dec 2, 2020

How would that help in distinguishing between collections and catalogs? I don't really understand that...

@vincentsarago

👋 Sorry to jump on this, but today I was asked how to differentiate Catalog/Collection/Item json files (ok it's @kylebarron who asked me), and I was surprised that I had to loop through the specs to see which keys/objects were in each one to tell what the json file was.

I'm def +1 on something like stac_type which tells a noob user like myself what file it is. If you take GeoJSON as an example, you know by looking at the file if it's a simple feature or a feature collection. If you just have the geometry you also know if it's a Polygon, a MultiPolygon ... you don't have to see the values of coordinates to know what geometry it is.

@cholmes
Contributor

cholmes commented Jan 26, 2021

I realized that I never really came out as +1 on this. I think I was hoping for a stronger 'yes' from people doing validation. But I think we mostly heard from people who had already written their logic to detect, so didn't see a need for it.

But I think it makes sense to make it easy to figure out from one look what type of file you have.

Note we are in the process of adding a media type #851, but I think that only helps if the link to the file uses it.

@kylebarron
Contributor

I'm also +1. I think being able to understand the type of input data without a heuristic would be helpful for applications using STAC.

@m-mohr
Collaborator

m-mohr commented Jan 27, 2021

I'm more on the -1 side. While it could be helpful, I feel it's too late for it and also somehow tackled by the recent best practice update which says collections are called collection.json, catalogs are called catalog.json, everything else is an item. So if you follow the best practice, it's clear what is inside the file. Also, if you want to support old versions of STAC in tooling, you still need the (simple) heuristic mentioned above. From my side, I wouldn't use it in the node validator or STAC Browser.

The heuristic can actually be simplified if you just support 0.8+, which is even more backward compatible than stac_type and would not be much longer in code than an if-elseif-else for stac_type:

if stac_version is defined
  if type is 'Feature'
    => Item
  else if type is 'FeatureCollection'
    => ItemCollection
  else if extent is defined and/or license is defined 
    => Collection
  else
    => Catalog
else
  => Invalid

@jisantuc
Contributor

I'm nervous about having only the heuristic to fall back on -- relying on the heuristic means that adding any entity types that are Features will break anything that relies on the heuristic. In the Franklin importer we do a kind of progressive decoding thing, where we try to decode as a collection, then fall back to decoding as a catalog if that fails. This kind of user-land tracking of more vs. less restrictive sets of fields works for now, but depending on trying decoders in the right order is a bit of a drag.
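
The progressive decoding described above can be sketched as follows. This is an illustration in Python, not the Franklin importer's actual code; the class names, the `DecodeError` type, and the simplified required-field lists are assumptions:

```python
from dataclasses import dataclass


class DecodeError(ValueError):
    """Raised when a document lacks fields a decoder requires."""


@dataclass
class Catalog:
    id: str
    description: str
    links: list


@dataclass
class Collection:
    id: str
    description: str
    links: list
    license: str
    extent: dict


def _require(doc: dict, keys) -> None:
    missing = [key for key in keys if key not in doc]
    if missing:
        raise DecodeError(f"missing required fields: {missing}")


def decode_collection(doc: dict) -> Collection:
    keys = ("id", "description", "links", "license", "extent")
    _require(doc, keys)
    return Collection(**{key: doc[key] for key in keys})


def decode_catalog(doc: dict) -> Catalog:
    keys = ("id", "description", "links")
    _require(doc, keys)
    return Catalog(**{key: doc[key] for key in keys})


def decode_progressively(doc: dict):
    # Order matters: the more restrictive decoder must be tried first,
    # otherwise every Collection would decode as a plain Catalog.
    for decoder in (decode_collection, decode_catalog):
        try:
            return decoder(doc)
        except DecodeError:
            continue
    raise DecodeError("document is neither a Collection nor a Catalog")
```

The dependence on decoder order is the "bit of a drag" mentioned in the comment: swapping the two entries in the tuple silently changes the result for every Collection.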

collections are called collection.json, catalogs are called catalog.json, everything else is an item

This is a difficult thing to enforce, and it's also not obvious that enforcement is possible/correct, since it's a best practice rather than a part of the spec. Also, the best practices doc is pretty long, and you'd have to know that that recommendation exists and lives in "Static and Dynamic Catalogs" in order to find it. For regular STAC producers / consumers, checking best practices might be pretty common, but if I'm firing up a python shell with pystac for the first time and mad with power STAC-ifying all the things, I'm probably not going to check to make sure I chose the correct file names.

@m-mohr
Collaborator

m-mohr commented Jan 27, 2021

@jisantuc

relying on the heuristic means that adding any entity types that are Features will break anything that

Do these features contain the stac_version, too?

In the Franklin importer we do a kind of progressive decoding thing, where we try to decode as a collection, then fall back to decoding as a catalog if that fails.

I'm trying to understand the issue better. For the example with decoding catalogs/collections:

  • What is the advantage to check whether there's a specific stac_type in the file in contrast to check whether there's an extent property in the file?
  • How would you handle legacy STACs that are < 1.0.0-RC.1?

This is a difficult thing to enforce

Sure, that's more an additional indicator.

if I'm firing up a python shell with pystac for the first time and mad with power STAC-ifying all the things, I'm probably not going to check to make sure I chose the correct file names.

AFAIK PySTAC generates catalogs that comply to the best practice, so that should be a non-issue for PySTAC. Not sure about other tooling though.

@jisantuc
Contributor

Do these features contain the stac_version, too?

Of course they do, but versioned heuristics just add an extra if-else to the existing heuristics.

What is the advantage to check whether there's a specific stac_type in the file in contrast to check whether there's an extent property in the file?

stac_type gives information about what the creator of the item expects it to be. If stac_type is collection, I know I only need to attempt to decode as a collection. extent being present tells you only that they included an extent. If I include an extent in an otherwise valid item (which I think I'm allowed to do) and the heuristic rules are applied in an unlucky order, my valid item will parse incorrectly. That class of error -- the presence or absence of a property doesn't mean quite what we think it does -- is impossible with stac_type.
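
A small sketch of that class of error, assuming a document that is an Item but also carries a top-level `extent`. Both classifier functions and the sample document are hypothetical and deliberately simplified:

```python
def classify_type_first(doc: dict) -> str:
    """Heuristic that checks the GeoJSON type before looking for extent."""
    if doc.get("type") == "Feature":
        return "Item"
    if "extent" in doc:
        return "Collection"
    return "Catalog"


def classify_extent_first(doc: dict) -> str:
    """Same rules, applied in the 'unlucky' order."""
    if "extent" in doc:
        return "Collection"
    if doc.get("type") == "Feature":
        return "Item"
    return "Catalog"


# Hypothetical Item that also includes an extra top-level extent.
item_with_extent = {
    "type": "Feature",
    "stac_version": "1.0.0-beta.2",
    "id": "scene-1",
    "extent": {"spatial": {"bbox": [[-180, -90, 180, 90]]}},
}

print(classify_type_first(item_with_extent))    # Item
print(classify_extent_first(item_with_extent))  # Collection
```

The same document gets two different answers depending only on rule order, which cannot happen when the type is read from a dedicated field.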

How would you handle legacy STACs that are < 1.0.0-RC.1?

This is already I think pretty difficult. In validation it's not so bad, since you can read the schema dynamically and the concern is only the validity with the spec. In Franklin / stac4s, right now, we only support one STAC version at a time. The reason for this is that our first concern is validity in the context of the program. I don't have a good answer for this question. I have multi-STAC-version provably internally consistent programs filed away in my head under "hard problems" and I've made no progress any time I've tried to think about it other than "well if migration tooling were magic and perfect..."

AFAIK PySTAC generates catalogs that comply to the best practice, so that should be a non-issue for PySTAC. Not sure about other tooling though.

I think the larger point is that relying on STACs to be compliant not only with a json schema but also a large best practice markdown doc will lead to really brittle tooling. If someone follows the spec to the letter and tosses their resulting STAC through the validator only to have it rejected because they didn't give a file the right name, they're going to be pretty cranky.

@kylebarron
Contributor

collections are called collection.json, catalogs are called catalog.json, everything else is an item

I'm working on an application intending to support STAC from any source, whether a static URL, from an API, JSON pasted in, etc. so I can't depend on file naming, and besides if it's a best practice and not part of the spec, I can't rely on it anyways.

How would you handle legacy STACs that are < 1.0.0-RC.1?

Personally I'd implement the heuristic for legacy data, but prefer a stac_type key if it exists, since I agree with @jisantuc's argument that the heuristic is less "safe" than reading a stac_type directly.

I think the larger point is that relying on STACs to be compliant not only with a json schema but also a large best practice markdown doc will lead to really brittle tooling

Agree 💯 %

@schwehr
Contributor Author

schwehr commented Jan 27, 2021

Some use cases off the top of my head:

  1. I would like to be able to easily detect from very simplistic code if a file is what we need. e.g. expecting a collection of some particular version (e.g. something small in java, javascript, or go). Anything that isn't the required type and version should immediately error. Only later on in processing is that file going to be handled by a library that really understands stac.
  2. Be able to run an easy command across a large tree with tools like find to get all the files that meet some criteria. e.g. collections of version > 1.0.0.
  3. If we need to write a stac library because I can't use an existing one (or none exists for a language), I would like the option to make detecting types simpler. I might just require > 1.??? and error on anything else.
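
The second use case could look roughly like the sketch below, assuming every STAC file declares its type in a single top-level key. The key name (`type` by default here) and the function name are assumptions for illustration; at this point in the discussion no such key had been agreed:

```python
import json
from pathlib import Path


def find_stac_files(root, wanted_type, type_key="type"):
    """Yield (path, stac_version) for JSON files under root declaring wanted_type."""
    for path in sorted(Path(root).rglob("*.json")):
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Unreadable or malformed files are skipped, not fatal.
            continue
        if isinstance(doc, dict) and doc.get(type_key) == wanted_type:
            yield path, doc.get("stac_version")
```

A caller could then filter the yielded versions however it likes, for example `for path, version in find_stac_files("catalog_root", "Collection"): ...`, where `catalog_root` is a placeholder directory name.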

Thanks all for having this discussion!

@m-mohr
Collaborator

m-mohr commented Jan 27, 2021

Thanks for the discussion. Many good arguments. I don't agree with all of them, but from the participation alone (most active topic) it seems there's a demand so we should discuss it at the next STAC meeting on Monday and get a decision before RC1 (assigned the milestone now), otherwise it's likely really too late.

One thing about the itemType discussed above: We should clearly check what the use is in OGC API, but I guess it's better not to fiddle with it and just make our own, e.g. stac_type.

@m-mohr m-mohr added this to the 1.0.0-RC.1 milestone Jan 27, 2021
@cholmes
Contributor

cholmes commented Jan 28, 2021

+1 on discussing on monday. For those who have not joined before the call is at 8am pacific time, and everyone is welcome. Just ask here / gitter / email and we can add you.

@cholmes cholmes changed the title Ability to identify STAC file types with tools like file Explicit STAC Types in JSON [was] Ability to identify STAC file types with tools like file Jan 29, 2021
@cholmes cholmes added discussion needed prio: must-have required for release associated with labels Jan 29, 2021
@cholmes
Contributor

cholmes commented Jan 29, 2021

(added must-have and discussion needed labels so I can track this easier. That is not presupposing that the types are must have, just that we must discuss and decide this definitively before RC1)

@cholmes
Contributor

cholmes commented Feb 1, 2021

We discussed this on the call. The consensus seemed to be to add 'type' to catalog and collection, and Items already have type=Feature (from geojson). Thus to fully figure out it's a stac item it'll be a two step approach - type=feature and then look if there's a stac_version. This also makes it so every item out there doesn't need to add a new type field. But this will still help with catalog vs collection.
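
A minimal sketch of the two-step approach described above, assuming Catalogs and Collections carry `type` set to `Catalog` or `Collection` and Items keep the GeoJSON `type` of `Feature`. The function name is hypothetical:

```python
def identify_stac_object(doc: dict):
    """Return 'Catalog', 'Collection', 'Item', or None for non-STAC input."""
    declared = doc.get("type")
    # Step 1: Catalogs and Collections declare themselves directly.
    if declared in ("Catalog", "Collection"):
        return declared
    # Step 2: a GeoJSON Feature is only a STAC Item if stac_version is present.
    if declared == "Feature" and "stac_version" in doc:
        return "Item"
    return None
```

A plain GeoJSON Feature without `stac_version` returns `None`, which is why the second step is needed.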

Other feedback was to make this well documented, probably add a section to 'best practices' on how to distinguish stac items.

@jisantuc
Contributor

jisantuc commented Feb 2, 2021

When crawling a catalog, you normally get a good hint that you're looking at an Item from the Item link rel type. This sounds like a sensible compromise to me.

@cholmes
Contributor

cholmes commented Feb 5, 2021

PR is up - feedback much appreciated: #971

@m-mohr
Collaborator

m-mohr commented Feb 8, 2021

I just realized the solution above doesn't work: The catalog schema should succeed if I use it to validate a Collection as the Collection spec claims to inherit from the Catalog spec. It doesn't work though, as the Catalog JSON Schema would expect a type: Catalog in the Collection.
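
The conflict can be illustrated with a toy checker. This is not JSON Schema and not the actual STAC schemas; the rule dictionaries, the `check` function, and the simplified required-field lists are assumptions made only to show why a fixed `type` value breaks "a Collection validates as a Catalog":

```python
# Toy stand-ins for the schemas: required keys plus fixed ("const") values.
CATALOG_RULES = {
    "required": ["stac_version", "id", "description", "links"],
    "const": {"type": "Catalog"},
}
COLLECTION_RULES = {
    "required": CATALOG_RULES["required"] + ["license", "extent"],
    "const": {"type": "Collection"},
}


def check(doc: dict, rules: dict) -> list:
    """Return a list of error messages; an empty list means the doc passes."""
    errors = [
        f"missing required field: {key}"
        for key in rules["required"]
        if key not in doc
    ]
    for key, expected in rules["const"].items():
        if doc.get(key) != expected:
            errors.append(f"{key} must be {expected!r}, got {doc.get(key)!r}")
    return errors


collection = {
    "type": "Collection",
    "stac_version": "1.0.0-rc.1",
    "id": "example",
    "description": "An example collection",
    "links": [],
    "license": "proprietary",
    "extent": {},
}

print(check(collection, COLLECTION_RULES))  # []
print(check(collection, CATALOG_RULES))     # ["type must be 'Catalog', got 'Collection'"]
```

Every Catalog field requirement is met, yet the Collection fails the Catalog rules on `type` alone.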

@cholmes
Contributor

cholmes commented Feb 19, 2021

Ok, we discussed this on a call today.

The latest thought is that we just remove the idea of 'inheritance' from the spec. We would no longer say 'Collection is a Catalog', and we just have type=Collection and type=Catalog. We don't try to make a JSON Schema where one comes from the other, or where we try to validate Collections as Catalogs. They are distinct things. Both are used in creating catalogs, and can be crawled interchangeably by clients. But the clients need to know to expect both. We agreed that the core idea would be that Catalog and Collection share an abstract class, but there's no way to represent that, and it's not worth including that in the spec.

Implementations can choose to model things however they want. If they want to model the relationship between the two as an inheritance, they can. Or they can do a mix-in, and they may want to do a mix-in with 'links' for items too, since that construct is shared as well.
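
One way an implementation might model this, sketched under the assumption of a shared abstract base for Catalog and Collection plus a `links` mix-in that Items reuse. The class names `HasLinks` and `Container` are hypothetical and do not come from the spec or any existing library:

```python
from abc import ABC, abstractmethod


class HasLinks:
    """Mix-in for anything carrying a STAC 'links' array."""

    links: list

    def links_with_rel(self, rel: str) -> list:
        return [link for link in self.links if link.get("rel") == rel]


class Container(HasLinks, ABC):
    """Hypothetical abstract base shared by Catalog and Collection."""

    @property
    @abstractmethod
    def stac_type(self) -> str:
        ...

    def __init__(self, id: str, description: str, links: list):
        self.id = id
        self.description = description
        self.links = links

    def children(self) -> list:
        return self.links_with_rel("child")


class Catalog(Container):
    stac_type = "Catalog"


class Collection(Container):
    # A sibling of Catalog, not a subclass of it.
    stac_type = "Collection"

    def __init__(self, id, description, links, license, extent):
        super().__init__(id, description, links)
        self.license = license
        self.extent = extent


class Item(HasLinks):
    def __init__(self, id: str, links: list):
        self.id = id
        self.links = links
```

With this layout a client can crawl anything that is a `Container` without a Collection ever passing an `isinstance(..., Catalog)` check.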

The small group on the phone did want to get some more feedback on this, if it feels like a reasonable path, and if there's anything we 'lose' by no longer saying that a collection is a catalog. We'd still want them to share all the same fields, it's mostly just changes in how we talk about it in the spec.

Thoughts? In particular we'd love to hear from people with STAC libraries / reading STAC's - @lossyrob @emmanuelmathot @kylebarron @jisantuc

@m-mohr
Collaborator

m-mohr commented Feb 19, 2021

Also include server implementers... Franklin, Staccato, arturo api,... :)
My preference still is to not do type and keep inheritance, but it's not a strong preference. I'm simply not sure that type will be a large benefit while being another breaking change.

@matthewhanson
Collaborator

That's a good point too, this would be a breaking change...albeit a relatively minor one since Items are not changed.

I also don't have a strong preference on this. My main concern is do we lose anything by not having a Collection strictly be a Catalog anymore? Breaking that inheritance potentially makes things more flexible in the future, as it means Collections aren't tied to changes in Catalog anymore (not a very compelling reason since I hope we do not make such changes in the future).

@jisantuc
Contributor

The latest thought is that we just remove the idea of 'inheritance' from the spec. ... Implementations can choose to model things however they want.

To me that's a great outcome. Decoupling collections and catalogs provides some nice future flexibility and also gets us out of the self-imposed constraints in this discussion. This also gets closer to the spec concerning itself with JSON data at rest and leaving technical details to implementers, which I think is a more correct separation of concerns.

My main concern is do we lose anything by not having a Collection strictly be a Catalog anymore?

I'm confused about what we gain from that strict relationship. We can still mix in the relevant shared JSON schema pieces, so I don't think that's a unique benefit of saying "every collection is a catalog."

@m-mohr
Collaborator

m-mohr commented Feb 22, 2021

One concern could also be that the types will be used to weaken the Collection field requirements. People could implement a Collection-like structure with type = Catalog. It would validate and people could expect tooling to support it while Collection requirements are not met (e.g. missing license or so). It seems like it weakens the Collection spec.

In contrast to @jisantuc, I'm actually more afraid of the decoupling as it may lead to divergence between the specs. At the moment the inheritance (seems to) simplify implementation as you can just decide to skip implementing Collections if you don't care. They are simply Catalogs themselves and that was also the main reason to make them inherit, I think.

By the way, why did we drop the regexp approach? It felt a bit weird, but it doesn't feel as weird as decoupling the specs to me.

@jisantuc
Contributor

This weekend I wound up more sure about the type field vs the heuristic because of this interaction. Because STAC-browser had to use a heuristic to guess whether the user cared about collection vs. catalog, it was stricter than necessary. Since users can add any extraneous fields that they desire, guessing about the entity type they want is always going to be fraught in this way.

At the moment the inheritance (seems to) simplify implementation as you can just decide to skip implementing Collections if you don't care.

Has anyone done this? All of stac-pydantic, pystac, and stac4s implement both collections and catalogs (the former two with inheritance, stac4s with a distinct type). What else should I check to verify whether skipping collections is something anyone has actually done? At the same time, I think you can still skip implementing collections if you don't care -- if you don't have any collections, you don't need the implementation.

@m-mohr
Collaborator

m-mohr commented Feb 22, 2021

Yes, the mentioned interaction led to my comment above as I realized it could weaken the collection spec: people could just choose whatever they like from Collection, then put type = Catalog and say/think it's valid, although those additional fields actually don't get validated (maybe what Tom assumed there?). (With the old STAC version used in the example, it would not have helped anyway.)

I think most people actually implement Collections, but not sure about "internal" tooling. The reason given above was more me recalling the reason that we initially had to do inheritance. Collections were meant to gracefully work as Catalogs in the code that had not seen Collections as they were not specified yet.

@cholmes
Contributor

cholmes commented Feb 26, 2021

A small note - going through the spec we have it so Catalog requires at least one child or item link, while Collection doesn't. Does that break the idea that a Collection is a Catalog? We should be clear there.

Also I'll try to respond more later, but I think I lean with James - the gitter interaction pushes me towards having the type field, and I'm not sure I see it practically 'weakening' the collection spec.

@emmanuelmathot
Collaborator

Currently in DotNetStac, Collections are implemented as objects inheriting from Catalog. I do not mind decoupling them but I would keep a common base structure translated into an abstract class.
On the other hand, I would identify common objects across all the STAC constructs in order to keep them consistent in a single page spec. So far, I have

  • Link Object
  • Stats Object
  • Properties
    but I imagine other objects coming in the future.

@m-mohr
Collaborator

m-mohr commented Mar 2, 2021

A small note - going through the spec we have it so Catalog requires at least one child or item link, while Collection doesn't. Does that break the idea that a Collection is a Catalog?

Good point. Yes, it does.


10 participants