Unique ids #883

Merged
merged 7 commits into from
Feb 25, 2021

Conversation

cholmes
Contributor

@cholmes cholmes commented Aug 20, 2020

Related Issue(s): #1011 #822

Proposed Changes:

  1. Added more information to make sure IDs are unique within a collection.
  2. Added h4 headings to the item spec.

PR Checklist:

@cholmes
Contributor Author

cholmes commented Aug 20, 2020

Just put this up for discussion. Wondering if we should be more explicit about what makes an ID globally unique? Is it the collection ID plus the URL of one of the provider roles? Should that role be producer? Or host? Host is the one where we say explicitly there should be just one, so perhaps that makes the most sense.

@lossyrob
Collaborator

I think these constraints would solve this:

  • Item IDs are unique within their collection. If there is no collection, item IDs must be unique within their parent catalog.
  • Catalog and Collection IDs must be unique within the root catalog (ideally globally unique everywhere)

I'd advocate for not having to parse out Provider information in order to make collection IDs unique. Providers aren't required, and the constraint would be simpler across catalogs and collections if the ID itself had to carry all the information necessary for it to be globally unique (i.e. the STAC creator can insert the provider name into the IDs if that's needed to make them unique).
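
As a rough illustration of the first constraint above, here is a minimal sketch that checks a flat list of Item dicts for duplicate IDs. It assumes each Item carries the standard `id` field and an optional `collection` field; the function name `find_duplicate_item_ids` and the sample data are made up for this example and are not part of the spec or any STAC library.

```python
from collections import Counter


def find_duplicate_item_ids(items, parent_catalog_id):
    """Return sorted (scope, item_id) pairs that occur more than once.

    The scope is the Item's collection ID, or the parent catalog ID
    when the Item has no collection (hypothetical helper, for illustration).
    """
    counts = Counter(
        (item.get("collection") or parent_catalog_id, item["id"])
        for item in items
    )
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up sample data
items = [
    {"id": "scene-1", "collection": "landsat"},
    {"id": "scene-1", "collection": "sentinel"},  # fine: different collection
    {"id": "scene-1", "collection": "landsat"},   # duplicate within "landsat"
    {"id": "loose-1"},                            # no collection
    {"id": "loose-1"},                            # duplicate within the parent catalog
]

duplicates = find_duplicate_item_ids(items, "my-catalog")
print(duplicates)
```

The sketch treats a collection ID and a catalog ID as the same kind of scope key, so it would conflate a collection and a catalog that happen to share an ID; a real validator would need to keep the two apart.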

@pomadchin

pomadchin commented Aug 20, 2020

How would it align with radiantearth/stac-api-spec#36?
What would happen if the IDs of items aggregated across collections matched? Is that fine?
Would the result of such an aggregation query be a collection of items that don't have unique IDs within the collection?

/cc @m-mohr @matthewhanson

P.S. I know that it is a bit unclear what would happen with the Aggregation Extension, but still decided to ask this question.

@lossyrob
Collaborator

@pomadchin I'm unfamiliar with the aggregation extension, but from a read through it seems like it returns information about a range of items and not the items specifically - is that right? Or are you saying that the aggregation would return an actual Collection, with the full items?

Would this work if the aggregation extension, whenever it has to return references to individual items, were required to identify them by both their collection ID and item ID?

@pomadchin

@lossyrob It is a bit unclear (since that was just an oral conversation), but from what I understand it can be an actual Collection with links to items.

It would definitely work if items were identified by both collection ID and item ID in such a collection. 👍

@m-mohr
Collaborator

m-mohr commented Aug 20, 2020

I'm fine with @lossyrob's proposal, except that I would say:

  • Catalog and Collection IDs must be unique within the parent catalog (rather than the root catalog)

The reason is that if you combine other catalogs/collections in new catalogs, then you can't always guarantee the uniqueness. Also, what is actually the root catalog? If there are two independent catalogs and I link to them from a new catalog, what is then the root?

Re aggregations: I understood that aggregations could be Collections, but I don't think they should duplicate IDs. They are somewhat "virtual" anyway.

@lossyrob
Collaborator

Catalog and Collection IDs must be unique within the parent catalog (rather than the root catalog)

I don't think this is restrictive enough. The benefit of having globally unique collections and catalogs is that they can always be referenced by their IDs, and items can be globally referenced by their collection_id/item_id combination, which was the desired outcome. If we scope it only to parent IDs, then this breaks pretty easily: if I were to have a catalog that had sub-catalogs based on what month the item was captured, and had several catalogs containing that same catalog structure, there would potentially be many instances of a "march/{item_id}" identifier for items. I also think this would break the collection ID uniqueness that @matthewhanson was advocating for from an API perspective, though he would be better placed to speak to that.
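
The "march/{item_id}" collision can be sketched with a couple of made-up items. All IDs below are hypothetical, and the two helper functions only illustrate the two scoping choices being compared; they are not part of any STAC tooling.

```python
# (collection ID, parent sub-catalog ID, item ID); all values are made up
items = [
    ("landsat-8", "march", "scene-001"),
    ("sentinel-2", "march", "scene-001"),
]


def parent_scoped_ref(collection_id, parent_id, item_id):
    # Reference scoped only to the immediate parent catalog
    return f"{parent_id}/{item_id}"


def collection_scoped_ref(collection_id, parent_id, item_id):
    # Reference scoped to the collection
    return f"{collection_id}/{item_id}"


parent_refs = {parent_scoped_ref(*item) for item in items}
collection_refs = {collection_scoped_ref(*item) for item in items}

print(parent_refs)       # both items collapse into one reference
print(collection_refs)   # the two items stay distinguishable
```

With parent scoping the two items share a single reference, while the collection-scoped references remain distinct as long as the collection IDs themselves do not clash.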

@m-mohr
Collaborator

m-mohr commented Sep 2, 2020

Good point regarding APIs. But then I'd say they need to be unique within the parent collection, which is usually different from the root catalog. That works for APIs and makes it easier to combine several collections into whatever catalogs. You just can't always guarantee uniqueness if you combine different sources, like Matt does with Earth Search or I do with STAC Index (which are themselves catalogs). If there's no collection, then it seems it must be the root catalog though.

@cholmes
Contributor Author

cholmes commented Feb 24, 2021

Picking this up after a long delay... And I'm confused as to who is advocating what.

The core recommendation seems to be that IDs should be unique within the parent collection. And we strongly recommend that. If items don't have a parent collection then it seems like they do need to be unique in the 'root'.

There is the case where some meta catalog wants to include a catalog that doesn't have collections, and thus can't guarantee that the ID is unique. I'd say for that we just warn people that if they define items without collections then their catalogs just won't be used nearly as much.

@cholmes
Contributor Author

cholmes commented Feb 24, 2021

Ah, now I see that I was confused; I was mixing up collections and items.

I just committed an attempt at this. Basically said that IDs in a collection need to be unique, and that collection IDs should aim to be globally unique. And handled the 'no collection' use case by saying that Items should attempt to make their IDs globally unique (and reminded people we strongly recommend a catalog).


In general, STAC versions can be mixed, but please keep the [recommended best practices](../best-practices.md#mixing-stac-versions) in mind.

#### id
Collaborator

I'd align with the order in the table and move this below the stac_extensions.

Collaborator

@cholmes I guess you've not seen this?

@m-mohr
Collaborator

m-mohr commented Feb 24, 2021

I'm fine with unique Item IDs in a collection.
I'm also fine with items having a unique ID up to the root catalog they define if there's no collection.

I have issues with the requirement that the collection ID must be globally unique. I see that collection IDs should be unique across a provider, but no one can really guarantee global uniqueness. I don't think there will be broad adoption, especially as the PR doesn't say why this is useful. Additionally, most people have their collection IDs already defined before adopting STAC and I don't think many people will change them (at least I don't see anyone in openEO doing it, and I doubt GEE would do that; the obvious reason is that we use them in processing workflows, and different IDs just in STAC would break things and confuse users).
Lastly, what happens if I mirror a collection or so? The collection ID will not change, but now there are two collections with the same ID. So this sounds good in theory, but I think in practice it will fail to work.

item-spec/item-spec.md (outdated)
Collaborator

@lossyrob lossyrob left a comment

Added suggested changes, otherwise +1

@cholmes
Contributor Author

cholmes commented Feb 25, 2021

Ok, merging this. Committed Rob's changes, as Matt, Rob and Matthias all agree on not having the 'globally unique' stuff for collections.

Successfully merging this pull request may close these issues.

Add language to help ensure globally unique ID's
5 participants