Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON Schema in summaries #1045 #1093

Merged
merged 9 commits into from
Apr 29, 2021
Merged

JSON Schema in summaries #1045 #1093

merged 9 commits into from
Apr 29, 2021

Conversation

m-mohr
Copy link
Collaborator

@m-mohr m-mohr commented Apr 20, 2021

Related Issue(s): json-schema-org/json-schema-spec#1045

Proposed Changes:

  1. Support JSON Schema in summaries - I hadn't had a lot of time to flesh out the language, improvements appreciated.
  2. Rename Stats Object to Range Object
  3. Fix collection examples (add missing stac_extensions entries), but that fails right now due to Collection Schema stac-extensions/scientific#2

PR Checklist:

  • This PR is made against the dev branch (all proposed changes except releases should be against dev, not master).
  • This PR has no breaking changes.
  • I have added my changes to the CHANGELOG or a CHANGELOG entry is not required.
  • This PR affects the STAC API spec, and I have NOT opened an issue/PR yet!

@m-mohr m-mohr added this to the 1.0.0-rc.3 milestone Apr 20, 2021
@m-mohr m-mohr force-pushed the json-schema-summaries branch 2 times, most recently from c38721a to e2f5c49 Compare April 20, 2021 20:01
@m-mohr
Copy link
Collaborator Author

m-mohr commented Apr 20, 2021

@jjrom Could you please have a look at this PR? I had to derive a bit from our discussion (see the issue for some context). I'm happy to discuss this again in a short call, too.

@m-mohr m-mohr force-pushed the json-schema-summaries branch from e2f5c49 to 6755d41 Compare April 20, 2021 20:19
Copy link
Contributor

@jjrom jjrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @m-mohr for this PR ! Looks good to use json-schema in summaries.

I have 2 remarks :

@m-mohr
Copy link
Collaborator Author

m-mohr commented Apr 22, 2021

@jjrom Do you have an example at hand that fits into the examples folder? Could be an old one that we convert to the new schema.
We don't want too many/long examples in markdown to limit the length of the documents, so I'd put them somewhere in the examples folder and link to it, but the existing examples I also don't really want to touch, because I don't have a clue what values for counts etc to put in.

@jjrom
Copy link
Contributor

jjrom commented Apr 22, 2021

Perhaps we could duplicate the "collection.json" example (https://github.com/radiantearth/stac-spec/blob/master/examples/collection.json) to a "collection-extended-summaries.json" example in which we replace

"platform": [
      "cool_sat2",
      "cool_sat1"
]

with

"platform": {
      "type":"string",
      "oneOf":[
          {
              "const":"cool_sat1",
              "count": 103489
           },
          {
              "const":"cool_sat2",
              "count": 50700
          }
      ]
}

?

@m-mohr
Copy link
Collaborator Author

m-mohr commented Apr 22, 2021

@jjrom I think we can only add it to the collection-only folder right now. The others are a full catalog, so the count would likely be 1 or 2 as there are not so many items and we also can't add two collections, because then the items would need to link to two of them.

Maybe we could duplicate, shorten and then convert https://github.com/radiantearth/stac-spec/blob/master/examples/collection-only/collection.json a bit and also add titles to epsg codes as you mentioned in the call?

@jjrom
Copy link
Contributor

jjrom commented Apr 22, 2021

@m-mohr Ok that makes sense. In this case we can duplicate https://github.com/radiantearth/stac-spec/blob/master/examples/collection-only/collection.json and update the platform property. Concerning the EPSG codes, I'm fine to add titles but only if we shorten massively the property content - for instance limits it to two EPSG codes (e.g. EPSG:4326 and EPSG:3857). Otherwise it would be hard to read

@m-mohr
Copy link
Collaborator Author

m-mohr commented Apr 22, 2021

Yes, agreed! Or can you export your "best" example from your implementation, which we then try to convert to the new format? Would be a good verification that the new schema works well with your data.

@jjrom
Copy link
Contributor

jjrom commented Apr 22, 2021

I plan to update my implementation to fit the new schema next week. If it's not too late I can provide a real example by the end of next week. Otherwise we stick to the existing collection.json duplication/shortening as we discussed (?)

@m-mohr
Copy link
Collaborator Author

m-mohr commented Apr 22, 2021

@jjrom I thought about an example from your current implementation. That should be easy enough to migrate "manually".

@cholmes
Copy link
Contributor

cholmes commented Apr 22, 2021

+1 on 'manual migration'. I have some time today and can try to help.

@m-mohr m-mohr marked this pull request as draft April 23, 2021 11:49
@m-mohr
Copy link
Collaborator Author

m-mohr commented Apr 23, 2021

@schwehr This looks very much like it could be useful for you as you have this gee:schema thing in your catalogs, right? Any thought on the PR? Could this be useful for you, too?

@schwehr
Copy link
Contributor

schwehr commented Apr 23, 2021

Some initial thoughts while reading to catch up... I'm not familiar enough with jsonschema. And I haven't had a chance to talk it over with simonf.

  1. Is there a way to specify the units?
  2. Looking at the example, it would be nice to have simple (even if contrived) examples of all of them (as much as jsonschema can support).
  3. At least one example without a constrained set of values for string and int.
  4. An example that used more of the capabilities of json-schema would be good
  5. We expected to have types of INT, DOUBLE, STRING, INT_LIST, DOUBLE_LIST, or STRING_LIST. I looks like we are not using the DOUBLE_LIST. The proto enum:
enum PropertyType {
  PROPERTY_TYPE_UNSPECIFIED = 0;  // Something is wrong.
  STRING = 1;
  INT = 2;
  DOUBLE = 3;
  STRING_LIST = 4;
  INT_LIST = 5;
  DOUBLE_LIST = 6;
}

@m-mohr
Copy link
Collaborator Author

m-mohr commented Apr 23, 2021

The CI fails due to stac-extensions/scientific#2. Once the PR is merged, we need to release scientific as 1.0.1 and change the schema URLs here / in stac-spec.

@m-mohr
Copy link
Collaborator Author

m-mohr commented Apr 23, 2021

I'm wondering whether we should mark this new feature 'experimental' (with the chance of breaking in a minor release)?

@schwehr
Copy link
Contributor

schwehr commented Apr 23, 2021

Put in a suggestion for units: json-schema-org/json-schema-vocabularies#46

@cholmes
Copy link
Contributor

cholmes commented Apr 23, 2021

I'm wondering whether we should mark this new feature 'experimental' (with the chance of breaking in a minor release)?

Is there a way to make the feature an extension? Happy for it to not be namespaced and even mentioned in the main spec. But that could let us evolve it without having to cut stac-spec releases.

@m-mohr
Copy link
Collaborator Author

m-mohr commented Apr 26, 2021

No, it's not really extensible, I'd say. At least the extension would have the potential to break implementations and thus it's not a good extension.

@m-mohr
Copy link
Collaborator Author

m-mohr commented Apr 26, 2021

Thanks, @schwehr.

  1. Is there a way to specify the units?

Not standardized, but custom keywords are a thing.

  1. Looking at the example, it would be nice to have simple (even if contrived) examples of all of them (as much as jsonschema can support).

Not sure what exactly you are asking for?

  1. At least one example without a constrained set of values for string and int.

We encourage to specify what values to expect to make it more useful for users. Just specifying that the field exists (with a specific data type?) would not be a best practice IMHO. Ideally, there would be more information.

  1. An example that used more of the capabilities of json-schema would be good

I'm not sure STAC is the right place to go into JSON Schema details. Wouldn't it be better to just link to https://json-schema.org/learn/ ?

  1. We expected to have types of INT, DOUBLE, STRING, INT_LIST, DOUBLE_LIST, or STRING_LIST. I looks like we are not using the DOUBLE_LIST.

There's no direct type value for typed arrays in JSON Schema, instead you do something like:
{ type: 'array', items: { type: 'integer' } } (INT_LIST) or { type: 'array', items: { type: 'string' } } (STRING_LIST)
The other types would be integer (INT), float (DOUBLE) and string (STRING).

Put in a suggestion for units: json-schema-org/json-schema-vocabularies#46

JSON Schema allows for custom keywords, so I'm not sure there's a need for another annotation in a validation language as you can simply define it on your own. But let's see what they think.

@cholmes cholmes mentioned this pull request Apr 26, 2021
3 tasks
@schwehr
Copy link
Contributor

schwehr commented Apr 26, 2021

Is a custom type available within the actual schema being defined inside the STAC summary? In the case I'm talking about, the data isn't what's going to have the units. The schema says what the units are going to be.

  1. Looking at the example, it would be nice to have simple (even if contrived) examples of all of them (as much as jsonschema can support).

Not sure what exactly you are asking for?

There is little sense of the flexibility of the possibilities here. I agree that the reader should go look at the json schema spec, but the examples here are extremely narrow.

  1. At least one example without a constrained set of values for string and int.

We encourage to specify what values to expect to make it more useful for users. Just specifying that the field exists (with a specific data type?) would not be a best practice IMHO. Ideally, there would be more information.

Examples of non-categorical data:

  • string: A field with the name of the person who processed the data, or a field
  • int: in the range of 0..1e6. say population in a polygon
  • float: A float between 0..1. The fraction of cloudy pixels in an image

@m-mohr
Copy link
Collaborator Author

m-mohr commented Apr 29, 2021

This seems good for now. We can fine-tune docs and examples later.

@m-mohr m-mohr merged commit 5b5a8bd into dev Apr 29, 2021
@m-mohr m-mohr deleted the json-schema-summaries branch April 29, 2021 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend summary to support list of objects / Better describe summary values
5 participants