Align with DCAT & Clarify how to specify which version of the schema is being used #309
Comments
Allow an array for different schema URLs, if arrays are allowed. The benefit of allowing an array is that we can add additional schema fields. |
So the catalog might include entries that use different schemas? Does each record identify which schema it uses? |
If versioning transitioned from
We'd also need to document pretty well what this looks like and how agencies would migrate to it. |
One proposal is to use the same property names used for this purpose in JSON Schema ( In this context Here's an excerpt of this in a data.json file, but you can see a full example in this gist
|
@jpmckinney any feedback on this as opposed to JSON-LD? #23 (comment) At the very least it seems like we've been making progress on DCAT compatibility so that it should be a little easier to provide a JSON-LD serialized variant |
You can do both. Popolo is an RDF vocabulary, and it has both JSON Schema and JSON-LD contexts available: for example, Person schema and Person context. I haven't see |
@jpmckinney Perhaps this proposal isn't using I saw there were some somewhat related questions (1, 2) about this on the JSON Schema mailing list but I wasn't sure if there was more recent thinking or if hyper-schema addresses this in a way that would be relevant, so I just brought it up again on the JSON Schema mailing list. |
On second thought, I think the issue is that JSON Schema doesn't specify any way for an instance to refer to its schema, and that |
@jpmckinney looks like you can do it with HTTP headers using |
Sounds good, though I'm not clear on the difference between |
One is meant to be a URI that identifies the standard ( Here's an updated example:
|
@BernHyland recommended using JSON-LD to accommodate this. I had questioned that too, and @jpmckinney responded that both approaches are possible. Since there have already been long discussions about whether and how to incorporate linked data, along with partial examples of JSON-LD, I wanted to at least see what a minimal JSON-LD representation of the current proposal would look like to better inform that discussion.

Here is a first attempt at trying to JSON-LD-ify (to the simplest extent possible) the JSON example of the current schema. This may very well be an imperfect serialization, so I'm putting it here both to get feedback on what a minimally viable JSON-LD serialization would be and to get feedback on how this compares to a JSON serialization of the schema without JSON-LD requirements.

I don't want this particular issue to turn into a broader debate about linked data as #21 did, but we can create a new issue for that as needed. I'd rather this focus on how JSON-LD can help address the specific need raised in this issue.

Also note that data.gov, which aggregates this Project Open Data metadata from across the Federal government, already offers linked data in the form of the RDF XML serialization of DCAT provided by CKAN as well as the Schema.org microdata/RDFa mapping of DCAT.

cc: @gkellogg @amercader @philarcher1

{
"@context": {
"dcat": "http://www.w3.org/ns/dcat",
"org": "http://www.w3.org/ns/org",
"vcard": "http://www.w3.org/2006/vcard/ns"
},
"@id": "http://www.agency.gov/data.json",
"@type": "dcat:Catalog",
"conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
"describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json",
"dataset": [
{
"@type": "dcat:Dataset",
"accessLevel": "public",
"accrualPeriodicity": "R/P1Y",
"bureauCode": [
"018:10"
],
"conformsTo": "http://www.agency.gov/widget-taxonomy/",
"contactPoint": {
"@type": "vcard:Contact",
"fn": "Jane Doe",
"hasEmail": "mailto:[email protected]"
},
"describedBy": "http://www.agency.gov/datasets/widgets-dictionary.html",
"dataQuality": true,
"description": "This dataset provides national statistics on the production of widgets",
"distribution": [
{
"@type": "dcat:Distribution",
"description": "Widgets data as a CSV file",
"downloadURL": "https://data.agency.gov/datasets/widgets-statistics/widgets.csv",
"format": "CSV",
"mediaType": "text/csv",
"title": "widgets.csv"
},
{
"@type": "dcat:Distribution",
"description": "Widgets data as a zipped CSV file with attached data dictionary",
"downloadURL": "https://data.agency.gov/datasets/widgets-statistics/widgets-all.zip",
"format": "Zipped CSV",
"mediaType": "application/zip",
"title": "widgets-all.zip"
},
{
"@type": "dcat:Distribution",
"conformsTo": "http://www.agency.gov/widget-data-standard/",
"describedBy": "http://www.agency.gov/widgets/schema.json",
"describedByType": "application/schema+json",
"description": "Widget data as a JSON feed",
"downloadURL": "http://www.agency.gov/feeds/widgets-all.json",
"format": "JSON",
"mediaType": "application/json",
"title": "widgets-all.json"
},
{
"@type": "dcat:Distribution",
"accessURL": "https://data.agency.gov/api/widgets-statistics/",
"description": "A fully queryable REST API with JSON and XML output",
"format": "API",
"title": "Widgets REST API"
}
],
"identifier": "widgets-0001",
"issued": "2011-11-22",
"keyword": [
"widget",
"manufacturing",
"factory"
],
"landingPage": "http://agency.gov/widgets/data",
"language": [
"en-US"
],
"license": "http://creativecommons.org/publicdomain/zero/1.0/",
"modified": "2011-11-19T12:00:00Z",
"primaryITInvestmentUII": "021-006227212",
"programCode": [
"018:001"
],
"publisher": {
"@type": "org:Organization",
"name": "Widget Services",
"subOrganizationOf": {
"@type": "org:Organization",
"name": "Office of Citizen Services and Innovative Technologies",
"subOrganizationOf": {
"@type": "org:Organization",
"name": "General Services Administration",
"subOrganizationOf": {
"@type": "org:Organization",
"name": "U.S. Government"
}
}
}
},
"references": [
"http://agency.gov/docs/widgets-1.html",
"http://agency.gov/docs/widgets-2.html"
],
"rights": "This dataset has been given an international public domain dedication for worldwide reuse",
"spatial": "United States",
"systemOfRecords": "http://www.agency.gov/widgets/sorn/",
"temporal": "2009-09-01T12:00:00Z/2010-05-31T12:00:00Z",
"theme": [
"manufacturing"
],
"title": "U.S. Widget Manufacturing Statistics"
}
]
} |
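Since this issue is about how a consumer can tell which version of the schema a catalog uses, here is a minimal Python sketch of reading that from the catalog-level `conformsTo` value shown in the example above. The function name `schema_version` and the idea of parsing the version out of the URL path are assumptions made for illustration; the proposal itself only says that `conformsTo` carries the URI identifying the schema.

```python
import re


def schema_version(catalog):
    """Return the version embedded in the catalog's conformsTo URI, or None.

    Hypothetical helper: it assumes the URI follows the
    https://project-open-data.cio.gov/v<version>/schema pattern
    used in the example in this thread.
    """
    conforms_to = catalog.get("conformsTo", "")
    match = re.search(r"/v(\d+(?:\.\d+)*)/schema/?$", conforms_to)
    return match.group(1) if match else None


# Trimmed-down copy of the catalog-level fields from the example above.
catalog = {
    "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
    "describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json",
    "dataset": [],
}

version = schema_version(catalog)
print(version)
```

A stricter consumer would probably compare the whole `conformsTo` URI against a list of known schema URIs instead of parsing it, since the URI is meant as an identifier.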
Thank you for the name check, Philip. As it happens I'll see Gregg Kellogg this week (there's a big W3C get Phil. On 24/10/2014 22:16, Philip Ashlock wrote:
Phil Archer http://philarcher.org |
@philipashlock Thanks for the pointer to this. I'm not familiar with the full context of this discussion (I'll try and check the links) but to me the JSON-LD approach is what is needed to bridge the convenience of a plain JSON-based metadata schema and the benefits of Linked Data. In the context of CKAN, we've recently been working on expanding support for importing RDF serializations based on DCAT, but we've focused on more traditional formats like RDF/XML and Turtle. I'm really interested to see how well we can support JSON-LD as well, both for ingesting it and producing it, so this is hugely helpful. I'm by no means a JSON-LD expert but at a first glance I think that the {
"@context" : "http://data.gov/contexts/catalog.jsonld",
"@type": "Catalog",
"title": "Some Catalog",
"datasets": [
{
"@context" : "http://data.gov/contexts/dataset.jsonld",
"@type": "Dataset",
//...
}
]
//...
} http://data.gov/contexts/catalog.jsonld being: {
"Catalog": "http://www.w3.org/ns/dcat#Catalog",
"dataset": "http://www.w3.org/ns/dcat#dataset",
"title": "http://purl.org/dc/terms/title",
"description": "http://purl.org/dc/terms/description",
//...
} I'll try to come up with a full example soon, as this is something I wanted to add as part of the work in progress on ckanext-dcat
Here is some further CKAN specific discussion about how these could be improved. I'll feed back any news on the DCAT/RDF front in CKAN if relevant to this thread. |
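To make the role of the context file in the comment above more concrete, here is a rough Python sketch of how a term-to-IRI mapping like the one in `catalog.jsonld` turns plain JSON keys into full property IRIs. This is a deliberately simplified illustration, not the JSON-LD expansion algorithm: the helper names `expand_term` and `expand_keys` are made up for this example, `@type` values and nested contexts are not handled, and a real implementation should use a JSON-LD processor.

```python
def expand_term(term, context):
    """Expand a term, or a prefix:suffix compact IRI, using a flat context dict."""
    if term in context:
        return context[term]
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        # Compact IRIs are expanded by plain concatenation of prefix IRI and suffix.
        return context[prefix] + suffix
    return term


def expand_keys(node, context):
    """Recursively rewrite dict keys (except @-keywords) to full IRIs."""
    if isinstance(node, list):
        return [expand_keys(item, context) for item in node]
    if isinstance(node, dict):
        return {
            (key if key.startswith("@") else expand_term(key, context)):
                expand_keys(value, context)
            for key, value in node.items()
        }
    return node


# Term mappings copied from the catalog.jsonld sketch in the comment above.
context = {
    "Catalog": "http://www.w3.org/ns/dcat#Catalog",
    "dataset": "http://www.w3.org/ns/dcat#dataset",
    "title": "http://purl.org/dc/terms/title",
    "description": "http://purl.org/dc/terms/description",
}

catalog = {
    "@type": "Catalog",
    "title": "Some Catalog",
    "dataset": [{"title": "Widgets"}],
}

expanded = expand_keys(catalog, context)
print(expanded)
```

Because prefix expansion is plain concatenation, a prefix such as `dcat` only yields the intended IRI when its namespace value ends with the separator (for example `#`).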
@philipashlock, I'm confused by one aspect of this—it seems like the labels are reversed from the logical choices. It seems like |
@waldoj perhaps my choices are based more on precedents for using these terms than what you would expect by seeing them here for the first time, but I think there's also a subtle distinction between what we mean when we refer to the Project Open Data schema and the schema as represented with JSON Schema. First on precedents for the terms. The precedent for
The precedent for
As far as the distinction between the Project Open Data schema and the schema as represented with JSON Schema: The schema is something that could be described with formats other than JSON Schema or the main website where it's documented, so instead of referring to it by a particular serialization, we're using the main URL as a unique identifier for it. The JSON Schema file isn't "the schema" or "the spec"; it's just one way of describing it in a machine-readable way. In other words, |
Huh, I see the sense in picking up those existing practices, but the result sure is confusing to me! (FWIW, that JSON Schema draft proposal expired over a year ago, and appears to have been abandoned by its creators.) Well, |
@waldoj FWIW, The use of this term might be less confusing where it's used in the dataset context where it replaces the previous term Think of |
@waldoj in case it helps, you got it right to begin with, so sorry I didn't make that more clear.
That's correct and that's how they're being used. |
Now I'm confused again. :) So |
The fields are actually used in three classes: the Catalog, the Dataset, and the Distribution. In all cases, they have this general meaning:
In the Dataset and Distribution class, I think part of the confusion is that we're referring to this standard with the word "schema" and one of the ways we're describing it is with something called JSON Schema. It might be more clear if we swapped out the word "schema" in your original statement with "standard" like so:
In the Catalog class, |
Ah-ha! Yes, now everything has snapped into place. So they do mean the thing that I initially thought that they should mean, and all is right in the world. :) You're right, my use of the word "schema" was confusing everything. Thanks for your patience, @philipashlock. :) |
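As a small illustration of the exchange above, here is a Python sketch that walks a catalog and lists the `conformsTo` / `describedBy` pair at each of the three levels where the fields appear (Catalog, Dataset, Distribution). The function name `collect_standard_refs` is hypothetical, and the sample data is a trimmed copy of the values from the example earlier in this thread.

```python
def collect_standard_refs(catalog):
    """Return (level, conformsTo, describedBy) tuples for catalog, datasets, distributions."""
    refs = []

    def add(level, node):
        # conformsTo: URI identifying the standard; describedBy: a description of it.
        refs.append((level, node.get("conformsTo"), node.get("describedBy")))

    add("catalog", catalog)
    for dataset in catalog.get("dataset", []):
        add("dataset", dataset)
        for distribution in dataset.get("distribution", []):
            add("distribution", distribution)
    return refs


catalog = {
    "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
    "describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json",
    "dataset": [
        {
            "conformsTo": "http://www.agency.gov/widget-taxonomy/",
            "describedBy": "http://www.agency.gov/datasets/widgets-dictionary.html",
            "distribution": [
                {
                    "conformsTo": "http://www.agency.gov/widget-data-standard/",
                    "describedBy": "http://www.agency.gov/widgets/schema.json",
                },
                # A distribution may omit both fields.
                {"accessURL": "https://data.agency.gov/api/widgets-statistics/"},
            ],
        }
    ],
}

refs = collect_standard_refs(catalog)
for level, conforms_to, described_by in refs:
    print(level, conforms_to, described_by)
```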
Example is based on @philipashlock's one in project-open-data/project-open-data.github.io#309
@philarcher1 and I worked on the JSON-LD representation described at the top of the issue a little bit at TPAC. Here's a link to the JSON-LD playground that starts to improve the context and show how it actually expands out: http://tinyurl.com/q6q3zjo. That's probably the best place to experiment with JSON-LD markup and transformations. Copied here below:
|
@gkellogg @philarcher1 I also started working on expanding the JSON-LD example, any feedback is welcome: Playground link: http://goo.gl/SrDyyH |
Just a couple of notes:
|
@gkellogg thanks a lot for this, it is incredible feedback. I've created a new issue in our repo to stop hijacking this thread, I'll follow up with the changes there: ckan/ckanext-dcat#20 |
Thanks Gregg (and Phil), that is really helpful and provides the needed guidance on JSON-LD. Looks like the W3C TPAC was highly productive & busy!! Cheers, Bernadette Hyland On Oct 29, 2014, at 4:20 PM, Gregg Kellogg [email protected] wrote:
|
Following on from what Gregg and I sat and did quickly the other day, I Some comments: First of all, again, let me stress how terrific it is to see this. The The first thing I've done is to split out the various namespaces that Dates: xsd:dateTime is rarely needed in open data. It's useful for things like I don't know about access levels, bureau and program codes, and data I haven't touched the data itself, just the context section. I notice Finally, this is all prime input to current work at W3C on Data on the I'll follow up separately, Phil, on setting up a call. HTH for now - happy to continue to work on this if helpful. Phil. For ease of reference, the context and data are copied below: { On 31/10/2014 16:44, Bernadette Hyland wrote:
Phil Archer http://philarcher.org |
Thank you so much @philarcher1, @gkellogg, @amercader I would then suggest we collapse this {
"@context": "https://project-open-data.cio.gov/v1.1/schema/data.jsonld",
"@id": "http://www.agency.gov/data.json",
"@type": "dcat:Catalog",
"conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
"describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json",
"dataset": [
{...}
]
} Would that be ok? As you've noted, Also, just to check, is it ok to have multiple |
@philipashlock Thank you for really digging into this issue and issue #362. Having @gkellogg and @philarcher1 commenting on the thread has helped to highlight the importance of a shared namespace and of providing the necessary code samples. Thanks guys.

@philipashlock - What you've proposed above in terms of collapsing @context may not be complete. I believe the example @philarcher1 gave (21 hours ago) that provides the canonical namespace URL may be required; however, please double check the W3C spec (Recommendation), specifically in relation to @context, see http://www.w3.org/TR/json-ld/#the-context. Ultimately, @gkellogg is best positioned as one of the main W3C document editors to provide a final thumbs up or additional details to get this code sample / documentation right. What you outlined about the other terms that you're adding to the Project Open Data Metadata Schema v1.1 looks fine AFAIK.

Thanks @gkellogg & @philarcher1 for your guidance. @philipashlock, thanks very much for taking onboard this feedback & working hard to get it right. Major kudos to the small but mighty team at GSA who are getting a lot done to advance the US Gov't open data effort.

NB: to those monitoring this thread -- Why does this matter? IMHO, it's critical to get this Project Open Data guidance specified in line with W3C standards for data publishing because many US Gov't Agencies / Departments will be held accountable for publishing using the proposed Metadata Schema v1.1 that @philipashlock & team are spearheading. If they don't publish in conformance to this schema the penalties may include reduced funding. So this is not an intellectual exercise, it is very real & important for the ongoing future of US Government Open Data. |
Thanks Bernadette, Sorry I've been swamped today with one thing or another and so haven't What I did yesterday will, I'm sure, need improvement, and Bernadette is But the basics are clear: a canonical @context file that everyone can Take Spain as an example. Spain comprises many proud regions (Basque So we're on the right road with excellent prospects but please don't Phil. On 04/11/2014 16:57, Bernadette Hyland wrote:
Phil Archer http://philarcher.org |
Thanks again for all the feedback. At first I thought it would make sense to discuss JSON-LD with this issue as an alternative to the initial proposal, but I think there's value in doing both so I'd like to move the JSON-LD discussion over to #388 and continue to move forward with the existing proposal to use |
Makes sense. Totally agree with integrating this but don't see it as a hard requirement for #357. |
Just as an update here, we've stuck with |
Thank you for driving the conversation around this issue and helping to assemble the v1.1 metadata update. There appears to be strong consensus around this issue, which has been accepted in the v1.1 update and merged into Project Open Data. Project Open Data is a living project though. Please continue any conversations around how the schema can be improved with new issues and pull requests! It's important for government staff as well as the public to continue to collaborate to make the Open Data Policy ever better. Though the v1.1 update is a substantial update, future iterations do not have to be, so whatever your ideas - big or small - please continue to work with this community to improve how government manages and opens its data. |
Currently we just have this section on http://project-open-data.github.io/catalog/
Also see #79