[META] DCAT v3 support #271

Open
amercader opened this issue Apr 22, 2024 · 4 comments

DCAT v3 support in CKAN

Summary

The aim is for CKAN to provide DCAT 3 support out of the box with minimal configuration, both as a metadata provider and as a consumer, providing comprehensive base implementations that can be adapted to each site's needs.

Note

This is an evolving plan; its shape and scope can change as time progresses 🦠

Primary goals

  • Support for DCAT 3 based Application Profiles such as DCAT-AP 3 and DCAT-US 3 (both as Provider and Consumer)
  • Provide base implementations of both profiles that site maintainers can easily adapt to their needs
  • Provide built-in support for DCAT entities not directly modelled in CKAN, like Dataset Series or Data Services (core or extension)

Secondary goals

We have also identified other areas that could be explored as part of the work on the main goals above. While it would be great to be able to focus on all of them, this will depend on availability:

  • Create a pre-configured CKAN distribution with all necessary extensions and configuration settings to run a DCAT-based site
  • Explore options for lossless harvesting of arbitrary metadata fields for the CKAN-in-the-middle use case (see footnote 1)
  • Close integration with ckanext-spatial for spatial metadata fields (spatial indexing, map previews, etc.)
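
To illustrate what "lossless" could mean in the CKAN-in-the-middle case, here is a minimal sketch in plain Python. It is only one possible approach, not a design decision: the `KNOWN_FIELDS` set, the `unmapped_metadata` field name and both function names are made up for this example, and real harvested records would be RDF rather than flat dictionaries.

```python
import json

# Hypothetical set of fields that the local CKAN schema models directly.
KNOWN_FIELDS = {"title", "notes", "url"}


def harvest(remote_record):
    """Import a remote record, keeping fields the schema does not know about."""
    dataset = {k: v for k, v in remote_record.items() if k in KNOWN_FIELDS}
    unmapped = {k: v for k, v in remote_record.items() if k not in KNOWN_FIELDS}
    # Store everything else in one JSON-encoded field ("unmapped_metadata" is
    # an invented name) so it survives the trip through CKAN untouched.
    dataset["unmapped_metadata"] = json.dumps(unmapped, sort_keys=True)
    return dataset


def expose(dataset):
    """Re-publish the dataset, merging the preserved fields back in."""
    record = {k: v for k, v in dataset.items() if k != "unmapped_metadata"}
    record.update(json.loads(dataset.get("unmapped_metadata", "{}")))
    return record


source = {
    "title": "Air quality",
    "notes": "Hourly measurements",
    "custom:samplingMethod": "passive",  # not modelled by the local schema
}
republished = expose(harvest(source))
```

In this sketch the higher-level aggregator receives the same record that the lower-level site published, even though the intermediate site never modelled `custom:samplingMethod`.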

Functionality:

  • New processors (parsers and serializers) for DCAT-AP 3 and DCAT-US 3 to map DCAT to CKAN metadata and vice versa
  • Out of the box schemas for Datasets and Resources that support all properties defined in DCAT-AP and DCAT-US, as well as controlled vocabularies required
  • Pre-configured scheming presets and widgets for complex fields like repeating fields, date ranges, etc
  • Finished and documented support for multilingual DCAT fields
  • UI and underlying API actions to manage new entities (e.g. for Dataset Series: defining the dataset, adding items to the series, etc)
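
To make the parser/serializer idea concrete, here is a minimal sketch of a two-way mapping in plain Python. It is not the ckanext-dcat API: the function names and the flat dictionary input are assumptions for illustration, and the actual processors operate on RDF graphs and cover many more properties. The four property-to-field pairs are intended to follow the mapping documented by the extension.

```python
# Simplified, illustrative mapping table (DCAT property -> CKAN dataset field).
DCAT_TO_CKAN = {
    "dct:title": "title",
    "dct:description": "notes",
    "dcat:landingPage": "url",
}
CKAN_TO_DCAT = {ckan: dcat for dcat, ckan in DCAT_TO_CKAN.items()}


def parse_dataset(dcat_record):
    """Parser direction: DCAT-style properties -> CKAN-style dataset dict."""
    dataset = {
        ckan: dcat_record[dcat]
        for dcat, ckan in DCAT_TO_CKAN.items()
        if dcat in dcat_record
    }
    # dcat:keyword values become CKAN tags, which are dicts with a "name" key.
    dataset["tags"] = [{"name": kw} for kw in dcat_record.get("dcat:keyword", [])]
    return dataset


def serialize_dataset(dataset):
    """Serializer direction: CKAN-style dataset dict -> DCAT-style properties."""
    record = {
        CKAN_TO_DCAT[field]: value
        for field, value in dataset.items()
        if field in CKAN_TO_DCAT
    }
    record["dcat:keyword"] = [tag["name"] for tag in dataset.get("tags", [])]
    return record


example = {
    "dct:title": "Air quality",
    "dct:description": "Hourly measurements",
    "dcat:landingPage": "https://example.org/air-quality",
    "dcat:keyword": ["air", "environment"],
}
round_tripped = serialize_dataset(parse_dataset(example))
```

A round trip through both functions returns the original record, which is the property a parser/serializer pair for a given profile should aim for on the fields it supports.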

Use cases:

  • Publishers that are required to publish their metadata following DCAT-AP 3 or DCAT-US 3 can use CKAN out of the box
  • CKAN sites that need to adhere to national or regional variations can easily modify the base profiles and schemas to adapt them to their needs
  • Portals that aggregate remote metadata in CKAN can harvest or import different sources that follow DCAT 3 based standards (alongside previous versions of DCAT and other formats)

Work items

Note

These need to be split and expanded into individual issues

Getting ready

  • New version of ckanext-dcat that includes updated requirements, support for CKAN 2.11 and, if possible, multilingual support. This will be the base for all DCAT 3 support
  • Scheming implementation of the currently supported DCAT-AP 2.x (#56)

Core Dataset and Distribution properties

  • Adapt parsers to consume/expose new fields for Datasets and Resources (Distributions) in DCAT-AP 3
  • Create schemas for DCAT-AP 3 + UI widgets (repeating, etc)
  • Integrate SHACL validation to prove compliance (DCAT-AP)
  • Repeat process for DCAT-US

Dataset Series

  • Create a specification and design the feature
  • Implement it at the CKAN level
  • Expose it / consume it as DCAT metadata

Data Service

  • Create a specification and design the feature
  • Implement it at the CKAN level
  • Expose it / consume it as DCAT metadata

Other

Previous CKAN - DCAT 3 discussions

Footnotes

  1. Describes a scenario where CKAN harvests metadata from a lower-level site and in turn exposes its metadata to a higher-level aggregator portal

@amercader amercader moved this to Todo in DCAT v3 support Apr 22, 2024
@amercader amercader changed the title DCAT v3 support [META] DCAT v3 support Apr 22, 2024
@ivbeg

ivbeg commented May 16, 2024

Hi! I represent Dateno (dateno.io), an aggregator of CKAN and other data catalogs, and several CKAN-based data catalogues.
My thoughts on the DCAT 3 implementation:

  1. DCAT 2 and/or 3 should be enabled by default. It's good practice to provide bulk metadata access as DCAT instead of through the regular CKAN API. I would say that it's much easier to collect metadata from DCAT-first data catalogs.
  2. DCAT 2 and 3 allow some variation in the dataset theme and some other fields. It would be great if the CKAN team could provide global reference metadata, or if dictionaries could be configured as part of the CKAN configuration.
  3. Looking further ahead, it would be great if CKAN were much more customizable, not just to support DCAT 3 but other metadata standards as well.
  4. DCAT 3 introduces several important fields related to dataset version management. I am sure that most CKAN users aren't very aware of them. I think some educational resources are needed on this.
  5. DCAT 3 doesn't break DCAT 2. Please keep providing export of CKAN data in both DCAT 2 and DCAT 3. That way existing parsers will not break.

@amercader
Member Author

amercader commented Jun 6, 2024

Hi @ivbeg, sorry I missed your comments. See below:

> DCAT 2 and/or 3 should be enabled by default. It's good practice to provide bulk metadata access as DCAT instead of through the regular CKAN API. I would say that it's much easier to collect metadata from DCAT-first data catalogs.

There are technical and maintenance reasons not to pull this whole extension into CKAN core just to have it always enabled by default, but I agree with the sentiment that minimal out-of-the-box DCAT support would be useful.

> DCAT 2 and 3 allow some variation in the dataset theme and some other fields. It would be great if the CKAN team could provide global reference metadata, or if dictionaries could be configured as part of the CKAN configuration.

Yes, we want to explore providing codelists in the extension itself.

> Looking further ahead, it would be great if CKAN were much more customizable, not just to support DCAT 3 but other metadata standards as well.

The foundation for this is already in place with ckanext-scheming: you can implement any metadata schema in CKAN. The next step is to provide out-of-the-box support for specific standards, as the dcat extension aims to do, and that approach can of course be applied to other standards.

> DCAT 3 doesn't break DCAT 2. Please keep providing export of CKAN data in both DCAT 2 and DCAT 3. That way existing parsers will not break.

Absolutely, all existing parsers will remain unchanged.
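
As a rough sketch of how a consumer could keep requesting a specific DCAT version side by side with a newer one, the snippet below builds dataset URLs using the `profiles` query parameter that the extension's RDF endpoints accept. The site URL and dataset name are placeholders, `dataset_rdf_url` is a helper invented for this example, and `euro_dcat_ap_3` is an assumed name for a future DCAT-AP 3 profile (only `euro_dcat_ap_2` is an existing profile at the time of writing).

```python
from urllib.parse import urlencode


def dataset_rdf_url(site_url, dataset_id, fmt="ttl", profiles=None):
    """Build the URL of a dataset's RDF serialization for the given profiles."""
    url = f"{site_url.rstrip('/')}/dataset/{dataset_id}.{fmt}"
    if profiles:
        # Several profiles are passed as one comma-separated value.
        url += "?" + urlencode({"profiles": ",".join(profiles)})
    return url


# Existing consumers keep asking for the DCAT-AP 2 output ...
dcat2_url = dataset_rdf_url(
    "https://example.org/", "my-dataset", profiles=["euro_dcat_ap_2"]
)
# ... while new consumers could opt in to a DCAT-AP 3 profile (assumed name).
dcat3_url = dataset_rdf_url(
    "https://example.org/", "my-dataset", profiles=["euro_dcat_ap_3"]
)
```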

Thanks for your thoughts

@riccardoAlbertoni

The features added by DCAT2 to DCAT, along with the rationale behind its main revisions based on collected use cases and requirements, are discussed in [1]. Considering this work can provide valuable insights into the principles guiding the evolution of DCAT, thereby aiding in the implementation of DCAT support. Also, the same rationale and assumptions have driven the development of DCAT3.

As for DCAT3, it introduces the following key features:

  • Dataset versioning
  • Dataset series
  • Checksums

Every DCAT release builds upon the previous one, aiming to provide new features without alienating existing implementations.
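
As an informal illustration of two of these features, the sketch below builds JSON-LD-like dictionaries for a distribution checksum and for a small dataset series. The property names (`spdx:checksum` nodes with `spdx:algorithm` and `spdx:checksumValue`, plus `dcat:inSeries`, `dcat:prev`, `dcat:first` and `dcat:last`) come from DCAT 3; the helper functions and the example identifiers are made up for this example and do not reflect any CKAN implementation.

```python
import hashlib


def checksum_node(content: bytes) -> dict:
    """Describe a distribution's content as an spdx:Checksum node (SHA-256)."""
    return {
        "@type": "spdx:Checksum",
        "spdx:algorithm": "spdx:checksumAlgorithm_sha256",
        "spdx:checksumValue": hashlib.sha256(content).hexdigest(),
    }


def link_series(series_id, dataset_ids):
    """Link datasets (ordered oldest to newest) into a dcat:DatasetSeries."""
    members = []
    for position, dataset_id in enumerate(dataset_ids):
        member = {"@id": dataset_id, "dcat:inSeries": series_id}
        if position > 0:
            # Each member points back to the one published before it.
            member["dcat:prev"] = dataset_ids[position - 1]
        members.append(member)
    series = {
        "@id": series_id,
        "@type": "dcat:DatasetSeries",
        "dcat:first": dataset_ids[0],
        "dcat:last": dataset_ids[-1],
    }
    return series, members


distribution = {
    "@type": "dcat:Distribution",
    "spdx:checksum": checksum_node(b"year,value\n2023,42\n"),
}
series, members = link_series(
    "ex:budget-series", ["ex:budget-2022", "ex:budget-2023", "ex:budget-2024"]
)
```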

If you can point me to the current CKAN DCAT mapping, it would be useful as a base for further assistance.

[1] Riccardo Albertoni, David Browning, Simon Cox, Alejandra N. Gonzalez-Beltran, Andrea Perego, Peter Winstanley; The W3C Data Catalog Vocabulary, Version 2: Rationale, Design Principles, and Uptake. Data Intelligence 2023; doi: https://doi.org/10.1162/dint_a_00241

@amercader
Member Author

Thanks @riccardoAlbertoni, I'll read the linked article for sure. The extension documentation, including the DCAT - CKAN mapping, will be reworked, but in the meantime you can check the current version here:

https://github.com/ckan/ckanext-dcat?tab=readme-ov-file#rdf-dcat-to-ckan-dataset-mapping

The CKAN schema used to implement DCAT-AP 2.1 might also be of interest:

https://github.com/ckan/ckanext-dcat/blob/master/ckanext/dcat/schemas/dcat_ap_2.1_full.yaml
