T7.3 Data Management in CKAN HOW TO
As an H2020 project, CLARITY is obliged to provide a continuously updated Data Management Plan that describes what data the project will use and produce, whether and how the data produced will be exploited or made (openly) accessible for verification and re-use, and how these data will be curated and preserved after the end of the project. Open Data must be put into a public repository, e.g. Zenodo. The repository should be OpenAIRE-compliant to enable harvesting of metadata.
For a brief overview, please have a look at the following presentations:
A project-wide data management policy that handles these issues at the administrative and technical level has been established and described in CLARITY deliverable D7.8 Data Management Plan v1. It covers topics such as data and meta-data collection, personal data treatment, the data repository infrastructure and the mandatory compliance with the Open Access Infrastructure for Research in Europe (OpenAIRE).
Data management activities concern the whole project and need to be coordinated and monitored both at project and work package level. Data management is also linked to publication of project results and thus dissemination activities. Therefore, the following roles and responsibilities can be identified:
The Project Data Manager (T7.3 task leader) is responsible for
- developing the data management plan and policy in cooperation with the project management in WP7 and the technical partners
- coordinating the technical realisation of CLARITY's living DMP by means of a customised CKAN meta-data catalogue
- maintaining the technical and organisational infrastructure of the CKAN metadata catalogue
- developing a user guide for the usage of CLARITY’s living DMP
- monitoring data management activities (both collection and publication) and deadlines and sending reminders to WP data managers
- providing support to WP data managers
- coordinating the writing of the DMP deliverable documents (D7.x)
- providing solutions for specific issues in accordance with project management
The Workpackage Data Managers are responsible for
- the implementation of the data management policy in their respective WPs
- monitoring data management activities and deadlines and sending reminders to partners
- offering customized help and further guidance for using CLARITY’s living DMP
- asking partners for missing information or clarifications
- providing input to the DMP deliverable documents (D7.x) by analysing and summarising the WP-specific datasets listed in CLARITY’s living DMP
- offering customized help and further guidance for publishing open data and open source software
- monitoring that open results (data and software) are deposited in the default repository or a complementary OpenAIRE-compliant repository and sending reminders to partners
- monitoring that open results available in OpenAIRE are properly linked with CLARITY
- contacting the quality assurance and ethics committee in case of questions and ethical and privacy issues that may forbid a publication of the data
- ensuring that the meta-data of data used and produced at workpackage-level is made available in CLARITY’s living DMP according to the CLARITY data management policy and guidelines in a timely manner.
The Dissemination Manager is responsible for
- offering assistance in choosing the right publication path (green or gold open access)
- offering customized help and further guidance for publishing scientific publications
- ensuring that the open access policy of the journal complies with the H2020 open data requirements before the researcher submits a manuscript
- monitoring that green access (self-archiving) publications are deposited in repositories and sending reminders to partners
- monitoring that metadata about publications is made available in the R&I Participant Portal (preferably automatically through OpenAIRE) and on the CLARITY website
- monitoring that research data related to a publication is made available in repositories and linked to respective publication
- monitoring possible embargo periods and sending reminders to partners
- monitoring that publications available in OpenAIRE are properly linked with CLARITY
The CLARITY Data Management Plan (DMP) follows the structure of the Horizon 2020 DMP template. It reflects the status of the data that is collected, processed or generated by the project, whether and how this data will be shared and/or made open, and how it will be curated and preserved. The initial DMP was the first deliverable of Task 7.3 “Data Management” and, as a public document, it has been made available on Zenodo. For the second version of CLARITY's Data Management Plan, the consortium has decided to implement the DMP as a "living document" by means of CLARITY's CKAN Catalogue. The actual deliverable documents (D7.9 and D7.10) are brief reports that summarise the contents of the CKAN Catalogue.
CLARITY's CKAN Catalogue is available at ckan.myclimateservice.eu. It represents both CLARITY deliverable D2.2 "Catalogue of local data sources and sample datasets" and the D7.x Data Management Plan deliverables (v1 to v3). The datasets registered in the CKAN Catalogue during the preparation of D2.2 are therefore a subset of the datasets that need to be registered for the D7.x deliverables. Besides the instructions for transferring meta-data collected by Task T2.2 "Demonstrator-specific data collection" to CKAN, there are a few more requirements to consider.
First, the following tags must be assigned to the dataset, depending on the WP or DC it is relevant for, whether the data is used or produced, and whether it is open data:
- WP1, WP2, WP3, WP4, WP5, WP6
- DC1, DC2, DC3, DC4
- input-data or output-data
- open-data
Second, the dataset has to be assigned to one of the four groups defined in the CKAN Catalogue.
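As a minimal sketch of the two rules above, the following Python helper derives the tags and the group for a dataset. The function name and its parameters are hypothetical and only serve as an illustration. The group titles follow the four categories described in this guide, and it is an assumption that every dataset carries at least one WP or DC tag.

```python
from typing import List, Optional, Tuple

VALID_WPS = range(1, 7)  # WP1 .. WP6
VALID_DCS = range(1, 5)  # DC1 .. DC4


def tags_and_group(produced: bool, is_open: bool,
                   wps: List[int],
                   dcs: Optional[List[int]] = None) -> Tuple[List[str], str]:
    """Return (tags, group title) for a dataset (illustrative helper)."""
    dcs = dcs or []
    if not wps and not dcs:
        # Assumption: a dataset is relevant for at least one WP or DC.
        raise ValueError("at least one WP or DC is required")
    if any(n not in VALID_WPS for n in wps):
        raise ValueError("WP must be between 1 and 6")
    if any(n not in VALID_DCS for n in dcs):
        raise ValueError("DC must be between 1 and 4")

    tags = [f"WP{n}" for n in wps] + [f"DC{n}" for n in dcs]
    tags.append("output-data" if produced else "input-data")
    if is_open:
        tags.append("open-data")

    openness = "Open" if is_open else "Non-Open"
    usage = "produced" if produced else "used"
    return tags, f"{openness} Data {usage} by CLARITY"
```

For example, an open input dataset relevant for WP2 and DC1 would be described by `tags_and_group(produced=False, is_open=True, wps=[2], dcs=[1])`.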
This part of the CLARITY Data Management Plan reports on Non-Open Data used by the CLARITY project. The respective datasets in CKAN are assigned the tag input-data (in addition to the respective DC and WP tags) and associated with the group Non-Open Data used by CLARITY.
Such data is made available or sold by the data provider under a restricted license. It may encompass, for example, data made available by local authorities (e.g. Comune di Napoli) solely to serve as input for (climate) models in the CSIS (e.g. confidential information on urban planning, population distribution, etc.).
While the respective meta-data can be published in CLARITY's CKAN, the actual data is stored on the internal CLARITY sFTP server and is not meant to be shared outside of the CLARITY consortium. Thus, the respective Resources (data) associated with the Dataset (meta-data) in CKAN are in most cases links to data files on CLARITY's access-controlled sFTP.
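To illustrate how such a Dataset and its linked Resource could be described, here is a hedged sketch that builds a request body for the `package_create` action of the CKAN Action API. The helper name, the group slug `non-open-data-used-by-clarity` and the sFTP URL in the usage note are assumptions made for illustration. The actual identifiers in CLARITY's CKAN may differ.

```python
import json
import re
from typing import List


def non_open_input_payload(title: str, description: str,
                           wp_dc_tags: List[str], sftp_url: str) -> dict:
    """Build a JSON-serialisable body for CKAN's package_create action."""
    # CKAN dataset names are lowercase slugs; derive one from the title.
    name = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return {
        "name": name,
        "title": title,
        "notes": description,  # CKAN stores the description in "notes"
        "tags": [{"name": t} for t in [*wp_dc_tags, "input-data"]],
        # Hypothetical group slug, not taken from the CLARITY catalogue.
        "groups": [{"name": "non-open-data-used-by-clarity"}],
        # The Resource only links to the file on the access-controlled sFTP.
        "resources": [{"name": title, "url": sftp_url}],
    }


payload = non_open_input_payload(
    "Urban Planning Napoli",
    "Confidential urban planning data used as model input.",
    ["WP2", "DC1"],
    "sftp://sftp.clarity.example/dc1/urban-planning.zip",  # hypothetical URL
)
body = json.dumps(payload)
```

The resulting body would typically be sent as a POST request to `/api/3/action/package_create` of the CKAN instance, with an API token in the `Authorization` header.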
This part of the CLARITY Data Management Plan reports on Open Data used by the CLARITY project. The respective datasets in CKAN are assigned the tags input-data and open-data (in addition to the respective DC and WP tags) and are added to the group Open Data used by CLARITY in CKAN.
Such datasets may be collected from public authorities and institutions like Eurostat or Copernicus and are released under an open license that allows the data to be used for research, non-commercial or commercial purposes.
While CLARITY's CKAN Catalogue provides some meta-data related to the usage of the data in CLARITY, the actual data can in general be downloaded directly from the websites of the respective organisations. Although some of the data is also stored on the CLARITY sFTP for the purpose of further processing and visualisation in the CSIS, the respective Resource (data) associated with the Dataset (meta-data) shall not point to CLARITY's internal sFTP but to the original source of the dataset.
Example: Open Data Resource Digital Elevation Model over Europe with data link to eea.europa.eu.
This part of the CLARITY Data Management Plan reports on Non-Open Data produced by the CLARITY project. The respective datasets in CKAN are assigned the output-data tag (in addition to the respective DC and WP tags) and associated with the group Non-Open Data produced by CLARITY.
According to the H2020 Data Management Obligations, data produced by the project should be open by default. However, if one of the following general exceptions forbids open access to certain datasets produced by the project, the datasets can be released under a restricted license:
- copyright and permissions for reusing third-party data sets: Processing and combining input data from many different sources may lead to unclear IPR situations regarding the generated output data. Therefore, such repurposed data (e.g. model output data) can only be made open if all of the underlying data (e.g. model input data) is open, too.
- personal data treatment and confidentiality issues: Datasets referring to the quality and quantity of certain elements at risk, such as people and critical infrastructures, are not open by default as their publication may pose privacy, ethical or security risks.
- data-driven business model: Data that is exploited commercially through the MyClimateService.eu marketplace will not be made open.
- user-generated content: Data related to individual adaptation scenarios (e.g. adaptation options, performance indicators, criteria, etc.) that is generated by (external) end users during the usage of CRISMA climate services is only made open with explicit permission from the end user.
- other restrictions: If other restrictions prevent the provider or producer of the data from releasing it as Open Data, the reasons for not doing so have to be clearly articulated in the description of the Datasets and summarised in the deliverable documents.
While the respective meta-data has to be published in CLARITY's CKAN, the actual data may be stored on the internal CLARITY sFTP server and is not meant to be shared outside of the CLARITY consortium. Thus, the respective Resources (data) associated with the Dataset (meta-data) in CKAN may contain links to data files on CLARITY's access-controlled sFTP.
This part of the CLARITY Data Management Plan reports on Open Data produced by the CLARITY project. The respective datasets in CKAN are assigned the tags output-data and open-data (in addition to the respective DC and WP tags) and associated with the group Open Data produced by CLARITY.
CLARITY open results are made accessible according to the Rules on Open Access to Scientific Publications and Open Access to Research Data in Horizon 2020. All open results (data, software, scientific publications) of the project have to be openly accessible at an appropriate Open Access repository. Specifically, research data needed to validate the results in the scientific publications has to be deposited in a data repository at the same time as the publication. The main intention of the data management plan is to ensure that such open data produced by EU-funded projects is deposited in a respective repository and is thus usable by third parties after the end of the project.
However, if confidentiality, security, personal data protection obligations or IPR issues forbid open access to certain data produced by the project, it is deposited in a restricted repository and access may be granted upon request and under the conditions of a restricted license. Such data produced by the project that cannot be released as open data is listed in the category Non-Open Data produced by CLARITY together with an explanation of the reasons that forbid open access.
Since the CLARITY sFTP is neither publicly accessible nor meant for long-term archival and preservation of data beyond the lifetime of the project, open data produced by CLARITY should be uploaded to an OpenAIRE-compliant repository like Zenodo.
The following diagrams describe the general data management workflow for adding Datasets (meta-data) and linking Resources (data) in CLARITY's CKAN.
For data management purposes, it is not sufficient to link to (open) data stored on CLARITY's internal sFTP server. In that case, an additional Resource that links to the publicly accessible original data has to be added to the Dataset.
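This rule can be checked automatically. The sketch below assumes that a dataset is available as a dictionary in the shape returned by CKAN's `package_show` action (tags and resources as lists of dictionaries) and that links to the internal sFTP can be recognised by the `sftp://` URL scheme. Both function names and the scheme-based detection are illustrative assumptions, not part of the CLARITY tooling.

```python
from urllib.parse import urlparse


def is_internal_sftp(url: str) -> bool:
    """Assumption: internal sFTP links use the sftp:// URL scheme."""
    return urlparse(url).scheme == "sftp"


def needs_public_resource(dataset: dict) -> bool:
    """True if an open dataset has no Resource linking to public data."""
    tags = {t["name"] for t in dataset.get("tags", [])}
    if "open-data" not in tags:
        return False  # the rule only concerns open data
    urls = [r.get("url", "") for r in dataset.get("resources", [])]
    # At least one non-empty URL must point somewhere other than the sFTP.
    return not any(u and not is_internal_sftp(u) for u in urls)
```

A WP data manager could run such a check over all datasets tagged open-data and remind partners to add the missing public link.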