New HealthDCAT-AP profile #326
Merged
+1,825
−10
22 commits
dfbf750 Update tests for changes in date parsing in rdflib (amercader)
1e7314c Fix agent mbox value to be also without mailto (Markus92)
7ba4777 Initial HealthDCAT-AP profile (Markus92)
86b85d2 Initial passing unit tests for example dataset (Markus92)
fb6ecc5 More fields and more tests (Markus92)
899ac2c Additional HealthDCAT-AP fields (Markus92)
221c002 Fix Wikidata URIs in example so they actually resolve (Markus92)
f5b7216 Add coding system attribute (Markus92)
73e4c88 Create initial CKAN JSON data implementing HealthDCAT scheme (Markus92)
4e78c47 Add a whole bunch of test cases (Markus92)
e4bcca7 Implemented code values, qualified relations and analytics (Markus92)
d90a4c8 Add URL property to contactPoint (VCARD.hasURL) (Markus92)
4458afe Wrote some documentation regarding the extension (Markus92)
d9818fa Merge branch 'master' into healthdcat_ap (Markus92)
3f8cd85 dpv:hasPersonalData and some cleanup (Markus92)
4422323 Small documentation update (Markus92)
5a6d402 Merge remote-tracking branch 'upstream/master' into healthdcat_ap (Markus92)
655370b Fix cardinality of qualified_relation (Markus92)
1140071 Fix test case for spatial_coverage in HealthDCAT-AP profile (Markus92)
fa18da6 Merge remote-tracking branch 'origin/healthdcat_ap' into healthdcat_ap (Markus92)
9966dd9 Small cleanup (Markus92)
30f3bac Move qualified relations to generic CKAN DCAT scheming class (Markus92)
@@ -0,0 +1,219 @@
from rdflib import RDF, SKOS, XSD, BNode, Literal, URIRef, term
from rdflib.namespace import Namespace

from ckanext.dcat.profiles.base import DCAT, DCT, CleanedURIRef, URIRefOrLiteral
from ckanext.dcat.profiles.euro_dcat_ap_3 import EuropeanDCATAP3Profile

# HealthDCAT-AP namespace. Note: not finalized yet
HEALTHDCATAP = Namespace("http://healthdataportal.eu/ns/health#")

# Data Privacy Vocabulary namespace
DPV = Namespace("https://w3id.org/dpv#")

namespaces = {
    "healthdcatap": HEALTHDCATAP,
    "dpv": DPV,
}


class EuropeanHealthDCATAPProfile(EuropeanDCATAP3Profile):
    """
    A profile implementing HealthDCAT-AP, a health-related extension of the DCAT application profile
    for sharing information about Catalogues containing Datasets and Data Services descriptions in Europe.
    """

    def parse_dataset(self, dataset_dict, dataset_ref):
        # Call super method for DCAT-AP 3 properties
        dataset_dict = super(EuropeanHealthDCATAPProfile, self).parse_dataset(
            dataset_dict, dataset_ref
        )

        dataset_dict = self._parse_health_fields(dataset_dict, dataset_ref)

        return dataset_dict

    def _parse_health_fields(self, dataset_dict, dataset_ref):
        self.__parse_healthdcat_stringvalues(dataset_dict, dataset_ref)

        self.__parse_healthdcat_intvalues(dataset_dict, dataset_ref)

        # Add the HDAB. There should only ever be one but you never know
        agents = self._agents_details(dataset_ref, HEALTHDCATAP.hdab)
        if agents:
            dataset_dict["hdab"] = agents

        # Retention period
        retention_start, retention_end = self._time_interval(
            dataset_ref, HEALTHDCATAP.retentionPeriod, dcat_ap_version=2
        )
        retention_dict = {}
        if retention_start is not None:
            retention_dict["start"] = retention_start
        if retention_end is not None:
            retention_dict["end"] = retention_end
        if retention_dict:
            dataset_dict["retention_period"] = [retention_dict]

        return dataset_dict

    def __parse_healthdcat_intvalues(self, dataset_dict, dataset_ref):
        for key, predicate in (
            ("min_typical_age", HEALTHDCATAP.minTypicalAge),
            ("max_typical_age", HEALTHDCATAP.maxTypicalAge),
            ("number_of_records", HEALTHDCATAP.numberOfRecords),
            ("number_of_unique_individuals", HEALTHDCATAP.numberOfUniqueIndividuals),
        ):
            value = self._object_value_int(dataset_ref, predicate)
            # A zero value evaluates as False but is definitely not a None
            if value is not None:
                dataset_dict[key] = value

    def __parse_healthdcat_stringvalues(self, dataset_dict, dataset_ref):
        for (
            key,
            predicate,
        ) in (
            ("analytics", HEALTHDCATAP.analytics),
            ("code_values", HEALTHDCATAP.hasCodeValues),
            ("coding_system", HEALTHDCATAP.hasCodingSystem),
            ("health_category", HEALTHDCATAP.healthCategory),
            ("health_theme", HEALTHDCATAP.healthTheme),
            ("legal_basis", DPV.hasLegalBasis),
            ("personal_data", DPV.hasPersonalData),
            ("population_coverage", HEALTHDCATAP.populationCoverage),
            ("publisher_note", HEALTHDCATAP.publisherNote),
            ("publisher_type", HEALTHDCATAP.publisherType),
            ("purpose", DPV.hasPurpose),
        ):
            values = self._object_value_list(dataset_ref, predicate)
            if values:
                dataset_dict[key] = values

    def graph_from_dataset(self, dataset_dict, dataset_ref):
        super().graph_from_dataset(dataset_dict, dataset_ref)
        for prefix, namespace in namespaces.items():
            self.g.bind(prefix, namespace)

        # key, predicate, fallbacks, _type
        items = [
            ("analytics", HEALTHDCATAP.analytics, None, URIRefOrLiteral),
            ("code_values", HEALTHDCATAP.hasCodeValues, None, URIRefOrLiteral),
            ("coding_system", HEALTHDCATAP.hasCodingSystem, None, URIRefOrLiteral),
            ("health_category", HEALTHDCATAP.healthCategory, None, URIRefOrLiteral),
            # healthTheme, matching the predicate read back in the parse step
            ("health_theme", HEALTHDCATAP.healthTheme, None, URIRefOrLiteral),
            ("legal_basis", DPV.hasLegalBasis, None, URIRefOrLiteral),
            (
                "population_coverage",
                HEALTHDCATAP.populationCoverage,
                None,
                URIRefOrLiteral,
            ),
            ("personal_data", DPV.hasPersonalData, None, URIRef),
            ("publisher_note", HEALTHDCATAP.publisherNote, None, URIRefOrLiteral),
            ("publisher_type", HEALTHDCATAP.publisherType, None, URIRefOrLiteral),
            ("purpose", DPV.hasPurpose, None, URIRefOrLiteral),
        ]
        self._add_list_triples_from_dict(dataset_dict, dataset_ref, items)

        items = [
            ("min_typical_age", HEALTHDCATAP.minTypicalAge),
            ("max_typical_age", HEALTHDCATAP.maxTypicalAge),
            ("number_of_records", HEALTHDCATAP.numberOfRecords),
            ("number_of_unique_individuals", HEALTHDCATAP.numberOfUniqueIndividuals),
        ]
        for key, predicate in items:
            self._add_nonneg_integer_triple(dataset_dict, dataset_ref, key, predicate)

        self._add_agents(dataset_ref, dataset_dict, "hdab", HEALTHDCATAP.hdab)

    def _add_nonneg_integer_triple(self, dataset_dict, dataset_ref, key, predicate):
        """
        Adds non-negative integers to the Dataset graph (xsd:nonNegativeInteger)

        dataset_dict: CKAN dataset dict to read the value from
        dataset_ref: subject of Graph
        key: scheming key in CKAN
        predicate: predicate to use
        """
        value = self._get_dict_value(dataset_dict, key)

        # Compare against None and "" explicitly: a plain truthiness check
        # would silently drop a legitimate value of 0
        if value is not None and value != "":
            try:
                number = int(value)
                if number < 0:
                    raise ValueError("Not a non-negative integer")
                self.g.add(
                    (
                        dataset_ref,
                        predicate,
                        Literal(number, datatype=XSD.nonNegativeInteger),
                    )
                )
            except (ValueError, TypeError):
                # Fall back to an untyped literal for anything that is not
                # a non-negative integer
                self.g.add((dataset_ref, predicate, Literal(value)))

    def _add_timeframe_triple(self, dataset_dict, dataset_ref):
        temporal = dataset_dict.get("temporal_coverage")
        if (
            isinstance(temporal, list)
            and len(temporal)
            and self._not_empty_dict(temporal[0])
        ):
            for item in temporal:
                temporal_ref = BNode()
                self.g.add((temporal_ref, RDF.type, DCT.PeriodOfTime))
                if item.get("start"):
                    self._add_date_triple(temporal_ref, DCAT.startDate, item["start"])
                if item.get("end"):
                    self._add_date_triple(temporal_ref, DCAT.endDate, item["end"])
                self.g.add((dataset_ref, DCT.temporal, temporal_ref))

    def _add_relationship(
        self,
        dataset_ref,
        dataset_dict,
        relation_key,
        rdf_predicate,
    ):
        """
        Adds one or more Relationships to the RDF graph.

        :param dataset_ref: The RDF reference of the dataset
        :param dataset_dict: The dataset dictionary containing relationship information
        :param relation_key: field name in the CKAN dict (e.g. "qualifiedRelation")
        :param rdf_predicate: The RDF predicate (DCAT.qualifiedRelation)
        """
        relations = dataset_dict.get(relation_key)
        if (
            isinstance(relations, list)
            and len(relations)
            and self._not_empty_dict(relations[0])
        ):
            for relation in relations:
                relation_uri = relation.get("uri")
                if relation_uri:
                    relation_ref = CleanedURIRef(relation_uri)
                else:
                    relation_ref = BNode()

                # rdf:type (rather than dct:type) declares the node to be a
                # dcat:Relationship, consistent with the PeriodOfTime nodes
                self.g.add((relation_ref, RDF.type, DCAT.Relationship))
                self.g.add((dataset_ref, rdf_predicate, relation_ref))

                self._add_triple_from_dict(
                    relation,
                    relation_ref,
                    DCT.relation,
                    "relation",
                    _type=URIRefOrLiteral,
                )
                self._add_triple_from_dict(
                    relation,
                    relation_ref,
                    DCAT.hadRole,
                    "role",
                    _type=URIRefOrLiteral,
                )

    def graph_from_catalog(self, catalog_dict, catalog_ref):
        super().graph_from_catalog(catalog_dict, catalog_ref)
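
The integer fields in the diff have two subtleties: zero is a valid value even though it is falsy, and anything that is not a non-negative integer falls back to a plain literal. The following is a minimal standalone sketch of that decision, without rdflib or CKAN. The helper name `classify_count` and its return tuples are hypothetical and only illustrate the branching, not the profile's actual API.

```python
def classify_count(value):
    """Decide how a count-like field could be serialised (illustrative only).

    Returns None when there is nothing to serialise, ("nonneg_int", n) for a
    valid non-negative integer, or ("plain", value) as the untyped fallback.
    """
    # Explicit checks instead of `if value:` so that 0 is not discarded
    if value is None or value == "":
        return None
    try:
        number = int(value)
    except (ValueError, TypeError):
        return ("plain", value)
    if number < 0:
        return ("plain", value)
    return ("nonneg_int", number)


for sample in (0, "42", "-3", "abc", None):
    print(repr(sample), "->", classify_count(sample))
```

Keeping the "is there a value at all" check separate from the "is it a valid count" check is what lets a minimum age of 0 survive a round trip.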
This seems like the same logic as the standard `temporal_coverage` handling. If there's no change, it's best to delete the method to avoid duplication.
It's the same logic (copy-pasted the lines you're mentioning), moved to a separate function instead of inline. I could also split it off to a separate function in that file to avoid duplication?
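
One way the split discussed here could look, as a sketch only: a single helper that takes the subject, predicate and list of periods as parameters, so the standard handling and any subclass can share it. To stay runnable without rdflib, the graph below is a plain list of tuples and the terms are strings. `add_period_triples`, `new_node` and the string terms are hypothetical stand-ins, not ckanext-dcat's API. Unlike the method in the diff, this version skips every empty entry rather than checking only the first one.

```python
from itertools import count

_ids = count(1)


def new_node():
    """Stand-in for a blank node: returns a fresh local identifier."""
    return f"_:b{next(_ids)}"


def add_period_triples(graph, subject, predicate, periods):
    """Append one PeriodOfTime node per non-empty period dict.

    `graph` is a plain list of (subject, predicate, object) tuples here;
    in the real profile it would be an rdflib Graph.
    """
    if not isinstance(periods, list):
        return graph
    for period in periods:
        if not isinstance(period, dict) or not any(period.values()):
            continue  # skip empty entries
        node = new_node()
        graph.append((node, "rdf:type", "dct:PeriodOfTime"))
        if period.get("start"):
            graph.append((node, "dcat:startDate", period["start"]))
        if period.get("end"):
            graph.append((node, "dcat:endDate", period["end"]))
        graph.append((subject, predicate, node))
    return graph


dataset = {"temporal_coverage": [{"start": "2020-01-01", "end": "2020-12-31"}]}
triples = add_period_triples(
    [], "ex:dataset", "dct:temporal", dataset.get("temporal_coverage")
)
for triple in triples:
    print(triple)
```

Because the field key and predicate are arguments, a caller would not need its own copy of the loop.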