Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New HealthDCAT-AP profile #326

Open
wants to merge 22 commits into
base: master
Choose a base branch
from
Open

Conversation

Markus92
Copy link
Contributor

@Markus92 Markus92 commented Dec 4, 2024

This PR adds a new profile class for the proposed HealthDCAT application profile. Closes #315

All mandatory properties are supported. Most recommended and optional properties are supported, too, with a few caveats (mainly related to unclarities of the specification).

There's a few minor issues still to sort out. Mainly, in some example data it seemed that geographical coverage (aka dcat:spatial) was not properly harvested, but that could also be an upstream issue.

Mandatory information regarding funding: This project is co-funded by the European Union under Grant Agreement 101100633 (EUCAIM). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them. Funding was also provided by the Dutch National Growth Fund.

amercader and others added 16 commits November 25, 2024 19:02
Rdflib 7.1 introduced changes in [date parsing][1] that made some base
profile tests fail. Basically the previous rdflib versions incomplete
dates like

    <time:inXSDDateTimeStamp
         rdf:datatype="http://www.w3.org/2001/XMLSchema#date">
            1904
    </time:inXSDDateTimeStamp>

were expanded to `1904-01-01`. Of course this is not a valid date and
should be expressed using `gYear`:

    <time:inXSDDateTimeStamp
         rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">
            1904
    </time:inXSDDateTimeStamp>

and we should be expecting `1904`.
This should play nice with the time properties we are generating in CKAN as
they already handle automatically `gYear`, `gYearMonth`, `date` and
`dateTime`.
Sites importing external DCAT representations that use the wrong
encoding might need to check their parsers.

[1] RDFLib/rdflib#2929
@Markus92 Markus92 marked this pull request as ready for review December 6, 2024 04:27
@Markus92
Copy link
Contributor Author

Markus92 commented Dec 6, 2024

@amercader as promised. There is still very heavy development going on of the HealthDCAT-AP specs but I feel like this is a really good starting point.

My last week at Health-RI is the week of 16-20 december, it'd be great if the major comments could come before that. Any comments after that my colleagues will pick up (@hcvdwerf and @kburger). Thanks again for your time!

Copy link
Member

@amercader amercader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Markus92 this week I've been busy and didn't have time to properly look at this but at first glance this looks great. I'll try to provide comments early next week. It would help to merge #324 and merge the latest master branch to remove unrelated changes.

reference = Graph()
reference.parse(data=contents, format="turtle")

print(s.g.serialize(format="turtle"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(s.g.serialize(format="turtle"))

@@ -0,0 +1,254 @@
"""Test document"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not any more I guess

except (ValueError, TypeError):
self.g.add((dataset_ref, predicate, Literal(value)))

def _add_timeframe_triple(self, dataset_dict, dataset_ref):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the same logic as the standard temporal_coverage handling. If there's no change is best to delete the method to avoid duplication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same logic (copy-pasted the lines you're mentioning), moved to a separate function instead of inline. I could also split it off to a separate function in that file to avoid duplication?

ckanext/dcat/schemas/health_dcat_ap.yaml Outdated Show resolved Hide resolved
if agents:
dataset_dict["hdab"] = agents

# Add any qualifiedRelations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the qualified relationships handling (parsing and serializing), and AFAICT there is nothing health specific to it and could be moved to one of the base DCAT AP profiles right? (euro_dcat_ap_scheming.py would be the right place I think as this was added in DCAT 2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing health specific indeed, it's DCAT 2+. I will move it to the generic scheming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to the generic class, but did not have time anymore to add it to the test cases, sorry! It's still covered by HealthDCAT-AP test cases.

@Markus92
Copy link
Contributor Author

Thanks for your insightful comments, Adrià! I will let you know when they're all addressed and ready for re-review.

@Markus92 Markus92 requested a review from amercader December 27, 2024 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HealthDCAT-AP Profile
2 participants