-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redo XSD Datetime, Date, Time, Duration parser and serializers #2929
Conversation
…n, XSD_Date, XSD_DatetTime, XSD_Time, XSD_gYear, XSD_gYearMonth. Based on isoformat for Python <3.11, and builtin fromisoformat for Python 3.11+
66efbf0
to
a9f3c9e
Compare
36515ec
to
d7953ce
Compare
9aa77bd
to
e9d5998
Compare
118642e
to
d605532
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
And thanks for providing the background on isodate and the nuances between ISO 8601 and XSD dates, made it very easy to review.
Finally after battling with the type checker, the linter, the ruff formatter, the doctring tests, the autodocs generator errors, then the linter again, then the type checker again, then the autodocs generator again, and finally the ruff formatter again, this is now passing all tests. |
@@ -167,7 +167,7 @@ def test_function(expression: str, expected_result: Identifier) -> None: | |||
if isinstance(expected_result, type): | |||
assert isinstance(actual_result, expected_result) | |||
else: | |||
assert expected_result == actual_result | |||
assert actual_result == expected_result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change was made to help the test suite output.
PyTest always assumes when asserting equality that LHS is actual, and RHS is expected, but this test harness compared them the other way around, so PyTest output was backwards that made debugging more difficult.
("P1Y"^^xsd:yearMonthDuration"2019-05-28T12:14:45Z"^^xsd:dateTime) | ||
("P1Y"^^xsd:yearMonthDuration"2019-05-28"^^xsd:date) | ||
("P1Y"^^xsd:yearMonthDuration "2019-05-28T12:14:45Z"^^xsd:dateTime) | ||
("P1Y"^^xsd:yearMonthDuration "2019-05-28"^^xsd:date) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an error I found in the datetime test function, there was a missing space between the literal values, causing incorrect tests to be run on durations.
n = "Z" | ||
elif n.startswith("UTC"): | ||
# Replace tzname like "UTC-05:00" with simply "-05:00" to match Jena tz fn | ||
n = n[3:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a required change because previous implementation used tzinfo instances (timezone instances) from isodate
, but the new implementation always uses timezones from the stdlib. The SPARQL builtin TZ()
will return the "timezone name" if known, in python this uses the tzname()
function.
The difference is that stdlib timezones and isodate timezones have different tzname()
generation pattern. In isodate
an unnamed timezone with "-5H" offset will be have tzname of "-05:00", but in stdlib an unnamed timezone with "-5H" offset will have tzname of "UTC-05:00" (more correct IMHO).
So for consistency and backwards compatibility, this change was added to normalize the tzname output, I tested Jena's TZ()
SPARQL function too, and it also outputs "-05:00" like isodate
did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just make sure to add yourself, @ashleysommer, to the contributor notes at the top of xsd_datetime.py
Rdflib 7.1 introduced changes in [date parsing][1] that made some base profile tests fail. Basically the previous rdflib versions incomplete dates like <time:inXSDDateTimeStamp rdf:datatype="http://www.w3.org/2001/XMLSchema#date"> 1904 </time:inXSDDateTimeStamp> were expanded to `1904-01-01`. Of course this is not a valid date and should be expressed using `gYear`: <time:inXSDDateTimeStamp rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear"> 1904 </time:inXSDDateTimeStamp> and we should be expecting `1904`. This should play nice with the time properties we are generating in CKAN as they already handle automatically `gYear`, `gYearMonth`, `date` and `dateTime`. Sites importing external DCAT representations that use the wrong encoding might need to check their parsers. [1] RDFLib/rdflib#2929
Rdflib 7.1 introduced changes in [date parsing][1] that made some base profile tests fail. Basically the previous rdflib versions incomplete dates like <time:inXSDDateTimeStamp rdf:datatype="http://www.w3.org/2001/XMLSchema#date"> 1904 </time:inXSDDateTimeStamp> were expanded to `1904-01-01`. Of course this is not a valid date and should be expressed using `gYear`: <time:inXSDDateTimeStamp rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear"> 1904 </time:inXSDDateTimeStamp> and we should be expecting `1904`. This should play nice with the time properties we are generating in CKAN as they already handle automatically `gYear`, `gYearMonth`, `date` and `dateTime`. Sites importing external DCAT representations that use the wrong encoding might need to check their parsers. [1] RDFLib/rdflib#2929
Rdflib 7.1 introduced changes in [date parsing][1] that made some base profile tests fail. Basically the previous rdflib versions incomplete dates like <time:inXSDDateTimeStamp rdf:datatype="http://www.w3.org/2001/XMLSchema#date"> 1904 </time:inXSDDateTimeStamp> were expanded to `1904-01-01`. Of course this is not a valid date and should be expressed using `gYear`: <time:inXSDDateTimeStamp rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear"> 1904 </time:inXSDDateTimeStamp> and we should be expecting `1904`. This should play nice with the time properties we are generating in CKAN as they already handle automatically `gYear`, `gYearMonth`, `date` and `dateTime`. Sites importing external DCAT representations that use the wrong encoding might need to check their parsers. [1] RDFLib/rdflib#2929
This PR introduces a XSD-specific datetime parser/serializer module that acknowledges there are subtle but important differences between iso8601 and XSD date strings.
RDFLib had been relying on some undocumented (and faulty) parsing behaviour in the old version of
isodate
that coincidentally allowed parsing of strings that are XSD-compliant but not iso8601 compliant.The new version of isodate fixes this issue, that means many tests were now failing in RDFLib due to XSD strings no longer parsing (eg, dates with a timezone "2024-01-31+10:00" that are XSD compliant but not iso compliant, the old version of isodate dropped the tz off the end by accident, so it worked fine, the new version does not allow it).
The preferred solution would be to switch to the new
datetime.fromisoformat()
function in Python stdlib in Python 3.11+, but there are three issues to solve with that plan:isodate
version, it also doesn't support non-standard iso formats.isodate
'sDuration
class and correspondingduration_isoformat
andparse_duration
functions.timedelta
doesn't support deltas with years and months components, needed in XSD durations.timedelta
doesn't haveto_isoformat()
orfromisoformat()
formats, needed to parse XSD duration strings (based on iso8601 duration strings).So this PR addresses all of the above.
xsd_datetime.py
module is added to RDFLib to house all of this XSD-datetime/duration-specific parsing/serializing logic.Duration
class andduration_isoformat()
andparse_duration()
functions are absorbed fromisodate
into RDFLib.isoformat
library is removedfromisoformat()
functionxsd_datetime.py
module before calling into the stdlib function.isodate
isodate
utilityxsd_datetime.py
module before calling intoisodate
.six
in our dependency tree!xsd_datetime.py
based onstrftime
that can output XSD-specific XSD_Duration and XSD_Datetime strings.