Polar-integer datatypes not verifying polarity of lexical value #1757
The third test in test_integers.py now constructs Literal objects for a strict set-equivalence check of expected failing path-value pairs. However, it's not clear whether these Literals should have been constructible with rdflib, so this patch might be significantly reverted.

References:
* RDFLib/rdflib#1757

Signed-off-by: Alex Nelson <[email protected]>
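The set-equivalence idea in that test can be sketched without rdflib. In this sketch, plain (path, lexical form) tuples stand in for the Literal objects the real test builds, and is_positive_integer, failing_pairs, and the ex: paths are hypothetical names used only for illustration.

```python
import re


def is_positive_integer(lexical):
    """Minimal check: an XSD integer lexical form whose value is >= 1."""
    text = lexical.strip(" \t\n\r")
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None and int(text) >= 1


def failing_pairs(observations, conforms):
    """Return the set of (path, lexical) pairs whose value fails `conforms`."""
    return {(path, lexical) for path, lexical in observations if not conforms(lexical)}


# Hypothetical (property path, lexical form) observations, all typed xsd:positiveInteger.
observations = [
    ("ex:count", "1"),
    ("ex:count", "0"),
    ("ex:size", "-3"),
    ("ex:size", "+42"),
]

# The exact set of failures the test expects; extra or missing pairs both fail it.
expected_failures = {("ex:count", "0"), ("ex:size", "-3")}

assert failing_pairs(observations, is_positive_integer) == expected_failures
```

Comparing whole sets, rather than checking membership one pair at a time, makes the test fail both when an expected failure is missed and when an unexpected one appears.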
@ajnelson-nist Thanks for opening this issue; however, this is a duplicate of my issue #848 from 2018, which is itself a duplicate of #737 from 2017. There is not much we can do about this. In the linked data world there is nothing stopping you from saying:
or (like your example)
There is no mandate on the RDF backend to ensure that the lexical string matches, and fits into, the given datatype. I made a suggestion in #848 that RDFLib should at least run a test on some well-known XSD datatypes to add an
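A minimal sketch of the kind of check suggested in #848, assuming regular expressions over the XSD lexical spaces of a few well-known datatypes. lexical_form_ok and _LEXICAL_PATTERNS are hypothetical names, not rdflib API. The function returns None for datatypes it does not know, so a caller can flag a literal rather than reject it.

```python
import re

# Hypothetical lexical-space patterns for a few well-known XSD datatypes,
# keyed by local name. They apply after whitespace collapsing.
_LEXICAL_PATTERNS = {
    "boolean": re.compile(r"true|false|1|0"),
    "integer": re.compile(r"[+-]?[0-9]+"),
    "decimal": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
}


def lexical_form_ok(lexical, local_name):
    """True/False for known datatypes; None when the datatype is not checked."""
    pattern = _LEXICAL_PATTERNS.get(local_name)
    if pattern is None:
        return None  # no opinion: leave the decision to the application
    # Strip leading/trailing XML whitespace; any internal whitespace
    # would fail these patterns anyway.
    collapsed = lexical.strip(" \t\n\r")
    return pattern.fullmatch(collapsed) is not None
```

This only checks lexical spaces; it cannot catch "0"^^xsd:positiveInteger, whose lexical form is fine but whose value is out of range.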
@nicholascar Do you have any opinions on this?
It's true that RDF/OWL is pretty weak on datatypes in general... but Python isn't! We could decide to enforce strict typing in RDFLib so that it can only produce values that both Python and RDF/OWL consider correct. This would be the equivalent of our enforcing the use of defined namespace members with our use of

So, we could raise a ValueError or similar, I think.

What to do about parsing bad values, though? Perhaps it's enough to prevent bad data generation (unless a deliberate workaround is used) and to allow, but warn about, bad data parsing. We could indicate that if you really must make the triple

you should parse it from text and expect the warning. We could poll the mailing list for an approach?
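The "raise when generating, warn when parsing" policy could look roughly like this sketch. check_literal, NonConformantLiteralWarning, and the single positiveInteger validator are hypothetical placeholders, not existing rdflib names.

```python
import re
import warnings


class NonConformantLiteralWarning(UserWarning):
    """Hypothetical warning category for ill-typed literals seen while parsing."""


def _positive_integer_ok(lexical):
    # Stand-in validator: an XSD integer lexical form whose value is >= 1.
    text = lexical.strip(" \t\n\r")
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None and int(text) >= 1


# Hypothetical registry of validators keyed by XSD local name.
_VALIDATORS = {"positiveInteger": _positive_integer_ok}


def check_literal(lexical, local_name, *, strict):
    """Return the lexical form; raise (strict) or warn (lenient) if it is ill-typed."""
    validator = _VALIDATORS.get(local_name)
    if validator is None or validator(lexical):
        return lexical
    message = f"{lexical!r} is not a valid xsd:{local_name}"
    if strict:
        # Generating data: refuse to build the bad value.
        raise ValueError(message)
    # Parsing data: keep the value but tell the user about it.
    warnings.warn(message, NonConformantLiteralWarning, stacklevel=2)
    return lexical
```

Graph-building code would call this with strict=True, while a parser would pass strict=False. Using a dedicated warning class means users can silence it with warnings.simplefilter("ignore", NonConformantLiteralWarning).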
My 2c worth: there are about 10 extant issues directly related to this, but we don't know to what extent externally sourced graphs are non-conformant with respect to values agreeing with datatype declarations, so people could find themselves drowned in warnings. I totally agree with the suggestion of flagging a Literal as non-conformant, but believe we should devolve actual enforcement to the user. I had a dig into this a few months ago and produced these notes (based on the XSD reference):

```python
URIRef(_XSD_PFX + "normalizedString"): None, # The lexical space of xsd:normalizedString is unconstrained (any valid XML
# character may be used). Its value space is the set of strings after whitespace
# replacement, i.e., after any occurrence of #x9 (tab), #xA (linefeed), and
# #xD (carriage return) has been replaced by an occurrence of #x20 (space),
# without any whitespace collapsing.
URIRef(_XSD_PFX + "token"): None, # The lexical and value spaces of xsd:token are the sets of all strings after
# whitespace replacement and collapsing: any occurrence of #x9 (tab), #xA (linefeed),
# or #xD (carriage return) is replaced by an occurrence of #x20 (space); then
# contiguous occurrences of spaces are replaced by a single space, and leading
# and trailing spaces are removed.
URIRef(_XSD_PFX + "language"): None,
URIRef(_XSD_PFX + "boolean"): _parseBoolean, # The value space of xsd:boolean is true and false. Its lexical space accepts true,
# false, and also 1 (for true) and 0 (for false).
URIRef(_XSD_PFX + "decimal"): Decimal, # decimal number of arbitrary precision
URIRef(_XSD_PFX + "integer"): long_type, # arbitrarily large integer
URIRef(_XSD_PFX + "nonPositiveInteger"): int, # Maximum Inclusive: 0
URIRef(_XSD_PFX + "long"): long_type, # Minimum Inclusive: -9223372036854775808 Maximum Inclusive: 9223372036854775807
URIRef(_XSD_PFX + "nonNegativeInteger"): int, # Minimum Inclusive: 0
URIRef(_XSD_PFX + "negativeInteger"): int, # Maximum Inclusive: -1
URIRef(_XSD_PFX + "int"): long_type, # Minimum Inclusive: -2147483648 Maximum Inclusive: 2147483647
URIRef(_XSD_PFX + "unsignedLong"): long_type, # Minimum Inclusive: 0 Maximum Inclusive: 18446744073709551615
URIRef(_XSD_PFX + "positiveInteger"): int, # Minimum Inclusive: 1
URIRef(_XSD_PFX + "short"): int, # Minimum Inclusive: -32768 Maximum Inclusive: 32767
URIRef(_XSD_PFX + "unsignedInt"): long_type, # Minimum Inclusive: 0 Maximum Inclusive: 4294967295
URIRef(_XSD_PFX + "byte"): int, # Minimum Inclusive: -128 Maximum Inclusive: 127
URIRef(_XSD_PFX + "unsignedShort"): int, # Minimum Inclusive: 0 Maximum Inclusive: 65535
URIRef(_XSD_PFX + "unsignedByte"): int, # Minimum Inclusive: 0 Maximum Inclusive: 255
URIRef(_XSD_PFX + "float"): float, # An IEEE single-precision 32-bit floating-point number. The format is a
# mantissa followed, optionally, by the character 'E' or 'e' followed by
# an integer exponent. The special values INF (infinity), -INF (negative
# infinity), and NaN (Not a Number) are also valid; INF is considered to be
# greater than all other values, while -INF is less than all other values,
# and the value NaN cannot be compared to any other values although it
# equals itself.
URIRef(_XSD_PFX + "double"): float, # An IEEE double-precision 64-bit floating-point number. The format is a
# mantissa followed, optionally, by the character 'E' or 'e' followed by
# an integer exponent. The special values INF (infinity), -INF (negative
# infinity), and NaN (Not a Number) are also valid; INF is considered to be
# greater than all other values, while -INF is less than all other values,
# and the value NaN cannot be compared to any other values although it
# equals itself.
```
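The integer ranges in these notes can be expressed as data plus a single range check. This is a stdlib-only sketch keyed by XSD local names; XSD_INTEGER_BOUNDS and in_value_space are hypothetical names. Per the XSD specification, xsd:nonPositiveInteger and xsd:negativeInteger are bounded above (at 0 and -1 respectively), not below. The check judges polarity by the parsed value rather than by the sign character, so "-0" is a valid xsd:nonNegativeInteger.

```python
import re

# (minInclusive, maxInclusive) for integer-derived XSD datatypes;
# None means unbounded on that side.
XSD_INTEGER_BOUNDS = {
    "integer": (None, None),
    "nonPositiveInteger": (None, 0),
    "negativeInteger": (None, -1),
    "long": (-2**63, 2**63 - 1),
    "int": (-2**31, 2**31 - 1),
    "short": (-2**15, 2**15 - 1),
    "byte": (-2**7, 2**7 - 1),
    "nonNegativeInteger": (0, None),
    "unsignedLong": (0, 2**64 - 1),
    "unsignedInt": (0, 2**32 - 1),
    "unsignedShort": (0, 2**16 - 1),
    "unsignedByte": (0, 2**8 - 1),
    "positiveInteger": (1, None),
}

_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def in_value_space(lexical, local_name):
    """True if the lexical form is an XSD integer within the datatype's bounds."""
    text = lexical.strip(" \t\n\r")
    if not _INTEGER_LEXICAL.fullmatch(text):
        return False
    value = int(text)  # polarity comes from the value, so "-0" == 0
    low, high = XSD_INTEGER_BOUNDS[local_name]
    return (low is None or value >= low) and (high is None or value <= high)
```

With this table, in_value_space("0", "positiveInteger") returns False, while in_value_space("+0", "nonPositiveInteger") returns True.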
I recently encountered an issue using pySHACL to validate an xsd:positiveInteger and found it was accepting "0"^^xsd:positiveInteger.

From review of today's term.py and the influence of the members of _NUMERIC_LITERAL_TYPES, and from a test-patch (linking in a few moments, I need this Issue number first), this is currently an acceptable rdflib statement:

What should this behavior be instead? A runtime error (something like raise ValueError) would be correct to me, but I appreciate it'd potentially be a jarring change for many applications.