-
Notifications
You must be signed in to change notification settings - Fork 565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Forbid truthy values for lang when initializing Literal #1494
Conversation
The change that you propose doesn't have quite the desired effect. An Exception is raised but it's raised in an unexpected context and the message is misleading, suggesting that a bytes-like object might be acceptable when it isn't: >>> Literal("foo", lang={})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../rdflib/rdflib/term.py", line 551, in __new__
if lang is not None and not _is_valid_langtag(lang):
File ".../rdflib/rdflib/term.py", line 92, in _is_valid_langtag
return bool(_lang_tag_regex.match(tag))
TypeError: expected string or bytes-like object (Ill-informed speculation removed) |
Yep true, though I think for the initial step, wouldn't it be good to have the same error message when Edit: I replied to you without refreshing the page, so I didn't see your edit. Please check my edit history, otherwise I can just create some PRs/Issues for what was discusssed. |
I saw your edits, sorry for any confusion. I appreciate your observations about keeping RDFLib light and in order to preseve the potential utility of the rfc5646 solution for those occasions where it might come in handy, I've distilled out a patch which I'll include as a recipe (along with a reference to osbigcat's rfc5646 repos) in an RDFLIb "Cookbook" which I've (just) decided to create as an addition to the RDFLib documentation. w.r.t. to the PR, ultimately, I boiled it down to just this: diff --git a/rdflib/term.py b/rdflib/term.py
index 796a76b3..36d14db6 100644
--- a/rdflib/term.py
+++ b/rdflib/term.py
@@ -548,8 +548,13 @@ class Literal(Identifier):
"per http://www.w3.org/TR/rdf-concepts/#section-Graph-Literal"
)
- if lang and not _is_valid_langtag(lang):
- raise Exception("'%s' is not a valid language tag!" % lang)
+ if lang is not None:
+ if not isinstance(lang, str):
+ raise TypeError(
+ f"language tag '{lang}' is {type(lang)}, must be a string!"
+ )
+ elif not _is_valid_langtag(lang):
+ raise ValueError(f"'{lang}' is not a valid language tag!")
if datatype:
datatype = URIRef(datatype) and a minimalist test: import pytest
from rdflib import Literal
def test_literal_language_tag():
with pytest.raises(Exception) as e_info:
Literal("Hello", lang={})
assert (
repr(e_info.value)
== """TypeError("language tag '{}' is <class 'dict'>, must be a string!")"""
)
with pytest.raises(Exception) as e_info:
Literal("Hello", lang="999")
assert repr(e_info.value) == """ValueError("'999' is not a valid language tag!")""" Cheers, Graham |
@veyndan, are you happy to adopt @gjhiggins's suggested updates and the test? @gjhiggins where are you making up the CookBook? That seems like a great idea and perhaps we can put a bit of effort into a CookBook and release it with rdflib 7.0.0 perhaps March next year? I've wanted to come up with rdflib patterns or a CookBooks style thing for some time, so if there is momentum here... |
@veyndan @gjhiggins since IANA maintains an authoritative list of the language tags RFC 5646 Registry Format and Maintenance:
with the Registry actually at https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry Why can't we test not for string, not for non binary etc but for actual values in that list? Storing a copy of the list in rdflib with function to regenerate it per release and another function for users to locally append to the rdflib list (for local use, if our master list is out of date) wouldn't be too hard. In further support of this, the W3C says:
|
That's pretty much what I had, following ozbigcat's suggestion - the repos includes an updater for when the IANA list changes and simple instructions. Given that the IANA source does arbitrarily change, my concern was not to add yet another chore to the RDFLib maintenance burden for what is arguably a minor nuance of support. However, it's a reasonably clean change which a proficient user could easily make, so I thought it would make a good starter for a Cookbook. There is also the danger that if the library starts checking lang tags against the spec, some well-meaning soul will inevitably point out that Literal values aren't checked for validity w.r.t. a declared datatype and for consistency's sake, perhaps they should be :) |
Just locally atm, while I collect some "recipes" as a side effect of working through two-dozen-odd issues related to the identifier-as-context isssue, checking if they're still extant w.r.t. the rebased PR. As soon as I've got a few putative entries I can begin to hallucinate a structure and will create a repos for the content. (e,g, Edmond Chuc's response to #1469 is another excellent candidate for a Cookbook recipe). FWIW, I was the original perpetrator of the RDFLib docs, back in the day (mostly created from pulling in stuff the other devs had written) and although I haven't communicated as much, I was looking to assist you with refreshing/reworking the RDFLib docs (happily, I've managed to regain access to my readthedocs a/c). Again, this is also a side-effect of the id-as-ctx work, f'rinstance the confusion around the param bindings for |
OK, since the benefits are low, let's not add this burden! |
Last thing: policy is to see a test added as well that indicates how that the change is working as expected, so please add a short test (to an existing test file if one look appropriate or add a new file) and then, i all good, I'll approve. |
@veyndan not sure if you are working on tests here, if you are let me know, if not I'm going to make a PR to include some tests from https://github.com/veyndan/rdflib/pull/2 |
I'm not sure what tests to add here tbh, as I'm still not convinced that we should add tests for non-strings (per our discussion in #1498). So the truthy value for string is |
Quoting response on #1498 (comment)
|
I made a pull request against your branch to add tests: https://github.com/veyndan/rdflib/pull/4 This follows the reasoning explained here: #1498 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As indicated, if a PR fixes a runtime problem (as this test will) then it should have tests that reproduce the runtime problem that is being fixed.
The PR i made against your branch include tests which I think are sufficient but other tests which cover similar cases can also be accepted.
Added tests to ensure TypeError occurs if invalid types of langtags are supplied. Also added tests to ensure that invalid language tag values raise ValueError.
Added additional tests for Literal language tag.
@nicholascar This PR should be good to go now (thanks @aucampia for adding the tests!). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, will wait for one more approval from another maintainer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed considerations for such a small PR!
With the proposed change, an exception will be raised when
lang
is a truthy value. I found this quite confusing when messing around with the library thatLiteral("Hello", lang={})
was somehow considered a valid language tag.