Injecting custom regex implementations #1142
Sorry if I'm getting lost in super-fine details; here's a minimal example:

    import jsonschema

    jsonschema.validate(
        dict(
            foo="aa"
        ),
        dict(
            type="object",
            properties=dict(
                foo=dict(
                    type="string",
                    pattern=r"a?+a"
                )
            )
        )
    )

works in Python 3.11 (returns
This isn't a library issue; again, Python's
Hi there -- today no, but in the future yes (so happy to leave this open). The spec says that implementations SHOULD use a dialect of JavaScript regexes, which we've never been able to do because no implementation of them was available to Python. Now there is one, so yes, it's definitely planned to allow you to inject your own regular expression implementation (at which point you'd be able to inject this one too). Doing so will most likely involve implementing a protocol, though, since the libraries all have subtly different APIs.
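One way to picture the "protocol" idea is a small structural interface that each engine gets wrapped in, so the validator never touches a specific library's API directly. This is a purely hypothetical sketch. `RegexEngine`, `PythonReEngine` and `pattern_matches` are illustrative names and are not part of jsonschema, and only a stdlib `re` adapter is shown.

```python
import re
from typing import Protocol


class RegexEngine(Protocol):
    """Hypothetical interface a pluggable regex implementation would satisfy."""

    def is_valid(self, pattern: str) -> bool:
        """Return True if the engine can compile `pattern`."""
        ...

    def search(self, pattern: str, instance: str) -> bool:
        """Return True if `pattern` matches anywhere in `instance`."""
        ...


class PythonReEngine:
    """Adapter wrapping the stdlib `re` module (which caches compiled patterns)."""

    def is_valid(self, pattern: str) -> bool:
        try:
            re.compile(pattern)
        except re.error:
            return False
        return True

    def search(self, pattern: str, instance: str) -> bool:
        return re.search(pattern, instance) is not None


def pattern_matches(engine: RegexEngine, pattern: str, instance: object) -> bool:
    """Mimic the `pattern` keyword: unanchored search; non-strings are ignored."""
    if not isinstance(instance, str):
        return True
    return engine.search(pattern, instance)
```

An adapter for `regex` or for a JavaScript-flavored engine would implement the same two methods. `pattern_matches` never needs to know which engine it was given.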
Oh, that sounds amazing! Let's just leave this open for now, then; I'll look for a different short-term workaround instead. Thanks :)
@Julian, I just saw this and wanted to let you know that I'd be super-interested in trying out the python-ized Right now there's a super gross hack I put in to let some common JS regexes pass the
On that last note! For the OP: if you construct your own format checker, you should be able to slot in a customized regex check by applying the
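For the `format: regex` route, the check itself can be an ordinary function that tries to compile the pattern with whichever engine you prefer. The following is a minimal sketch. It assumes the third-party `regex` package is optional and falls back to `re` when it isn't installed, and `check_regex_format` is a hypothetical name. This only affects `format` validation, not the `pattern` keyword.

```python
import re

try:
    # Optional third-party engine: supports possessive quantifiers, \p{...}, etc.
    import regex as _engine
except ImportError:
    _engine = re


def check_regex_format(instance: object) -> bool:
    """Return True if `instance` is a pattern the chosen engine can compile.

    Non-string instances pass, since format checks ignore other types.
    """
    if not isinstance(instance, str):
        return True
    try:
        _engine.compile(instance)
    except _engine.error:
        return False
    return True
```

With jsonschema installed, such a function could be registered on a `jsonschema.FormatChecker()` instance through its `checks("regex")` decorator. The checker would then be passed to a validator as `format_checker=`.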
(As usual, helpful comments, and as usual I'm responding to just a bit of it to start, hah, but...)
More comments are of course welcome!
Just a quick little update on this: I've been using regress in the CLI for a while now for format validation and it's working great. No complaints from users and no issues across my own usages. I have one outstanding issue which would potentially be handled by being able to use regress for
Yeah, essentially swapping out the whole definition of
I've got a need for changing the regex implementation to be able to support Unicode categories such as
jsonschema/jsonschema/_keywords.py Line 2 in ba47f7f

    try:
        import regex as re
    except ModuleNotFoundError:
        import re

an option? It works for my use case, as tested with current versions of both libraries on Python 3.12. The conditional import may have to be changed for earlier Python versions...
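The stdlib alone isn't enough here because `re` rejects Unicode property escapes like `\p{L}` outright. The closest stdlib-only workaround is a character-class trick. This is a rough sketch. The `[^\W\d_]` class only approximates "letters" (for example, it still accepts some numeric characters that aren't decimal digits), and `supports_unicode_properties` is a hypothetical helper name.

```python
import re


def supports_unicode_properties(engine=re) -> bool:
    """Return True if `engine` accepts \\p{...} property escapes."""
    try:
        engine.compile(r"\p{L}")
    except engine.error:
        return False
    return True


# Stdlib approximation of \p{L}+: word characters minus digits and underscore.
LETTERS_ONLY = re.compile(r"[^\W\d_]+")

print(supports_unicode_properties())  # False for the stdlib re module
print(bool(LETTERS_ONLY.fullmatch("héllo")))  # True
print(bool(LETTERS_ONLY.fullmatch("abc1")))  # False
```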
That's coincidentally related to some work going on with It would be bad and surprising if two different regex implementations were used by default internally. So we need to consider the format checker too. In aggregate, that makes me think that simply swapping things out would be a bad idea. Once the I think my favorite idea for how this eventually looks is that you could do...

    from jsonschema import regex_variants
    ...
    validator = MyValidatorClass(regex_variant=regex_variants.REGRESS)

(i.e. we push at least a couple of implementations down into jsonschema). Then, regress could become the default in a major release and -- here's a bit I like about that -- the path to upgrade but retain old behavior is open. You'd just have to start passing the variant explicitly. I think you'd have to pass the variant implementation to the format checker as well? So that's not a great interface, but it works. Some of these ideas have been in my head for a few days but haven't been put to paper until just now.
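A toy version of the variant idea can illustrate the concern about two engines being used by default. In it, a single variant object drives both the `pattern` check and the `regex` format check, so they cannot disagree. Everything here is hypothetical and stdlib-only. `RegexVariant`, `PYTHON_RE`, `PYTHON_RE_ASCII` and `ToyValidator` are illustrative stand-ins, not the proposed `regex_variants` module or any real jsonschema class.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RegexVariant:
    """Hypothetical bundle of everything a validator needs from a regex engine."""

    name: str
    compile: Callable[[str], re.Pattern]
    error: type


# Two illustrative stdlib-based variants: Unicode-aware vs. ASCII-only classes.
PYTHON_RE = RegexVariant("python-re", re.compile, re.error)
PYTHON_RE_ASCII = RegexVariant(
    "python-re-ascii", lambda p: re.compile(p, re.ASCII), re.error
)


class ToyValidator:
    """Uses one variant for both `pattern` and `format: regex`."""

    def __init__(self, regex_variant: RegexVariant = PYTHON_RE) -> None:
        self.regex_variant = regex_variant

    def pattern_ok(self, pattern: str, instance: str) -> bool:
        return self.regex_variant.compile(pattern).search(instance) is not None

    def regex_format_ok(self, instance: str) -> bool:
        try:
            self.regex_variant.compile(instance)
        except self.regex_variant.error:
            return False
        return True
```

Passing `regex_variant=` explicitly is what keeps the upgrade path open. Code that pins a variant keeps its old behavior even if the default changes.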
@sirosen, thank you. After a first look, I now see it is not quite as easy as I thought. My current issue is similar to python-jsonschema/check-jsonschema#353 (comment) and would be solved if I could replace the regex implementation used for
I've just done a Regarding implementation, I have some notes which should hopefully be useful for
EDIT: To clarify, when I said that
Hi there,
This is a bit of a mix between a question and a feature request.
I'm interested in a regex feature called possessive quantifiers. In short, these allow you to use the quantifiers `*+`, `++` and `?+`. These act like their counterparts (`*`, `+`, `?`) except that once they have matched, they will not backtrack. As an example, the pattern `a?+a` will not match the string `a`, because the regex engine doesn't backtrack "out of" the first `a?`. It's kind of like a super-greedy match. Python supports these starting with 3.11, but some of us are stuck on lower versions (3.9 in my case).
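You can check the behavior and the version cut-off directly with the stdlib. This is a small sketch, and the helper name `has_possessive_quantifiers` is hypothetical. On Python 3.11+, `re` compiles `a?+a` and refuses to match `"a"`. Older versions reject the pattern with `re.error` ("multiple repeat").

```python
import re
import sys


def has_possessive_quantifiers() -> bool:
    """Return True if this interpreter's `re` accepts possessive quantifiers."""
    try:
        re.compile("a?+a")
    except re.error:  # "multiple repeat" before Python 3.11
        return False
    return True


if has_possessive_quantifiers():
    # `a?+` keeps the "a" it grabbed, so the trailing `a` finds nothing left.
    print(re.search("a?+a", "a"))  # None
    print(re.search("a?+a", "aa"))  # <re.Match object; span=(0, 2), match='aa'>
    # The ordinary greedy `a?` backtracks and gives the "a" back.
    print(re.search("a?a", "a"))  # <re.Match object; span=(0, 1), match='a'>
else:
    v = sys.version_info
    print(f"Python {v.major}.{v.minor}: no possessive quantifiers in re")
```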
Is there an easy way to make the `pattern` property (or the keys of `patternProperties`) use the `regex` library instead of the builtin `re`? It's compatible with the builtin, but has some additional backported features like possessive quantifiers.
Ideally, I'd like some kind of optional argument where I can enable the third-party module and make python-jsonschema use that instead of the builtin. I don't think that should be the default, as you probably don't want "useless" extra dependencies.
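The "optional argument" idea could look roughly like the following. The third-party engine is used only when the caller explicitly opts in, so `re` stays the default and `regex` stays an optional dependency. This is a sketch only. `select_regex_module` and its `use_third_party` flag are hypothetical, not an existing jsonschema option.

```python
import importlib
import re
from types import ModuleType


def select_regex_module(use_third_party: bool = False) -> ModuleType:
    """Return the module to compile patterns with.

    Defaults to the stdlib `re`. With use_third_party=True, imports the
    optional `regex` package and fails loudly if it isn't installed.
    """
    if not use_third_party:
        return re
    try:
        return importlib.import_module("regex")
    except ImportError as exc:
        raise ImportError(
            "use_third_party=True requires the 'regex' package (pip install regex)"
        ) from exc
```

Raising instead of silently falling back ensures that a schema is never validated with a different engine than the one the caller asked for.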
Alternatively, patching this myself at runtime is probably possible, but it's not going to be pretty. If that's your recommendation, any hints about where to patch it?
As a final option, there are ways to mimic the behavior of possessive quantifiers with existing regex features, but that's not pretty either.
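One such workaround for Python < 3.11 relies on lookaheads in `re` being atomic: once a lookahead succeeds, the engine does not backtrack into it. So capturing inside a lookahead and then consuming the capture with a backreference behaves like a possessive quantifier. This is a minimal sketch, and `possessive` is a hypothetical helper. The named group it introduces also shifts the numbering of any later groups, and its name must be unique within the final pattern.

```python
import re


def possessive(atom: str, quantifier: str, name: str = "_poss") -> str:
    """Build a pattern equivalent to atom + quantifier + '+' (e.g. `a?+`).

    The quantified atom is matched inside a lookahead and captured; the named
    backreference then consumes exactly that text, with no way to give any back.
    """
    return f"(?=(?P<{name}>(?:{atom}){quantifier}))(?P={name})"


# Equivalent of `a?+a`, usable on Python 3.9.
pattern = re.compile(possessive("a", "?") + "a")
print(pattern.search("a"))  # None: the lookahead kept the "a", nothing is left
print(pattern.search("aa") is not None)  # True
```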