I wrote the regex r"\\u{[^}]*}", which works (as \\u{[^}]*}) on regex101 under the pcre, js, python, and go flavors. Parsing it with this crate fails with the error "decimal literal empty".
That message was not helpful in figuring out the problem, which is that regex_syntax expects { to begin a repetition, whereas the other engines mentioned silently fall back to matching a literal { and } when the braces don't form a valid repetition. The fix is to escape the braces: r"\\u\{[^}]*\}".
This could be considered a bug or a suboptimal error depending on how you think this regex should be processed. I'd be perfectly happy if the error were to say something along the lines of "expected bounded repetition" here, rather than the current vague "decimal literal empty". (I would understand the error if it were {}, but with something other than } after the {, it's confusing.)
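For comparison, here is a minimal sketch of the permissive fallback described above, using Python's standard re module as a stand-in for the "python" flavor on regex101. The sample string and variable names are made up for illustration, and this shows Python's behavior only, not this crate's.

```python
import re

# Unescaped braces: Python cannot read "{[^}]*}" as a repetition,
# so it silently treats "{" and "}" as literal characters.
permissive = re.compile(r"\\u{[^}]*}")

# Escaped braces: the form that this crate also accepts.
escaped = re.compile(r"\\u\{[^}]*\}")

# Hypothetical input containing a \u{...} escape sequence.
sample = r"prefix \u{1F600} suffix"

permissive_match = permissive.search(sample).group(0)
escaped_match = escaped.search(sample).group(0)

# Both patterns find the same text in Python.
print(permissive_match == escaped_match)

# When the braces do form a valid repetition, they act as one.
repetition_match = re.fullmatch(r"x{2}", "xx") is not None
```

Since both spellings behave the same in the permissive engines, the escaped form is the portable one.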
Yeah, the error message should definitely be improved here. The behavior does indeed match my intent. Specifically, I biased toward less implicitness in the syntax. That is, if something is a meta character and you want to use it as a literal, then it needs to be escaped. There are some exceptions to this, particularly, in character classes, e.g., []] and [-a-z], due to their prevalence. The thinking here is that if reading a { requires a human to go and interpret whether it "needs" to be escaped or not in order to determine whether it's a meta character or not, then the regex becomes harder to read.