-
Notifications
You must be signed in to change notification settings - Fork 450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
better support for empty patterns #677
Conversation
For the test failure:
should the documentation of hir and implementation of exec.rs be changed to accept Empty nodes in is_alternation_literal? |
@sliquister Good question. The docs suggest that |
To avoid this assertion in tests when empty alternations are allowed: internal error: entered unreachable code: expected literal or concat, got Hir { kind: Empty, info: HirInfo { bools: 1795 } }', src/exec.rs:1568:18 The code in exec.rs relies on the documented invariant for is_alternation_literal: /// ... This is only /// true when this HIR expression is either itself a `Literal` or a /// concatenation of only `Literal`s or an alternation of only `Literal`s.
Some were rejected explicitly with this "alternations cannot currently contain empty sub-expressions" error. Some were causing alternations to be compiled into automaton matching a different regex. The root of the issue is that the compilation of the Empty pattern returns Patch { entry: self.insts.len(), .. }, that is it claims its entry point is the next instruction. This only works if the next instruction is unconditionally executed, which is not true for alternations. So we return None instead, and let the callers decide how to handle an empty regex.
Thanks, the tests all pass now (and I've learned of the existence of |
Have you had a chance to consider this? (I'm trying to decide if I need to workaround the underlying issue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks so much! Sorry about the delay.
This PR is in As a bonus, this also fixed #659! |
Marvelous, thank you! (especially with this change being released immediately) |
I propose fixing the known issue in compile.rs where empty patterns + alternations can cause compilation errors or be compiled into a wrong automaton.
I understand that the plan is to replace compile.rs with a better version, but this doesn't seem to be about to happen, and I think the fix is not an undue amount of work. If possible, I would prefer fixing
regex
than work around these issues (it's easy to avoid them in hand written regexes but when generating regexes, it's more annoying).About the change itself: all I do is compile
Empty
intoOk(None)
instead ofOk(Patch{ entry: the-next-instruction, ..})
(as that is not correct in general) and follow the type system. I tried fixing this in other ways with less diff, but I think this version is simpler: the diff is bigger (mostly due to changes types and indentation, which are easy to review) but the performance is clearly ok and the type system forces me to consider where changes need to happen. The kind of diffself.c(expr)?
->self.c(expr)?.unwrap_or(self.next_inst());
is saying "at this call site, the old behavior worked".The same automation should be generated in the case that were previously correctly and successfully compiled, except a few corner cases that involve empty subexpressions, where we may generate fewer instructions (like
(?:)?
).