Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incorrect parsing of regex? #280

Closed
blaenk opened this issue Sep 12, 2016 · 9 comments · Fixed by #281
Closed

incorrect parsing of regex? #280

blaenk opened this issue Sep 12, 2016 · 9 comments · Fixed by #281
Labels

Comments

@blaenk
Copy link
Contributor

blaenk commented Sep 12, 2016

I hadn't worked with rust in months so I made major updates of both rustc and my dependencies (including regex). I noticed that my program was now failing in a weird way and I finally tracked it down to this bug (regression?), either that or I'm doing something very wrong here:

let re = Regex::new(r"(^\.)|(~$)").unwrap();
assert!(!re.is_match("favicon.png"));

That assertion is failing even though it shouldn't AFAIK. It certainly didn't before.

The regex is supposed to match if the string begins with . or ends with ~. Even without the groupings/capture groups it fails. Somehow it is matching on "favicon.png".

I can't imagine something like this got through though so I'm thinking it may be a misunderstanding I have of how that regex should work. I tried this just now in the chrome console with javascript and I don't see this problem there, though I am aware that regex implementations tend to differ, I wasn't aware that they differ in something like this.

@blaenk
Copy link
Contributor Author

blaenk commented Sep 12, 2016

Regulex seems to confirm my understanding of this pattern, unless I'm misunderstanding the diagram too.

@BurntSushi
Copy link
Member

Gah. This is probably a bug in the literal optimizer being too aggressive. I'll check it out tomorrow.

@blaenk
Copy link
Contributor Author

blaenk commented Sep 12, 2016

No worries, I would appreciate it.

@BurntSushi BurntSushi added the bug label Sep 12, 2016
BurntSushi added a commit to BurntSushi/regex that referenced this issue Sep 12, 2016
When a pattern is partially anchored and literal prefixes are detected,
we would scan for those prefixes like normal. On a match, we'd verify
whether the text starting at that prefix matched the rest of the regex.
This process doesn't work that well for partially anchored regexes, since
the prefix match presupposes that the regex is allowed to match at that
position. But if, say, a regex like `^z|a` finds a `z` in the middle of
the string, then it has lost the fact that `z` needs to appear at the
beginning of the string, and can therefore falsely report a match.

We could spend some effort and make this case work, but the literal
optimizer is already too complex. We need to simplify it to make future
optimizations like this possible.

Fixes rust-lang#280.
@BurntSushi
Copy link
Member

@blaenk Interestingly, running re.find("favicon.png") returns None as it should. The reason why is because the forward DFA was falsely reporting a match, but the backward DFA correctly determined it not to match. The is_match routine only uses the forward DFA, which resulted in the incorrect match bubbling up.

In any case, the underlying cause was indeed an aggressive literal optimization. #281 should fix it!

@blaenk
Copy link
Contributor Author

blaenk commented Sep 12, 2016

Ahhh. Thank you very much for fixing this!

@blaenk
Copy link
Contributor Author

blaenk commented Sep 12, 2016

No rush but do you have any idea when a new release will be out on crates.io with this fix? Or is there some set release schedule?

@BurntSushi
Copy link
Member

It's uploading. :-)

@blaenk
Copy link
Contributor Author

blaenk commented Sep 12, 2016

Thank you very much once again!!

@BurntSushi
Copy link
Member

No problem. 0.1.77 is now on crates.io! Thanks for the bug report!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants