Skip to content

Commit

Permalink
syntax: fix trailing - bug
Browse files Browse the repository at this point in the history
This fixes a bug in the parser where a regex like `(?x)[ / - ]` would
fail to parse. In particular, since whitespace insensitive mode is
enabled, this regex should be equivalent to `[/-]`, where the `-` is
treated as a literal `-` instead of a range since it is the last
character in the class. However, the parser did not account for
whitespace insensitive mode, so it didn't see the `-` in `(?x)[ / - ]`
as trailing, and therefore reported an unclosed character class (since
the `]` was treated as part of the range).

We fix that in this commit by accounting for whitespace insensitive
mode, which we do by adding a `peek` method that skips over whitespace.

Fixes #455
  • Loading branch information
BurntSushi committed Mar 12, 2018
1 parent 3e370e4 commit 102458f
Show file tree
Hide file tree
Showing 2 changed files with 62 additions and 2 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,11 @@
TBD
===
Bug gixes:

* [BUG #455](https://github.com/rust-lang/regex/pull/455):
Fix a bug where `(?x)[ / - ]` failed to parse.


0.2.8 (2018-03-12)
==================
Bug gixes:
Expand Down
56 changes: 54 additions & 2 deletions regex-syntax/src/ast/parse.rs
Original file line number Diff line number Diff line change
Expand Up @@ -587,6 +587,32 @@ impl<'s, P: Borrow<Parser>> ParserI<'s, P> {
self.pattern()[self.offset() + self.char().len_utf8()..].chars().next()
}

/// Like peek, but will ignore spaces when the parser is in whitespace
/// insensitive mode.
fn peek_space(&self) -> Option<char> {
if !self.ignore_whitespace() {
return self.peek();
}
if self.is_eof() {
return None;
}
let mut start = self.offset() + self.char().len_utf8();
let mut in_comment = false;
for (i, c) in self.pattern()[start..].char_indices() {
if c.is_whitespace() {
continue;
} else if !in_comment && c == '#' {
in_comment = true;
} else if in_comment && c == '\n' {
in_comment = false;
} else {
start += i;
break;
}
}
self.pattern()[start..].chars().next()
}

/// Returns true if the next call to `bump` would return false.
fn is_eof(&self) -> bool {
self.offset() == self.pattern().len()
Expand Down Expand Up @@ -1773,8 +1799,8 @@ impl<'s, P: Borrow<Parser>> ParserI<'s, P> {
// after a `-` is a `-`, then `--` corresponds to a "difference"
// operation.
if self.char() != '-'
|| self.peek() == Some(']')
|| self.peek() == Some('-')
|| self.peek_space() == Some(']')
|| self.peek_space() == Some('-')
{
return prim1.into_class_set_item(self);
}
Expand Down Expand Up @@ -5297,4 +5323,30 @@ bar
"#;
assert!(parser_nest_limit(pattern, 50).parse().is_ok());
}

// This tests that we treat a trailing `-` in a character class as a
// literal `-` even when whitespace mode is enabled and there is whitespace
// after the trailing `-`.
#[test]
fn regression_455_trailing_dash_ignore_whitespace() {
assert!(parser("(?x)[ / - ]").parse().is_ok());
assert!(parser("(?x)[ a - ]").parse().is_ok());
assert!(parser("(?x)[
a
- ]
").parse().is_ok());
assert!(parser("(?x)[
a # wat
- ]
").parse().is_ok());

assert!(parser("(?x)[ / -").parse().is_err());
assert!(parser("(?x)[ / - ").parse().is_err());
assert!(parser("(?x)[
/ -
").parse().is_err());
assert!(parser("(?x)[
/ - # wat
").parse().is_err());
}
}

0 comments on commit 102458f

Please sign in to comment.