Fix panics parsing regex with whitespace in extended mode #349
The added tests fail without the fix like this:

---- parser::tests::ignore_space_escape_hex2 stdout ----
thread 'parser::tests::ignore_space_escape_hex2' panicked at 'called `Result::unwrap()` on an `Err` value: Error { pos: 10, surround: "x 5 3", kind: InvalidBase16(" 5 3") }', src/libcore/result.rs:860

---- parser::tests::ignore_space_escape_hex stdout ----
thread 'parser::tests::ignore_space_escape_hex' panicked at 'called `Result::unwrap()` on an `Err` value: Error { pos: 12, surround: "{ 5 3 }", kind: InvalidBase16(" 5 3") }', src/libcore/result.rs:860

---- parser::tests::ignore_space_ascii_classes stdout ----
thread 'parser::tests::ignore_space_ascii_classes' panicked at 'called `Result::unwrap()` on an `Err` value: Error { pos: 5, surround: "(?x)[ [ : ", kind: UnsupportedClassChar('[') }', src/libcore/result.rs:860
note: Run with `RUST_BACKTRACE=1` for a backtrace.

---- parser::tests::ignore_space_escape_octal stdout ----
thread 'parser::tests::ignore_space_escape_octal' panicked at 'valid octal number', src/libcore/option.rs:785

---- parser::tests::ignore_space_escape_unicode_name stdout ----
thread 'parser::tests::ignore_space_escape_unicode_name' panicked at 'called `Result::unwrap()` on an `Err` value: Error { pos: 15, surround: "Y i }", kind: UnrecognizedUnicodeClass(" Y i") }', src/libcore/result.rs:860

---- parser::tests::ignore_space_repeat_counted stdout ----
thread 'parser::tests::ignore_space_repeat_counted' panicked at 'called `Result::unwrap()` on an `Err` value: Error { pos: 15, surround: ", 1 0 }", kind: InvalidBase10("1 0") }', src/libcore/result.rs:860

The reason for the panics is that `bump_get` would ignore space when walking the characters, but then keep the spaces in the returned String.

Found using cargo-fuzz.
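To make the failure mode concrete, here is a minimal standalone sketch of the mismatch described above. It is not the actual `bump_get` code from the parser; `take_digits_buggy` and `take_digits_fixed` are hypothetical helpers that only mimic the behavior: whitespace is skipped when deciding how far to read, but the returned string still contains it, so the later base-16 conversion fails.

```rust
/// Toy model of the bug (hypothetical helper, not the real parser code):
/// whitespace is skipped while deciding where the run of hex digits ends,
/// but the raw slice, whitespace included, is what gets returned.
fn take_digits_buggy(input: &str) -> String {
    let mut end = 0;
    for (i, c) in input.char_indices() {
        if c.is_whitespace() || c.is_ascii_hexdigit() {
            end = i + c.len_utf8();
        } else {
            break;
        }
    }
    input[..end].to_string()
}

/// One possible repair for the toy model: walk the same characters,
/// but drop the whitespace from the returned string.
fn take_digits_fixed(input: &str) -> String {
    input
        .chars()
        .take_while(|c| c.is_whitespace() || c.is_ascii_hexdigit())
        .filter(|c| !c.is_whitespace())
        .collect()
}

fn main() {
    // Mirrors the `InvalidBase16(" 5 3")` error in the test output above.
    let raw = take_digits_buggy(" 5 3}");
    assert_eq!(raw, " 5 3");
    assert!(u32::from_str_radix(&raw, 16).is_err());

    // With the whitespace removed, the digits parse as 0x53 ('S').
    let clean = take_digits_fixed(" 5 3}");
    assert_eq!(clean, "53");
    assert_eq!(u32::from_str_radix(&clean, 16), Ok(0x53));
}
```

In this toy model, stripping the whitespace is enough to make the conversion succeed; whether space should be accepted between digits at all is a separate question.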
The fuzz script is here (not sure if you would want to merge that or not): master...robinst:add-cargo-fuzz-script
You can run it using
The artifact that it returned was this:
#[test]
fn ignore_space_escape_octal() {
    // Octal 123 is 0x53, i.e. 'S'; the spaces between the digits are ignored in (?x) mode.
    assert_eq!(p(r"(?x)\ 1 2 3"), lit('S'));
}
Seems a bit weird that it's allowed to add space between digits of a number, but that seems to be the closest to the current behavior.
@robinst Thanks for finding this! Sorry it slipped out of my queue, but your blog post caught my attention. :-) Nice work! I'm not sure the fix is right either. Does this also apply to things like
Yes, and things like
Maybe whitespace should only be allowed between logical groups of characters. For example, it should not be allowed within a number or within a text identifier.
Here's what other engines do:
Oniguruma:
Perl behaves the same way, checked with
So at least for
Thinking about this a bit more, it feels like we shouldn't allow arbitrary whitespace in arbitrary syntax. Maybe things like
Instead of ignoring space in all the bump/peek methods (as proposed in pull request rust-lang#349), have an explicit `ignore_space` method that can be used in places where space/comments should be allowed. This makes parsing a bit stricter than before as well.
Agreed. I've prepared a different pull request here: #354
…e-strict, r=BurntSushi Fix panics with whitespace in extended mode by being more strict Instead of ignoring space in all the bump/peek methods (as proposed in pull request #349), have an explicit `ignore_space` method that can be used in places where space/comments should be allowed. This makes parsing a bit stricter than before as well.
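The stricter approach can be sketched roughly as follows. This is an illustrative toy, not the code from #354: the `Parser` struct, its `extended` flag, and `parse_braced_hex` are hypothetical names, and the choice to allow space just inside the braces is an assumption of the sketch. Only the idea of an explicit `ignore_space` method, called at the points where space/comments should be allowed, comes from the description above.

```rust
/// Hypothetical toy parser; names and structure are assumptions for illustration.
struct Parser {
    chars: Vec<char>,
    pos: usize,
    /// True when extended mode, `(?x)`, is active.
    extended: bool,
}

impl Parser {
    fn new(s: &str, extended: bool) -> Self {
        Parser { chars: s.chars().collect(), pos: 0, extended }
    }

    /// Skip whitespace and `#` comments, but only in extended mode.
    /// Callers invoke this explicitly wherever space is permitted.
    fn ignore_space(&mut self) {
        if !self.extended {
            return;
        }
        while self.pos < self.chars.len() {
            let c = self.chars[self.pos];
            if c.is_whitespace() {
                self.pos += 1;
            } else if c == '#' {
                // A comment runs to the end of the line.
                while self.pos < self.chars.len() && self.chars[self.pos] != '\n' {
                    self.pos += 1;
                }
            } else {
                break;
            }
        }
    }

    fn expect(&mut self, want: char) -> Result<(), String> {
        if self.chars.get(self.pos) == Some(&want) {
            self.pos += 1;
            Ok(())
        } else {
            Err(format!("expected {:?} at position {}", want, self.pos))
        }
    }

    /// Parse `{hex}`. In this sketch, space is allowed after `{` and
    /// before `}`, but never between the digits themselves.
    fn parse_braced_hex(&mut self) -> Result<u32, String> {
        self.expect('{')?;
        self.ignore_space();
        let start = self.pos;
        while self.pos < self.chars.len() && self.chars[self.pos].is_ascii_hexdigit() {
            self.pos += 1;
        }
        let digits: String = self.chars[start..self.pos].iter().collect();
        self.ignore_space();
        self.expect('}')?;
        u32::from_str_radix(&digits, 16)
            .map_err(|e| format!("invalid hex {:?}: {}", digits, e))
    }
}

fn main() {
    // Space around the number is accepted in extended mode.
    assert_eq!(Parser::new("{ 53 }", true).parse_braced_hex(), Ok(0x53));
    // Space inside the number is an error instead of a panic.
    assert!(Parser::new("{ 5 3 }", true).parse_braced_hex().is_err());
    // Outside extended mode, no space is skipped at all.
    assert!(Parser::new("{ 53 }", false).parse_braced_hex().is_err());
}
```

Because the digits are collected without any implicit skipping, the string handed to the number conversion never contains whitespace, and misplaced space surfaces as an ordinary parse error.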
I decided to go with #354 over this one. Thanks so much!