This repository has been archived by the owner on Dec 15, 2022. It is now read-only.
Handle regexes with unicode escape sequences in .find and .findAll #43
Fixes atom/atom#16126
Background
The `TextBuffer.find` family of methods uses PCRE to perform regex matching directly on the buffer's native contents. This has a number of advantages: it eliminates a lot of string copying, removes the need for conversions between raw character indices and row/column coordinates, and allows searching to be done on a background thread if desired. See #5, #35.
Problem
There are some differences between PCRE's and ECMAScript's regex syntax. One difference is that PCRE does not support the `\u00df` syntax for specifying UTF-16 character codes.
Solution
Luckily, it's trivial to convert these escape sequences into the actual UTF-16 characters they represent. I've added that conversion in this PR.
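To illustrate what such a conversion involves, here is a minimal TypeScript sketch. The function name `expandUnicodeEscapes` and its details are assumptions and not the code from this PR. The function scans the pattern source and replaces each `\uXXXX` with the character it denotes. If that character is ASCII punctuation, it is escaped so that a decoded `\u002e` stays a literal dot instead of becoming the `.` metacharacter. Every other escape is copied through as a pair, so an escaped backslash followed by `u00df` is not misread.

```typescript
// Hypothetical helper (not text-buffer's actual implementation): rewrite
// ECMAScript-style \uXXXX escapes in a regex source into literal characters
// that a PCRE engine without JavaScript-compatibility options understands.

const HEX4 = /^[0-9a-fA-F]{4}$/;

// ASCII punctuation. In PCRE, a backslash before any non-alphanumeric
// character always matches that character literally, inside or outside
// a character class.
const PCRE_PUNCT = /^[\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]$/;

function toLiteral(ch: string): string {
  return PCRE_PUNCT.test(ch) ? "\\" + ch : ch;
}

function expandUnicodeEscapes(source: string): string {
  let out = "";
  let i = 0;
  while (i < source.length) {
    if (source[i] !== "\\") {
      // Ordinary character: copy it unchanged.
      out += source[i];
      i += 1;
      continue;
    }
    const hex = source.slice(i + 2, i + 6);
    if (source[i + 1] === "u" && HEX4.test(hex)) {
      // \uXXXX names one UTF-16 code unit. Consecutive high/low surrogate
      // escapes end up adjacent in the output and form one astral character.
      out += toLiteral(String.fromCharCode(parseInt(hex, 16)));
      i += 6;
    } else {
      // Any other escape, including "\\", is copied through as a pair, so
      // "\\u00df" (escaped backslash, then "u00df") is left alone.
      out += source.slice(i, i + 2);
      i += 2;
    }
  }
  return out;
}
```

For example, the pattern source `stra\u00dfe` becomes `straße`, and `a\u002eb` becomes `a\.b`. Escaping punctuation, rather than inserting it raw, also keeps decoded characters such as `]` or `-` from changing the meaning of a surrounding character class.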
Future Work
There may be other incompatibilities between PCRE and ECMAScript regexes. One other example I know of is the Unicode code point escape sequence (`\u{...}`), which is slightly different from the UTF-16 character code escape sequences I've dealt with here. It should be easy to support as well, though I have not done so in this PR.
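The code point form could be handled in a similar way. Below is a hedged sketch that uses another hypothetical helper, `expandCodePointEscapes`. One difference is worth noting: in ECMAScript, `\u{...}` is a code point escape only when the regex has the `u` flag. Without that flag, Annex B treats `\u` as a literal `u` and `{2}` as a quantifier, so `/\u{2}/` matches `"uu"`. For that reason the sketch takes the flag as a parameter and returns the source untouched when the flag is off. Astral code points are split into a surrogate pair by hand, which avoids depending on `String.fromCodePoint`.

```typescript
// Hypothetical helper (not part of text-buffer): expand ECMAScript \u{...}
// code point escapes into literal characters for a PCRE-style engine.

const PUNCT = /^[\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]$/;
const CODE_POINT_ESCAPE = /^\\u\{([0-9a-fA-F]+)\}/;

// Encode a code point (0..0x10FFFF) as a UTF-16 string.
function codePointToString(cp: number): string {
  if (cp <= 0xffff) return String.fromCharCode(cp);
  const offset = cp - 0x10000;
  return String.fromCharCode(0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff));
}

function expandCodePointEscapes(source: string, unicodeFlag: boolean): string {
  // Without the "u" flag, \u{...} is not a code point escape at all.
  if (!unicodeFlag) return source;
  let out = "";
  let i = 0;
  while (i < source.length) {
    if (source[i] !== "\\") {
      out += source[i];
      i += 1;
      continue;
    }
    const m = CODE_POINT_ESCAPE.exec(source.slice(i));
    const cp = m ? parseInt(m[1], 16) : NaN;
    if (m && cp <= 0x10ffff) {
      const ch = codePointToString(cp);
      // Escape ASCII punctuation so it stays literal in PCRE.
      out += PUNCT.test(ch) ? "\\" + ch : ch;
      i += m[0].length;
    } else {
      // Other escapes (and out-of-range code points) are copied as a pair
      // and left for the regex engine to interpret or reject.
      out += source.slice(i, i + 2);
      i += 2;
    }
  }
  return out;
}
```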
/cc @t9md