Exposing RE2::Set #43
Comments
Sounds useful. We can implement it as a custom Points to clarify:
Aside: I have a project that tried to generate an optimal lexer automatically: https://github.com/uhop/parser-toolkit One optimization I attempted was merging simple matchers into a single one to improve overall performance. For example, two matchers (one for \d+, one for \w+) combined into one pattern: /^(\d+)|(\w+)/
The technique worked, but it produced a long list of possible matches, all but one of them empty, and the only way to find out which group matched was a linear search.
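A minimal sketch of the technique described above, written against the built-in RegExp rather than RE2. The matcher names and the matchToken helper are hypothetical and exist only for illustration. In JavaScript the groups that did not participate come back as undefined rather than as empty strings.

```javascript
// Hypothetical token matchers; the names are illustrative only.
const matchers = [
  { name: "number", source: "\\d+" },
  { name: "word", source: "\\w+" },
];

// One capturing group per matcher. The alternation is wrapped in (?:...)
// so that ^ applies to every branch, not just the first one.
const combined = new RegExp(
  "^(?:" + matchers.map((m) => "(" + m.source + ")").join("|") + ")"
);

function matchToken(input) {
  const m = combined.exec(input);
  if (!m) return null;
  // Linear search: every group except the one that matched is undefined.
  for (let i = 1; i < m.length; i++) {
    if (m[i] !== undefined) {
      return { name: matchers[i - 1].name, text: m[i] };
    }
  }
  return null;
}
```

The loop over the groups is the linear search mentioned above: its cost grows with the number of merged matchers, even though the regex itself ran only once.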
The API I was thinking of was, in my opinion, even simpler, something like:
let re = new RE2("pattern", "flags"); // no change compared to the current version
re.exec("input"); // no change either, the same goes for .test, .match, .replace, .search, .split, .toString, .source, .flags, …
re.sources; // undefined
let reSet = new RE2([ "in", "put", "some other pattern", … ], "gi");
// assume reSet.lastIndex = 0; before each of the following lines
reSet.exec("input"); // [ "in", index: 0, groups: undefined, input: "input", patternIndex: 0, pattern: "in" ]
reSet.test("input"); // true
"input".match(reSet); // same as 2 lines above
"input".search(reSet); // 0
"input".split(reSet); // [ "", "", "" ]
"input".replace(reSet, () => "X"); // "XX"
"input".replace(reSet, [ () => "X", () => "Y" ]); // "XY"
reSet.flags; // "gi"
reSet.source; // "in|put|some other pattern|…"
reSet.sources; // [ "in", "put", "some other pattern", … ]
reSet.toString(); // "/in|put|some other pattern|…/gi"
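To make the proposed exec() result concrete, here is a rough emulation built on the standard RegExp. The execSet helper and its parameters are hypothetical, not part of RE2 or node-re2. It tries each pattern separately, so it shows only the shape of the result, not the single-pass matching the proposal is after.

```javascript
// Hypothetical helper emulating the proposed multi-pattern exec().
function execSet(patterns, flags, input, lastIndex = 0) {
  let best = null;
  patterns.forEach((pattern, patternIndex) => {
    // The "g" flag makes the regex start searching at lastIndex.
    const re = new RegExp(pattern, flags.includes("g") ? flags : flags + "g");
    re.lastIndex = lastIndex;
    const m = re.exec(input);
    // Keep the leftmost match; on a tie the earlier pattern wins.
    if (m && (best === null || m.index < best.index)) {
      best = Object.assign(m, { patternIndex, pattern });
    }
  });
  return best;
}

const patternList = ["in", "put", "some other pattern"];
const first = execSet(patternList, "gi", "input");
const second = execSet(patternList, "gi", "input", 2);
```

Here first carries patternIndex 0 and pattern "in", while second (searching from index 2) carries patternIndex 1 and pattern "put".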
As for determining the matched pattern, if I understood the …
Because it is an internal deal, not published anywhere, I don't really care as long as it is reliable.
Writing an adapter which enables one to write … Anyway, I'm progressing, step by step, whenever I find time for it.
Looking into this RE2 option, it seems less useful: apparently it is used only as an extension of test(). That means it cannot return matches, groups, and so on, just a Boolean value. I am closing this ticket until we have a definitive decision and an execution plan.
Using … Would suggest something like the following:
const set = new RE2Set(patterns)
for (const idx of set.test('asdasdasd')) {
  console.log(patterns[idx], 'match!')
}
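The interface suggested above can be tried out with a pure-JavaScript stand-in. The PatternSet class below is hypothetical and is not the RE2 binding: it tests the patterns one by one with the built-in RegExp, so it shows the API shape only and gives none of the single-pass speedup.

```javascript
// Hypothetical stand-in for the suggested interface.
// It loops over the patterns, unlike RE2::Set, which matches in one pass.
class PatternSet {
  constructor(patterns, flags = "") {
    this.sources = patterns.slice();
    this.regexes = patterns.map((p) => new RegExp(p, flags));
  }

  // Returns the indices of all patterns that match the input.
  test(input) {
    const matched = [];
    this.regexes.forEach((re, idx) => {
      re.lastIndex = 0; // reset state in case a "g" or "y" flag was given
      if (re.test(input)) matched.push(idx);
    });
    return matched;
  }
}

const samplePatterns = ["asd", "^x", "d$"];
const sampleSet = new PatternSet(samplePatterns);
const hits = sampleSet.test("asdasdasd");
for (const idx of hits) {
  console.log(samplePatterns[idx], "match!");
}
```

Returning indices rather than a Boolean lets the caller look up the original pattern, or a handler associated with it, without a second search.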
40x is a compelling number. I guess it puts this feature back in the game.
The RE2::Set class allows matching a string against several regular expressions in a single pass, seemingly more efficiently than piping all the regexes together: https://github.com/google/re2/blob/f2cc1aeb5de463c45d020c446cbcb028385b49f3/re2/set.h#L21-L23
It could be exposed to JavaScript code by:
- accepting an Array of patterns as the first parameter of the constructor;
- using RE2::Set to identify which one of the patterns matches (it doesn't seem to go further than that), and then using the regular RE2s corresponding to the identified patterns to get more information;
- in .exec(), returning the index of the pattern which matched and/or the pattern itself as properties of the returned array (maybe with Symbol keys, to eliminate the risk of name collisions with future properties that could be defined by the ECMAScript spec);
- providing a source property, for compatibility;
- providing a sources property (or a property with a Symbol key) containing the individual patterns;
- … internalSource property, or applying the same process as for the source property.

Use cases could be optimizing anything that boils down to this kind of code:
For example, an HTTP router, a lexer …
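A sketch of the kind of code this likely refers to, taking the HTTP router case. The route table, paths, and handler names are hypothetical. The point is the loop: each pattern is tried in turn, so the input may be scanned once per pattern, which is what a single-pass set match would avoid.

```javascript
// Hypothetical route table; paths and handler names are illustrative.
const routes = [
  { pattern: /^\/users\/(\d+)$/, handler: "showUser" },
  { pattern: /^\/posts\/([\w-]+)$/, handler: "showPost" },
  { pattern: /^\/$/, handler: "home" },
];

function route(path) {
  // Try every regex until one matches.
  for (const { pattern, handler } of routes) {
    const match = pattern.exec(path);
    if (match) return { handler, params: match.slice(1) };
  }
  return null;
}
```

With a set-based matcher, the loop would be replaced by one call that reports which pattern matched, followed by a single exec() on that pattern to extract the parameters.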
What do you think about such a feature?