-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider RegExp template tag instead #45
Comments
That was the expressed preference 5 years ago, and in that time, no delegate nor anybody in userland seems to have developed a template tag, as a proposal or a library, and userland seems quite convinced that the existing solution does already solve the user-facing problem. Do you have an example of a library that attempts to solve this? If there are none, I think that does serve as evidence that it would not, in fact, add any value. If there are some, it'd be great to learn about them. |
Well, I think we should encourage people who want to do work in TC39 to develop this. I think the only evidence we have right now is that it hasn't been popularized yet; we're allowed to go beyond what popular libraries do. |
Indeed, but the problem that needs solving - and has needed solving for much longer than the 5 years since this proposal was rejected - has been consistently and completely solved by the low-level feature contained in this proposal. No need has manifested for anything more. |
Yes, I agree that this problem is extremely important to solve. |
History of the tag vs escape quarrel: Present the proposal again?, Results of TC39 presentation, specifically comments such as this, this, this, this, this, this For what it's worth from me: We currently have a working solution. One in the hand vs two in the bush. |
Talking to users, we have found that almost everyone wants a |
It would be great if the proposal's README summarized these explanations. I was unaware of some of this context. I would be especially interested in hearing about developers who are confused by To be clear, I'm not opposed to |
@littledan this proposal is 5 years old. The agenda item is not a proposal, it’s a discussion topic, during which context will be provided. Once this proposal is being actually proposed, then it will contain sufficient context in its documents. |
I suggest you add an FAQ topic to explain the context here as well, to help people get prepared for the meeting and avoid repeated questions along the lines I made. |
I've made a PR here: #46 just to start a FAQ, contributions welcome. |
I know of at least XRegExp.build. |
Wow 9 years ago slevithan/xregexp@47b338d - also CC @LeaVerou Edit: also relevant: slevithan/xregexp#103 Edit2: Note that |
It would be really interesting (but likely impossible) to know how much of xregexp usage is |
I think that having both a
|
This well-intentioned but entirely wrong idea derailed this proposal 5 years ago and we ended up without a solution at all. Please let's not do this again. |
@phpnode the way the stage process works is that the committee considers alternatives. I think the best path forward is to make a good faith effort to consider alternatives. The more I talk to users and language committees of other languages the more I lean towards I am frustrated by how things went 5 years ago, given the amount of work I put into this proposal back then. I think a better approach back then would have been a longer and more patient dialog with TC39 to understand how to push this* forward. Worst case (for people who need *this - an API for escaping regular expressions, not necessarily |
Another example: https://runkit.com/tolmasky/templated-regular-expression-example const { queryString } = templateRegExp(
{
key: /[^=#]+/,
value: /[^&#]+/,
parameter: /${key}=${value}/,
queryString: /\?(${parameter})(?:&(${parameter}))*/
});
queryString.test("?a=b&c=d") |
I'm not sure how that's a demonstration of escaping? That appears to be just a way to create a RegExp from objects (I also have one from ±6 years ago I don't maintain here: https://github.com/benjamingr/js-restructure :)). Is this it? https://www.npmjs.com/package/templated-regular-expression |
Yes, that's the package. And it's not intended as a demonstration of escaping, but rather another response to the "Do you have an example of a library that attempts to solve [high-level RegExp templating]?" question above regarding this specific issue, which @littledan raised to "solve the user-facing problem more directly, avoiding the need to concatenate RegExps". |
Letting users compose RegExps would be awesome. The RegExp syntax are optimized for small searchers, but the regular formalism makes it perfect for lexers, and that use case is very poorly served by literals. Likewise, backtracking makes sense for small searchers, but you can easily run into ReDOS when writing non-trivial parsers. For previous art, there's alsoThere's also https://github.com/trusktr/regexr by @trusktr. I have a personnal preference for JS-based combinators, as seen in compose-regexp. The The Alternatively one could imagine a syntax extension in RegExps for splicing in full patterns (the point of the regular formalism is composition, after all, that is at the core of the definition). Something like const p1 = RegExp.quantifier("{4}", /\w/)
const p2 = RegExp.intersect(/\p{sc=Latin}/u, /\p{Uppercaser}/u)
const combined = /^(?:(?>p1)|(?>p2))$/
// compare with
const tCombined = RegExp.from`^$(?:{p1}|${p2})$`
// or with combinators
const cCombined = sequence(/^/, either(p1,p2), /$/) Such a syntax would prevent splicing in character class and Edit, a bit rusty :-) |
|
another template tag library: https://github.com/mikesamuel/regexp-make-js |
Syntax aside, when composing, one has to take what's spliced in into account, or risk getting nonsensical results. Here's an example: const ab = /a|b/
const c = /c/
import {re} from 're-template-tag';
console.log(re`${ab}${c}`) // /a|bc/
import {sequence} from "compose-regexp"
console.log(sequence(ab, c)) // /(?:a|b)c/ |
I will object to any proposal that creates a new vulnerability to injection attacks. AFAICT, the means the only acceptable proposal is one based on tagged template literals. @mikesamuel and I wrote and championed the tagged template literal proposal. We got them added to JS primarily to support writing context aware SAFE escapers of embedded languages. This is exactly the situation we're talking about here. RegExp is precisely such an embedded language, and safe escaping cannot be done in this language without context sensitivity. At #45 (comment) , @cben links to https://github.com/mikesamuel/regexp-make-js This is the one I want to call everyone's attention to. It was written by @mikesamuel with substantial input from me, explicitly for the purpose of doing SAFE context aware escaping. It also accepts RegExp instances as substitution values, expressing nested matchers, not just nested data. This enables powerful composition of RegExps. I would support a proposal based on this implementation, that preserves both its safety and its compositionality. I would also support a proposal that rejects in more odd contexts, rather than bothering to do context aware escaping for those odd contexts. |
@erights could you give an example of an injection attack enabled by a non-ttl solution ? E.g. how would you craft an attack on compose-regexp? Edit: looking at the issues over at https://github.com/mikesamuel/regexp-make-js, I see a problem with Edit4: @erights beside quantifier brackets and back references, do you have other "weird" scenarios in mind? |
This proposal advanced to stage 2 with |
I would prefer if, rather than pursuing a low-level
RegExp.escape
feature, we would work on a high-level RegExp templating feature. This version would solve the user-facing problem more directly, avoiding the need to concatenate RegExps in the result, and can help engines avoid re-parsing (by just parsing the shared part once). If any part of RegExp syntax ends up requiring context-dependent escaping, a template constructor could resolve that in a way that context-independentRegExp.escape
cannot.Such a feature could look like this, to find the value of the variable
x
followed by the value ofy
with arbitrary whitespace in between:Here, flags are provided after a
/
within the RegExp (which is of course optional).The text was updated successfully, but these errors were encountered: