Michael Saboff's Review #55
Comments
Thanks for your review, Michael!
Exactly. Some additional background on this was presented at the TC39 May 2021 meeting (and the preceding April 2021 Incubator Call):
I see. https://github.com/tc39/proposal-regexp-set-notation#whats-the-match-order-for-character-classes-containing-strings addresses this for properties of strings specifically (where the strings don’t have an inherent order), but as you pointed out we could choose to preserve the order of equal-length strings in string literals (e.g. |
For Unicode properties of strings, it is clear that the order of same-length strings is not important. It is the case that you point out, |
The order of same-length strings should not matter. However, I expect that implementations will implement character classes with set data structures (extending from only code points to also allowing strings), which means that they won't preserve parsing order (just like they don't for code points). Therefore I would be reluctant to suggest that the matching order for a given string length is the parsing order. Specifying a stable sort in the operation that creates a matcher object might be harmless but would be misleading if the construction of the CharSet didn't preserve the parsing order.
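To make the point above concrete, here is a minimal sketch, not spec text. The helper `matchClassStrings` and the sample strings are hypothetical, and length is measured in UTF-16 code units for simplicity. It models the CharSet as a `Set`, sorts the members by descending length, and tries each in turn. Because equal-length members of a set are distinct, at most one of them can match at a given position, so their relative order can change the number of comparisons but not the result.

```javascript
// Hypothetical helper, not spec text: emulates matching a class that
// contains strings at a given position in the input.
function matchClassStrings(strings, input, pos = 0) {
  // A Set stands in for the CharSet: duplicates collapse, and a matcher
  // should not rely on the order in which the members were parsed.
  const charSet = new Set(strings);
  // Longest strings first. Array.prototype.sort is stable (ES2019), so
  // equal-length strings keep whatever order the container yields.
  // Simplification: .length counts UTF-16 code units, not code points.
  const candidates = [...charSet].sort((a, b) => b.length - a.length);
  for (const s of candidates) {
    if (input.startsWith(s, pos)) return s;
  }
  return null;
}

// The same members in two different source orders.
const authored = ["ab", "cd", "abc", "a"];
const reordered = ["cd", "a", "ab", "abc"];

for (const input of ["abcd", "abx", "cdx", "xyz"]) {
  console.log(
    input,
    matchClassStrings(authored, input),
    matchClassStrings(reordered, input)
  );
}
```

Both orders report the same match for every input, which is consistent with treating the order of same-length strings as a performance detail rather than observable behavior.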
We discussed this during the 2023-03-29 TC39 meeting and agreed not to make any spec changes. @waldemarhorwat pointed out that today’s character classes (supporting only strings of size 1) don’t have an inherent order either (e.g. |
Right. I also suggested that implementers should be free to use sets (implementations of mathematical sets), and that for runtime optimizations they might use tries (retrieval trees).
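As an illustration of the trie idea, the following is a rough sketch under stated assumptions, not how any particular engine works. The names `buildTrie` and `longestMatch` are hypothetical. The trie is keyed by code point and finds the longest member that matches at a position without keeping any record of the order in which the strings were written.

```javascript
// Hypothetical sketch: a trie (retrieval tree) keyed by code point.
function buildTrie(strings) {
  const root = { children: new Map(), terminal: false };
  for (const s of strings) {
    let node = root;
    for (const ch of s) { // for...of iterates a string by code point
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), terminal: false });
      }
      node = node.children.get(ch);
    }
    node.terminal = true; // a member of the set ends at this node
  }
  return root;
}

// Returns the longest member that matches `input` at `pos`, or null.
function longestMatch(trie, input, pos = 0) {
  let node = trie;
  let best = trie.terminal ? "" : null; // the empty string may be a member
  let end = pos;
  while (end < input.length) {
    const cp = String.fromCodePoint(input.codePointAt(end));
    const next = node.children.get(cp);
    if (!next) break;
    node = next;
    end += cp.length; // advance by 1 or 2 code units
    if (node.terminal) best = input.slice(pos, end);
  }
  return best;
}

const trie = buildTrie(["ab", "cd", "abc", "a"]);
console.log(longestMatch(trie, "abcd")); // longest member at position 0
console.log(longestMatch(trie, "abx"));
console.log(longestMatch(trie, "xyz"));
```

This sketch returns only the longest candidate. A real matcher would presumably also need to fall back to shorter members when the rest of the pattern fails to match, which the sketch leaves out.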
I tried to summarize the outcome in #58.
Closing now that #58 is merged. Thanks, everyone!
Overall, it looks good.
For the Syntax Rules production ClassReservedDouble, is the long list of reserved doubled syntax characters a somewhat paranoid reservation for possible extensions without needing to add a new flag?
A possible nit question. For 22.2.2.7 Runtime Semantics: CompileAtom, Step 6 of the production Atom :: CharacterClass, it seems to me that sorting the strings by descending order of string length might undermine the intent of a developer.
Consider a CharacterClass that contains a long list of strings. The developer may have ordered equal-length strings within that character class by expected match likelihood. If the sort is not stable, sorting by length may circumvent that intended order, possibly impacting match performance.