
1 September, 2021 Meeting Notes


In-person attendees: None

Remote attendees:

Name Abbreviation Organization
Waldemar Horwat WH Google
Mark Cohen MPC Salesforce
Jack Works JWK Sujitech
Josh Blaney JPB Apple
Richard Gibson RGN OpenJS Foundation
Robin Ricard RRD Bloomberg
Rob Palmer RPR Bloomberg
Leo Balter LEO Salesforce
Sergey Rubanov SRV Invited Expert
Chris de Almeida CDA IBM
Chip Morningstar CM Agoric
Philip Chimento PFC Igalia S.L.
J. S. Choi JSC Indiana University

BigInt Math for Stage 1

Presenter: J. S. Choi (JSC)

JSC: Hi everyone. For those of you who weren’t here yesterday, my name is Joshua Choi—you can call me whatever. I’m with Indiana University. I’m a physician of internal medicine, but I’m also a clinical informaticist. And I work with JavaScript a lot in data analytics and app design.

JSC: So, with that said, here is another proposal from me. It’s for BigInt Math. Today I’m seeking Stage 1—so, you know, this is just figuring out if the Committee believes that this is a problem worth exploring and thinking about solutions to.

JSC: [slide 2] So this is basically the problem, right? There’s a bunch of Math functions. They would make sense with BigInts, but they don’t work with BigInts. When I asked around about this, someone said—I think it was JHD—that they had tried to figure this stuff out but never got around to it, and that it’s something worth proposing: to extend the Math functions. So this is it. This is what we’re going to do. It’s pretty straightforward in general, with a couple of corner issues, but those shouldn’t block Stage 1.

JSC: [slide 3] Just to briefly review: BigInts are important. That’s why we added them to the language. Not everyone uses them. But for those who do, they’re important. Math stuff, financial stuff, science stuff—there are some web APIs like the Web Performance API for high-resolution times. So it would be good to be able to figure out the maximum of an array of BigInts without having to write your own or import something from NPM or whatever, right? Every BigInt has a sign. Every BigInt has an absolute value. It’s basic stuff. It would be good to have it even though it’s easy to polyfill.

JSC: [slide 4] One thing is figuring out exactly which functions to extend, which ones make most sense. I’ll get into a little more on the criteria of what I used here. A lot of this is bikeshedding, to an extent. It shouldn’t block Stage 1. Stage 1 is mostly about, you know, whether extending Math functions to accept BigInts is a worthwhile endeavor to explore. Specific problems we can defer to Stage 2, although I’d be happy to talk about them with you all now. There’s stuff that makes sense and stuff that doesn’t make sense.

JSC: [slide 5] The big design choice that I have with this specific proposal—I’ve written a spec already and everything—is that we should continue to avoid any unexpected or implicit type conversions. That’s what the original BigInt proposal did, and that’s what we are going to try to continue to do, at least I think we should. That’s why we’re excluding, for instance, sin. The only integers that sin could output are −1, 0, and 1. So it’s kind of not useful. Other rational-returning functions should be okay. It may be useful—although I don’t know this for sure—it may be useful, for instance, to get the [natural] exponent of a really big number, like finance stuff such as interest or something. The spec currently punts that to implementation approximations.
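The invariant JSC refers to is visible in current engines: BigInts never implicitly convert to Numbers, and today’s Math functions reject them outright rather than converting. A minimal illustration (not part of the proposal):

```javascript
// The existing no-implicit-conversion invariant: mixing BigInt and Number
// in arithmetic throws rather than silently converting.
let mixError;
try {
  1n + 1; // TypeError: Cannot mix BigInt and other types
} catch (e) {
  mixError = e.name;
}
console.log(mixError); // "TypeError"

// Likewise, today's Math functions call ToNumber on their arguments,
// which throws for BigInts instead of converting them.
let mathError;
try {
  Math.abs(1n);
} catch (e) {
  mathError = e.name;
}
console.log(mathError); // "TypeError"
```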

JSC: There are some gray zones. Like, I don’t know about imul; I don’t know about clz32. These are pretty specialized; should we extend them? Maybe. I don’t know. Whatever: it shouldn’t block Stage 1; we can figure that out later. I see someone commenting about ceil, floor, trunc. Yeah. I originally excluded those. I think it was someone—maybe it was JHD or DE—someone suggested to keep them in just for parity. I don’t have a strong opinion about that. It shouldn’t block Stage 1.

JSC: [slide 6] The bigger problem is with variadic functions. There are three variadic functions—min, max, and hypot—and min and max especially are extremely common, but they currently have a defined result when you give them no arguments. Imagine you’re giving max an array of BigInts, and it possibly could be empty. And when it’s empty, it unexpectedly gives you a Number value, not a BigInt value. To me, that is effectively an unexpected and implicit type conversion from an array of BigInts to a Number. So hopefully we can all agree that that’s a problem and something to be avoided, since our invariant is no implicit type conversion.
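The zero-argument hazard is observable today, and the draft spec’s throwing behavior can be sketched in userland. The bigMax name follows the draft spec mentioned later in this discussion; this sketch is not the proposed spec text:

```javascript
// With no arguments, Math.max is defined to return the Number -Infinity:
console.log(Math.max());        // -Infinity
console.log(typeof Math.max()); // "number"

// So if Math.max were simply extended to BigInts, spreading a
// possibly-empty array of BigInts could silently yield a Number.
// A userland sketch that instead throws on zero arguments, preserving
// the no-implicit-conversion invariant:
function bigMax(...values) {
  if (values.length === 0) {
    throw new TypeError("bigMax requires at least one BigInt argument");
  }
  return values.reduce((acc, v) => (v > acc ? v : acc));
}

console.log(bigMax(3n, 7n, 5n)); // 7n
```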

JSC: [slide 7] So, right now, the solution that the spec has is having three separate methods for each of the three variadic Number methods. This might not be popular with you. It is certainly ugly to me. But it’s less bad than having min implicitly return Numbers sometimes when you might give it an array of BigInts. Perhaps we could put them on the BigInt constructor instead. I don’t know: that raises other questions, like, do we put everything else on BigInt’s constructor? I don’t know. This kind of bikeshedding shouldn’t block Stage 1. Stage 1 is for exploring stuff: whether Math stuff for BigInts is worth it and should be explored, whether we put everything on the BigInt constructor, whether we continue putting stuff in Math, and so on. It’s those questions that hopefully you can hash out with me in the issues on the repository. The variadic thing is my biggest question, but right now it shouldn’t block Stage 1.

JSC: [slide 8] The specification right now does overload the Math operations. It uses the same machinery that was written up by the hard work of everyone who worked on BigInt. There are abstract numeric-type operations already; they’re already used for things like the exponentiation operator. So we reuse that machinery and extend it for a bunch of other stuff. And then we just change the original Math function properties to use those abstract numeric operations.

JSC: [slide 9] A couple cross-cutting concerns. For instance, if we overload imul, would that affect asm.js? Would overwriting stuff in general have problems with engine optimizability? Although we already have an abstract numeric operation system that’s already working with exponentiation and addition, so it should fit into that—but maybe engine implementers have some input.

JSC: Whether there are operations that would be really inefficient on precise integers: today, there are always fallbacks, but, you know, as integers keep growing, maybe something gets exponentially worse, or something. There’s also Decimals. We should keep in mind that whatever we do here is probably going to be applied to Decimals too.

JSC: There’s a question of, if we have engine-approximated irrational numbers, like square roots or something, should we formally constrain the approximations to be mathematically monotonic, or should we leave that out and not have a formal guarantee?

JSC: And, also, I have my own use cases for wanting some of these, but there may be other mathematicians and scientists who have their own problems or have special requirements for these operations.

JSC: That’s it. Stage 1. I would like to switch to the queue.

WH: I’m a mathematician too. I see two proposals here. One is for things which are well defined for integers such as min, max, and absolute value. Those I’m happy with. And then there’s the transcendental functions, and I can’t think of any possible use for including things like hyperbolic cosine for BigInts. That is incredibly expensive to implement because if you want to implement those correctly, you will need constants like e and pi to millions of digits included in the implementation just to support this feature.

JSC: I would be happy to drop them. The only reason why I floated them was the potential that, perhaps, there are potential engineering or math use cases that I don’t see. I certainly don’t see any of these cases right now, and I would be happy to drop them, if that is at all controversial.

WH: Yes. I see no reason to define hyperbolic cosine on BigInts. The smallest BigInt input on which it matters is 2^53+1 and you’re guaranteed to crash the implementation if you actually try to use it, because an implementation will either run out of time or run out of memory before you compute it.

JSC: I put in, I put in a couple of very large numbers into some of these functions on Wolfram Alpha and it took quite a long time, yes. It didn’t crash though; I was very surprised. Anyways, I would be happy to drop all the transcendentals. I can see potentially something having to do with exponents—maybe financiers might want e-to-the-whatever of a large number—but we can add that in later if people really need it. Don’t have any specific use cases.

WH: Yeah, we have an exponentiation operator defined, and that’s fine.

JSC: I’m talking about involving natural exponentiation.

WH: I would not want exp. That requires you to have e to arbitrary precision in the implementation.

JSC: All right, I will plan to drop that.

JHD: I think the general concept is safe. Strong support for it. I think it’s just bizarre that intuitive stuff doesn’t work, as far as max/min. You can compare a BigInt with greater than > or less than <. And Number mixing is only a problem when there’s precision loss, and that doesn’t apply to comparisons. So it just seems absurd to me that Math.max doesn’t just accept BigInts. And I haven’t gone through and audited the Math methods, but I suspect that there are a few where it just should work, simply because there’s no good reason why it shouldn’t. And then there’s a bunch where it shouldn’t work, because there are good reasons why it shouldn’t. And I think Stage 1 is absolutely the time to explore that.

JSC: I would like to second JHD’s point: I would like all the help that I can get, when it comes to auditing each function, from engine implementers. And from anyone: anyone who knows any mathematicians, engineers, or scientists, with regards to what their needs are and what the cost would be. I would err on the side of dropping early. All the transcendentals I will drop in the next week. As for max and min—yeah, the problem being when you have zero arguments—someone or I can open an issue on that, and we can bikeshed it there. But yeah, that’s hopefully for Stage 1.

JHD: That was it.

JHX: I support this proposal. I only have two small questions about max and min. The first question is, if max and min should not return Infinity for BigInts, what should they return? (The second question is, does Decimal also need decMax/decMin?)

JSC: So right now in the specification, bigMin with no arguments throws a TypeError. We can bikeshed that, or we can do whatever. I don’t really have strong opinions about that. But right now what it does with no arguments is to throw a TypeError. As for Decimals, yes, whatever approach we choose now—whether to put them on the BigInt constructor or whether to add bigMin and bigMax—whatever we decide would have to be extended to Decimals too. So if we went with bigMax, we would have a decMax, or if we put them on the BigInt constructor, we would have a Decimal.max or whatever.

SYG: V8 has reservations along the same lines as exactly what WH was saying: the implementation complexity and cost of many, many of the current Math functions just doesn’t make sense for BigInts. And it sounds like you are pretty on board with cutting down the list. My hunch here is that it’ll end up being a pretty small list.

JSC: Please let me know everything that you think would not be performant, or that could possibly not be performant, at all. Please give your input in the issues. I would be happy to drop things early.

SYG: It’s more about complexity and cost in the absence of direct use cases. We’re not really interested in fleshing this out just for completeness. So, lacking a strong use case, outside of the ones that obviously do make sense—like min/max (the type stuff notwithstanding), absolute value, sign, etc.—if we want to include stuff that has some complexity in the implementation, I would like to see a bunch of stronger use cases in the motivation. It sure would be nice.

JSC: My goal is to err on the side of implementation complexity because I don’t know of any use cases right now. The only reason I included those in the first place was that maybe a mathematician or financier somewhere could use them. But if they need something they can propose, they can bring it up themselves. Consider anything that has a whiff of too much complexity in the engine dropped early. Please give your input in the issues. I’d be happy to work on that.

SYG: Sounds good. Thank you.

YSV: We also reviewed this proposal, and while we’re not going to block Stage 1, we have some concerns regarding the use cases for a number of them, but these were already mentioned by other individuals. And we also notice that there is some implementation-approximated math on what would otherwise be precise, and we struggle to see what the value of this would be. But yeah, we found a couple of concerns here. Like, how would this actually work? And I just wanted to echo what Shu was saying.

JSC: So, with regards to—are you talking specifically about functions that would possibly return irrational numbers for some integer inputs, but integer values for other integers? Like square root, for instance.

YSV: Yeah, for example.

JSC: Okay, so that’s what I was getting into when I was talking about, like, formal guarantees: like guaranteeing monotonicity, for instance, or guaranteeing that, for some values, if there’s an integer mathematical value for them, then we return it. That’s only the case for functions that could return irrationals, like square root, if we included them. So for instance, if we input 101n, presumably that would be implementation-approximated. But should we guarantee that it couldn’t be less than the result for 100n? And should the result for 100n be guaranteed to be 10n? Things like that—those are the issues that I labelled cross-cutting concerns.
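As a sketch of what such guarantees could look like, here is a truncated BigInt square root via integer Newton iteration, which is exact for perfect squares and monotonic by construction. The name bigSqrt is hypothetical, and nothing here is proposed spec behavior:

```javascript
// Floor (truncated) square root of a non-negative BigInt, using integer
// Newton-Raphson iteration. Exact for perfect squares; monotonic overall.
function bigSqrt(n) {
  if (typeof n !== "bigint") throw new TypeError("expected a BigInt");
  if (n < 0n) throw new RangeError("square root of a negative BigInt");
  if (n < 2n) return n;
  let x = n;
  let y = (x + 1n) / 2n; // BigInt division truncates, like / does today
  while (y < x) {
    x = y;
    y = (x + n / x) / 2n;
  }
  return x;
}

console.log(bigSqrt(100n)); // 10n — exact for a perfect square
console.log(bigSqrt(101n)); // 10n — truncated; never less than bigSqrt(100n)
```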

JSC: I mentioned that that’s only if square root ends up in the list. I think there probably are use cases; I don’t have them myself. If there’s a lot of implementation complexity, even for square root, I’d be happy to drop it. But otherwise we could hash this out in the repository. I consider implementer complexity to be a very high priority in the absence of clear use cases.

YSV: In the absence of clear use cases I have some concerns. And I would like to see—like we can always add methods later. But introducing spec text that behaves one way for something that doesn’t have a clear use case…people may start to rely on it and we won’t be able to roll it back.

JSC: We can search for the cases we can find. I would be happy to drop whatever methods whose cases we cannot find and defer them to later. We could do this piecemeal.

YSV: Great.

JWK: I support this proposal, but I think we should have some namespace like BigInt or BigIntMath instead of methods named bigMath or bigMin.

JSC: Okay, so, yeah. I think we talked about this a little bit in an issue on the repository. So, I don’t have a strong opinion on that. The Committee—I would love to get a temperature check on whatever they think might be best. I see there’s some arguing going on in Matrix right now. That’s totally what I want. Let’s hash this out in Stage 1 and in the issues, please. Feel free to continue talking about it in the issues. The temperature from the committee is really what I want. You know, the fact that we have both Number and Math—maybe it’s an accident of history, but now that we have both, we should try to think about what’s most logical or what would be easiest to teach. I don’t really have a strong opinion, one way or the other. Let’s hash it out. Let’s have the whole committee hash it out on the issues as they want. Does that sound fine with you?

WH: Getting into the specifics of functions, these seem fine for BigInts: abs, sign. I don’t want ceil, floor, round, or trunc. And sqrt seems fine. It produces real numbers. It would truncate the same way that division currently does for BigInts. You’d just get the floor of a square root. That’s well-defined, somewhat useful, and fairly harmless. Finally I would also support heterogeneous max and min which take arbitrary combinations of BigInts and Numbers. I would not want any of the other proposed functions.

JSC: Okay, that [max and min behavior] is an option that I hadn’t thought about—but certainly, it might be within the mental model of programmers, since they might be comparing big and some numbers directly. Anyway, please open as many issues as you wish to talk about it on the GitHub repository.

SFC: So first, as SYG and others have said, there’s a fairly limited number of functions that are going to make sense to implement here. And, in particular, many of them could return a number that has decimal places. It almost seems like maybe we should be thinking about, you know, the interaction between this proposal and the Decimal proposal that is still in progress, because it could make sense for these to return a Decimal. So I don’t know what your thoughts are about that—whether we should wait for it. And then my second comment, I’ll just say: one potential use case here that I’ve read about is the idea of an exact double-to-decimal conversion, which requires BigInt math. That’s one potential use case. I’m not saying that it’s a common use case, because we have a lot of machinery around that already. But I know that’s something that’s come up a little bit, which could potentially be useful. That would be, like, one theoretical use case.

JSC: Thanks, Shane. Please open an issue with that use case, if you can find a link. And as for the other thing, whether to implicitly convert or something from BigInt to Decimal, is a big question that we should also open an issue. One more last thing.

DE: Yeah, I like the idea of starting with things that are exact calculations, so we have, like, min and max. I’m not as much convinced that things that would have rounding really make sense for BigInts…even for Decimal. These are going to be difficult to add if we went with, say, BigDecimal with precise rounding parameters…because they just become computationally very taxing, I think, to ask implementations to do. So I agree with what other people said about how it’s important to start with use cases, rather than going for being largely complete.

JSC: All right. I think I’m out of time. I will contract the proposal a lot, to only involve exact stuff. I think things like square roots are a big problem, because we need to figure out what we want to do when there’s not an exact square root. WH is raising the point about division: yes, division already truncates on BigInts. That might be a precedent. These are all issues. Please open issues on the repository and talk as much as you want. Do I have Stage 1?

DE: I support Stage 1.

WH: I support this, with the limited set of functions.

SFC: Seems like a worthwhile problem.

BT: That’s consensus for Stage 1. So thank you for that. And great job managing your own queue.

Conclusion/Resolution

Stage 1 for a more limited set of math functions than originally proposed

Get Intrinsic for Stage 1

Presenter: Jordan Harband (JHD)

JHD [showing proposal explainer]: I’d originally hoped to ask for Stage 2 but realized that I have some unanswered open questions, that really would be inappropriate to wait until Stage 2 to resolve. So I will only be asking for Stage 1 today.

JHD: The problem here was brought up in the previous meeting: essentially, that when you write some code, you generally—I mean, you have to assume that the environment in which it first runs is good, meaning nobody has maliciously screwed with any of the built-ins or the environment. So everything you can access the first time your code runs is safe or as expected. This could mean, you know, I’ve run polyfills, or I’m in a certain browser, or I’ve locked things down with SES, or whatever. As long as it’s matching your expectations, your code is good.

JHD: However, you often create functions in this code that run later—and by the time they run, it’s entirely possible that later-run code has messed with your expectations. Perhaps not in an SES realm, but in most of the other scenarios anybody could have done almost anything. It could be a browser extension, or advertising code, or some npm module that’s doing something to something [?] that you didn’t realize exists in your graph.

JHD: So I prefer to author my modules or my packages in a way that is robust against this. So I’ve explained some examples in the readme here, where, for example—this is a very contrived example where I’m using .includes on an array and .toLowerCase on a string. And I’ve shown here an admittedly not-ergonomic example of how I do this: I use some packages for the call binding and stuff, but effectively it’s the same thing. I’m not attempting at all to explore how to make this more ergonomic in this proposal. That is definitely something that’s dear to my heart and that I would like to happen, but this is just about making it easier to write this sort of robust code.
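A minimal sketch of the pattern JHD is describing, with no helper packages: capture the built-ins when the module first runs, and invoke them later with Reflect.apply so that later tampering with the prototypes cannot affect the module. The names here are illustrative, not from the proposal:

```javascript
// Captured at module-initialization time, while the environment is trusted.
const $includes = Array.prototype.includes;
const $toLowerCase = String.prototype.toLowerCase;

function hasIgnoreCase(array, needle) {
  // Invoke the saved built-ins; never look them up on the prototypes again.
  const lowered = Reflect.apply($toLowerCase, needle, []);
  return Reflect.apply($includes, array, [lowered]);
}

// Even if later-running code deletes the prototype method…
delete Array.prototype.includes;
// …this function keeps working:
console.log(hasIgnoreCase(["abc", "def"], "ABC")); // true
```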

JHD: So currently I use a get-intrinsic package, which has fifteen-some million downloads a week, to essentially centralize access to all of the intrinsics, or the built-ins, or whatever you want to call them, that I need to set aside or save for my robust code. So here’s an example of code using some of these packages. This has a cost, however. Many people, many websites, are using this: they are shipping almost 10 kB to every browser. And some of these intrinsics, although not ones that I’m personally using, require essentially eval to get to, because they’re not accessible from the global, and they require syntax, like generator syntax, to get them. And if you ship generator syntax without eval, you break a browser that doesn’t support it. So anyone that’s using CSP is unable to use any code that depends on these intrinsics that require syntax to get to. So, that’s the problem space. And since I’m only asking for Stage 1, that’s all we know.
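As an example of the syntax-only intrinsics JHD mentions, the generator function constructor and the array iterator prototype have no global name and can only be reached by evaluating syntax (or by using eval):

```javascript
// %GeneratorFunction%: only reachable from a generator created by syntax.
const GeneratorFunction = Object.getPrototypeOf(function* () {}).constructor;

// %ArrayIteratorPrototype%: only reachable by actually creating an iterator.
const ArrayIteratorPrototype = Object.getPrototypeOf([][Symbol.iterator]());

console.log(typeof GeneratorFunction); // "function"
console.log(GeneratorFunction === Function); // false
```

This is why code that needs these values must either ship the syntax (breaking engines that lack it) or obtain it via eval (breaking under CSP).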

JHD: I’m looking for consensus on whether we’d like to continue exploring this problem space. I do have a possible solution in mind, which is to provide a new global or static function called getIntrinsic, but of course we can spell that however we like. Essentially takes a string and spits out the original value being requested.

JHD: So, there were some concerns brought up before about, like—there’s no snapshotting of objects happening here. If you ask for the Object.prototype intrinsic, you get the real Object.prototype with all modifications that have happened to it. The real value is when you ask for something like Object.prototype [unable to transcribe] or Array.prototype.slice: even if those functions have been removed from those prototype objects, you’ll still be able to get to them. I’m assuming that there may be some implementation cost to this, but given that many of these intrinsics are already kind of held onto by the spec—I’m assuming that, you know, some of that is spec fiction, so maybe implementations do something different—I’d be interested to hear about implementer concerns there. And then, separately, I don’t believe every implementation does this, but if an implementation tracks when a built-in has been modified, then it can just spit out the unmodified built-in directly in that case.
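A rough userland approximation of these semantics—in the spirit of the get-intrinsic package, not the proposed built-in: values are saved eagerly at startup, lookups are live rather than snapshots, and a saved function survives deletion from its prototype. The intrinsic-name strings follow the spec’s %…% notation:

```javascript
// Saved eagerly at startup, while the environment is still trusted.
const saved = {
  __proto__: null, // avoid chain lookups polluting the name table
  "%Object.prototype%": Object.prototype,
  "%Array.prototype.slice%": Array.prototype.slice,
};

function getIntrinsic(name) {
  if (!(name in saved)) {
    throw new TypeError(`unknown intrinsic: ${name}`);
  }
  return saved[name];
}

// No snapshotting: %Object.prototype% is the live object,
// later modifications included.
Object.prototype.tainted = true;
console.log(getIntrinsic("%Object.prototype%").tainted); // true
delete Object.prototype.tainted;

// But the saved function survives removal from its prototype:
const slice = getIntrinsic("%Array.prototype.slice%");
delete Array.prototype.slice;
console.log(Reflect.apply(slice, [1, 2, 3], [1])); // [ 2, 3 ]
```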

JHD: Okay, so I have a few open questions here. One of them is about symbol properties. I very much do not want to support the ‘@@’ notation that the specification uses—like, I don’t want to expose that to users. So I have an idea here of how to do it, where either the thing in brackets is another intrinsic, or, since this is only symbols, it has to be a Symbol.something. But that will need to be explored, and I think it should be largely resolved before Stage 2. Similarly, there’s a question here about what to do about accessors. Currently, what I do when the thing being requested is a data property is just spit it out. And when it’s an accessor, I return its get function, because I have zero use cases in mind concretely for anything else. There’s something to explore here: you know, we could do a property descriptor for every intrinsic, but I don’t have any use case for knowing the enumerability/configurability of a setter right now. So, like, I don’t know. This is another question I’d like to resolve before Stage 2.

JHD: And then I have another open question here. It’s just about: can we, or should we, at all restrict hosts around what getIntrinsic function they can provide? But these are all things that I’d like to explore within Stage 1. And I’d like to spend minimal time discussing those today, so we can finish the timebox sooner. So that’s my spiel.

SYG: I’ll start with the implementation concerns. There was a bit you said about, you know, how big the userland library is. For V8, at least, the cost—I mean, it’s not going to be free. There will be a non-trivial memory cost, because, while it is true that some built-ins are held onto by what’s called a native context in the V8 codebase—which is basically, like, the thing that holds onto globals, including non-user-exposed ones—while it’s true that some built-ins are there, it is not the case that close to most, or all, of them are on there. And to add a bunch of references, as would be required, is very undesirable. And this was the problem the last time getIntrinsic came up.

SYG: And last time there was a solution proposed that required re-architecture: which is, you know, [that] probably V8 should move to some kind of lazy-loading thing, instead of trying to have everything on the global to begin with. And that’s still probably the best chance going forward to recoup the memory costs—to not punish, you know, every context. The thing I would like to caution is that this approach means that this might not get implemented in a timely fashion, because the re-architecture is significant. That is to say, the implementation concerns still exist. Independently, [the] lazy-loading thing is probably good for the codebase to do anyway, but I can’t promise any kind of timeline there. I know this is just Stage 1, but, right, I just wanted to set expectations.

JHD: And I’ll just say that if implementations as a group are confident they can eventually ship it and they plan to—and as long as some implementations can ship it, obviously, because of the requirements for advancing stages—I’m personally comfortable with that, because once it’s part of the specification, I can build a polyfill, and then that thing can just fall out of usage naturally over time. So I’m thinking about, you know, the next ten years, not the next ten months. So it’s totally fine with me if there’s a delay, but thank you for sharing that concern.

JWK: So, I have a question about an engine like XS. It needs to be small. I think getIntrinsic is good, but it seems like we would need to add too many strings to the engine, because those intrinsics are accessed by strings like %ArrayPrototype%.slice. I guess that might add too much size to XS. Maybe we could add only the intrinsics that can only be reached by syntax, to keep the list smaller.

JHD: I mean, there are a few things to respond to there. As far as the XS concern, certainly, I’d love to hear from those implementers and confirm. But the individual parts of the dotted string all already exist in the engine; it’s just a question of, you know—I don’t know if that cancels out the concern. They would have to speak to that.

JHD: As far as only providing the syntax ones, that’s useful, and, you know, it would be a minor improvement. But I would say that, at that point, we would just add those intrinsics as globals rather than making a special function for them, because the value of having the special function is—I didn’t elaborate on this before: there are hundreds of packages in a dependency graph, and they are required and evaluated at different times. It’s possible that something will require three functions and something else will require two others, but in between those, some code might run, which sort of moves the goalposts a bit. If all I have to ensure is that the earliest-run thing requires a package that saves the getIntrinsic function, then, instead of having to remember to cache n things, I only have to cache one thing. So the value really is only there if the function provides access to all the intrinsics that are needed, and not just the syntax-reachable ones.

JWK: Okay.

KKL: From the SES and lockdown perspective, this is great, and it would be wonderful to be able to enumerate all of the intrinsics, or at least the intrinsics that are only reachable by syntax, since this would give us a place to stand to harden intrinsics that are presented by syntax and are not known by a shim. And then, of course, if the language were to adopt lockdown as a feature of the language, that would no longer be necessary. But until that became part of the language, it would be useful to have a feature to discover these intrinsics.

JHD: So that’s an interesting thing worth exploring. My understanding is that the current only-syntax-reachable intrinsics are considered by some delegates to be a mistake, and that they have explicitly said that they will work hard to prevent any new ones from being added. So it seems like it’s a finite set that will not grow in the future. So I’m not sure if an enumeration approach is necessary, but it’s certainly something worth looking into.

KKL: No, I agree. Either invariant needs to be preserved, or this feature needs to exist.

JHD: Sure.

MM: Yeah, I just want to say that there’s something between the only-reachable-by-syntax intrinsics and all the intrinsics. Most of the intrinsics (you know, most by total number) can be reached by dotted-path enumeration, using getOwnPropertyNames and starting from the globals. The ones that can’t be reached by dotted-path enumeration, but can be reached by means other than syntax, are the ones that can only be reached procedurally. There is no generic way to discover them if you don’t know the procedural magic formula, like “create a Map and then create an iterator of the Map to get the Map iterator prototype” and things like that. So having an enumeration that covers all of the intrinsics that cannot be reached through a generic procedure, like dotted-path enumeration, is still very important.

MM: To expand on KKL’s point, it is true that with lockdown [?] for this purpose specifically, we’d no longer need the enumeration. But the fact that the SES shim needs the enumeration is, I think, a revealing symptom of the fact that providing an ability to enumerate all the intrinsics is still useful, even though we would no longer need it for purposes of lockdown. The fact that they’re all exposed and reachable, and that code can do things with them, is useful for meta code, for initialization code, to be able to find out what all of those are. So, for example, it can compare them against a whitelist of things it knows about and identify if there are things that appear that are beyond what some initialization shim knew about.

JHD: So I think that’s again definitely worth exploring in Stage 1. My suspicion is that that exploration might launch a different proposal, because my primary use case and motivation here is about [what] I know exactly I need, and I want it as simple and reliable as possible. I think it’s worth exploring.

MM: Yeah. I mean, I think these are naturally bundled together, if an easy elaboration of your proposal enables it to satisfy more use cases. We certainly generally try for that: to have something with as little complexity cost as possible that covers as many use cases as possible.

JHD: Right. Thanks, Mark.

DE: Yeah, I think this is a really important problem space and I support this proposal moving to Stage 1. It comes up in many different kinds of base frameworks, like Node.js or Deno, or anything that embeds the JavaScript engine; I think PFC will give another concrete example. It comes up for anything that tries to implement a library in a reliable way, where those embedders are just one example of trying to get reliable access to the JavaScript standard library. However, it’s pretty difficult to write programs today that use this access.

DE: So I’m kind of wondering if we should have a higher-level mechanism to solve this problem. If there were some way that you could write JavaScript code that reads more like ordinary code but is more reliable, I’m not sure what the exact answer is, but that would seem like a better solution. Since we’re trying to design a programming language, I think it would be good to step back and think about more kind-of-radical, less incremental changes to get reasonable ergonomics, so that when you write this code, you’re not constantly avoiding landmines that might cause you errors or unreliability. I also think the implementation concerns raised are serious. I mean, we’ve heard concerns from V8 and XS, and I don’t agree with the idea that polyfills get us through this, because I think encouraging heavyweight polyfills could make loading performance worse.

JHD: I mean, these polyfills are already in heavy usage. So it’s not that it would be encouraging them; it would be, like, shifting it from a userland library to a polyfill, with the goal of removing the polyfill eventually.

DE: Oh sure. But shifting from the current polyfills to a generic implementation of this new feature may be more costly, and could make things even slower.

JHD: I would invite you to take a look at the implementation of the get-intrinsic package. It mimics what test262 does. In fact, it already has a file that gathers up as many intrinsics as possible and stores them. So I would be very surprised if any implementation, XS included, was unable to beat my performance and memory usage. But I completely agree that just waving the polyfill wand does not wipe away these implementation concerns. I definitely think they need to be explored. Okay, great. Thanks.

KKL: I had avoided bringing up this topic in my last message because I do feel it’s completely orthogonal, but it also extends from this problem space. The implementations, like SES and Node, are, I believe to this day, programming defensively against modification of the prototypes, and to do that you have to use uncurryThis as written today. We do not have access to a non-polymorphic-dispatch version of intrinsic methods like Array.prototype.map, which forces us to use uncurryThis, which is a significant slowdown that could be avoided if we had more direct access to the intrinsics. I think this is unrelated to this proposal but, as DE points out, it may be part of a more holistic solution to the problem of shimming.

JHD: So where you say uncurryThis, I call it callBind, and I have a package named that for that purpose. I think the bind operator we discussed yesterday, in the context of the pipeline proposal, about method extraction—I think that a proposal in that space is what addresses that concern and would work quite well in concert with this proposal. So I agree with you, but I do think they’re unrelated in the piecemeal sense, and they’re related in the holistic sense that DE was talking about.
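
The pattern KKL and JHD are describing can be sketched in a few lines. This is a minimal illustration; the actual get-intrinsic and call-bind packages are considerably more thorough:

```javascript
// Minimal sketch of the uncurryThis / callBind pattern: capture the intrinsic
// method once, and bind it so it can be called without dispatching through the
// (possibly modified) prototype chain of the receiver.
const uncurryThis = (fn) => Function.prototype.call.bind(fn);

const arrayMap = uncurryThis(Array.prototype.map);
const stringSlice = uncurryThis(String.prototype.slice);

// Instead of receiver.map(fn), which looks `map` up on the receiver:
arrayMap([1, 2, 3], (x) => x * 2); // [2, 4, 6]
stringSlice("hello", 0, 2);        // "he"
```

Note that this still goes through the intrinsic `Function.prototype.call`, which is KKL's point: there is no non-polymorphic-dispatch form available today.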

KKL: Yeah, I agree with this point.

DE: Yeah, so it’d be great to see this kind of package of proposals laid out. So we can see a broader vision for how integrity can be exposed.

PFC: This is really important for other software that embeds a JavaScript engine for scripting for the purposes of people writing plugins. A big example of this that I’m involved with is GNOME. People write plugins for the GNOME desktop in JavaScript, and I can say from my experience that this sort of defensive programming where you have to grab the intrinsics beforehand is just not a concern for people writing these plugins—although it should be! Because you can easily crash your GNOME desktop by deleting built-ins off of prototypes. I think if we had a facility for this built into the language, that would bring it to the attention of people who don’t usually think about what happens if you delete an intrinsic off of a prototype. The fact that this facility exists makes it easier for them to think about it. I have a feeling that they’ll use it if it exists, and if it doesn’t, they just won’t realize it’s a problem.

JHD: Thank you. And as KKL mentioned as well, Node does this. They have a primordial pattern which is basically: they pre-create call-bound versions of all the intrinsic functions, and then they laboriously write all their code to use them. Not all of it, but much of it. And that’s because they don’t want the platform to crash if someone types delete Function.prototype.call. Thank you for that support. Onto immutability of getIntrinsic return values?
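
A toy version of that primordial pattern looks like the following. The names here are illustrative, not Node's actual internals:

```javascript
// Captured before any untrusted code runs (illustrative, not Node's real
// primordials module): call-bound copies of the intrinsic methods.
const primordials = {
  ArrayPrototypePush: Function.prototype.call.bind(Array.prototype.push),
  StringPrototypeToUpperCase: Function.prototype.call.bind(
    String.prototype.toUpperCase
  ),
};

// Even if someone later deletes or replaces the prototype methods (or
// Function.prototype.call itself), the call-bound copies keep working:
const arr = [];
primordials.ArrayPrototypePush(arr, 1, 2);
primordials.StringPrototypeToUpperCase("abc"); // "ABC"
```

The cost JHD alludes to is visible even in this sketch: every intrinsic you want to protect has to be enumerated and bound up front.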

CZW: So I can see that the motivation is to get unmodified intrinsic functions. Could the current proposal be clearer, however, about what values you can get from the getIntrinsic function? Let’s say we request an interesting built-in, like Array.prototype, from getIntrinsic. What if people just modify the return value? It’s just another JavaScript object, and people can do whatever they want with it. That would make the current risk no different if they can still touch it. So, compared with what we currently have on the global—

JHD: Yeah, so I agree. And I hope I outlined this in the readme, but essentially, if you get the intrinsic Array.prototype, and somebody has deleted includes off of it, then you get an object that doesn’t have includes, right. But if somebody asks for the Array.prototype.includes intrinsic then, even if it’s deleted off the Array prototype object, you get what the original function would have been. And this matches the spec’s notation: when, in the spec, you type “%Array.prototype.includes%”, it says, “Grab that value as if that code was run at the beginning of the run,” something like that, “before user code is run.” So, even if someone has deleted or replaced a prototype method, this getIntrinsic function would still give you the original one, and if somebody wanted to deny you access to it, they would have to replace the getIntrinsic function itself to do that.
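
A rough sketch of those semantics, with the caveat that getIntrinsic is a proposal and its real API shape and intrinsic coverage may differ; this only illustrates the capture-before-user-code behavior JHD describes:

```javascript
// Hypothetical sketch of getIntrinsic: intrinsics are snapshotted here, as if
// at the start of the realm, before user code has a chance to mutate anything.
const intrinsicSnapshot = new Map([
  ["%Array.prototype%", Array.prototype],
  ["%Array.prototype.includes%", Array.prototype.includes],
]);

function getIntrinsic(name) {
  if (!intrinsicSnapshot.has(name)) {
    throw new TypeError(`intrinsic ${name} not found`);
  }
  return intrinsicSnapshot.get(name);
}
```

So even after `delete Array.prototype.includes`, `getIntrinsic("%Array.prototype.includes%")` still returns the original function, while `getIntrinsic("%Array.prototype%")` returns the live (now mutated) object, which is exactly the distinction CZW asked about.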

JWK: This makes me think of polyfills. For example, we need to add new intrinsics, and in the current API design, we need to patch the whole getIntrinsic function. What if it were instead a namespace, with all intrinsics as static properties on it, e.g. Intrinsics["%ArrayPrototype%.slice"]? Then polyfills could simply add new intrinsics onto this namespace.

JHD: I don’t think it is, because if that object is mutable, then it defeats the purpose of being robust. If that object is immutable, then it prevents polyfills from adding to it. So, like, literally the only option I’m aware of is a function that gives direct access. We can still discuss that in Stage 1.

JWK: You can run the polyfill first (to add new intrinsics) then save a copy of the Intrinsics object with a deep clone. Then you get the polyfilled intrinsics at the initialization stage that won't be affected later.

JHD: If you’re doing that, you’re already doing the wrapping, and it’s almost the exact same code for the function call, you know: delegating to the original function for the original intrinsics, and returning your new intrinsic. So I think it’s effectively the same, but I’m happy to explore that during Stage 1.

SYG: Okay, I want to say this as optimistically as possible, but I think it still might be the case that the cost of this does not outweigh the utility. I feel a little better about it now than when I put the item on the queue, given some of the other use cases that folks have described, but they still seem like very niche use cases to me, all things considered. At least this is better, at least for the web platform—

JHD: So I completely agree, but I think more users will be impacted by this than are ever affected by Atomics, for example, which is also a very niche use case, even on the Web Platform. I don’t mean to be hostile with that; I just think that the amount of transitive code that depends on this pattern is very large.

SYG: I think the point I’m trying to make is, I guess, the same point that I made earlier: a large part of this is ergonomic. You want to cache one thing instead of n things; I get that and I hear you there. The other cost is that you don’t want to ship this heavyweight library around, but you might not get around that cost even by pushing it to the engine. I would like a better handle on whether it in fact has effects on loading time, which might not be acceptable. If it’s just memory, and we need to re-architect around that, maybe that’s okay. Thanks.

JHD: That’s very well understood.

JWK: I support it for Stage 1. It’s a problem worth solving.

JHD: All right. Do you have any objections to Stage 1 here?

BT: I don’t hear any objections. That sounds like Stage 1. Thank you, JHD. Thank you everybody.

Conclusion/Resolution

Stage 1

RegExp Feature Parity

Presenter: Ron Buckton (RBN)

RBN: [slide 1] Today I’m going to be talking about RegExp feature parity. This is something that I’ve been working on and researching for a couple of months. If you’re unfamiliar, I have put together a website on GitHub, and have been looking for community contributions as well, to get an idea of comparisons between all the regular-expression engines, more than the limited little bit that’s on Wikipedia and what I’ve seen in other sources. This originally evolved out of the flag comparisons I was doing for the RegExp match-indices proposal. But a lot of the ideas that I’m going to be presenting today are based on things that I’ve been considering for quite a while, supporting some motivating use cases I’ll talk about here in just a moment.

RBN: [slide 2] Just to provide some background: over the past few years, ECMAScript has gradually introduced new features that are present in other regular-expression engines and other languages. That includes Unicode support, sticky mode, named capture groups, the dot-all mode, lookbehind assertions, Unicode property escapes, and recently RegExp match indices, and we have new proposals currently looking to expand things further, such as RegExp set notation and RegExp escape. However, we have been quickly outpaced: there are a number of RegExp features that are available across a large number of other engines. [slide 3] So I wanted to provide a little bit of information about what’s available and what I’m currently investigating as possible new syntax to be introduced in the regular-expression grammar. Again, the goal is to improve feature parity.

RBN: [slide 4] Part of the reason to investigate is support for things like TextMate grammars in-browser for web-based editors and IDEs. Most of these are web-based editors such as VSCode and Atom, and if anyone’s seen the GitHub dev websites that allow you to interact with a GitHub repository directly in the browser, they use the same editor support that’s available in VSCode to give syntax highlighting and colorization. The current options are basically server-side rendering using Oniguruma on the server, or using Wasm bindings, which was important for Oniguruma for its advanced functionality. [unable to transcribe] Some of the other motivations behind these new features: I’m looking for ways of improving performance for regular expressions, using things such as possessive quantifiers; support for RegExp parsers that can handle balanced brackets and parentheses; the ability to document complex patterns within the pattern itself using comments; and ways of improving readability.

RBN: [slide 5] Some of the features that we’ve been investigating include things like explicit-capture mode. This is a feature that’s available in Perl, PCRE, and .NET, among the engines that I’ve currently been investigating. This affects capturing behavior, such that normal capture groups, just those with plain parentheses, are treated as non-capturing groups, and only named capture groups are returned as part of the match result. For cases where your project is primarily using named capture groups, this helps reduce memory overhead, and reduces the complexity of a regular expression by dropping the ?: that’s used for what is normally a non-capturing group.
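
For contrast, here is what that opt-out looks like in today's ECMAScript: every group you don't want captured needs an explicit ?:, while named groups carry the matches you care about. (The explicit-capture flag itself is not ECMAScript syntax; this illustrative pattern shows the status quo it would simplify.)

```javascript
// Today: ?: on every group we don't want to capture; named groups for results.
const datePattern = /(?:\d{2}|\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

const match = datePattern.exec("2021-09-01");
match.groups.month; // "09"
match.groups.day;   // "01"
```

Under explicit-capture mode, the leading group could be written as plain parentheses without being captured, since only the named groups would count.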

RBN: Another flag that we’ve been investigating is extended mode, which is the x flag. This allows you to treat unescaped whitespace within a regular expression as insignificant, so significant whitespace either needs to use \s or \ to escape a space. This is useful for introducing comments and for creating multi-line regular expressions with the RegExp constructor. A couple of notes here: Perl has the x flag, but it did not treat whitespace in a character class as insignificant; in Perl 5.26 they added the xx flag, which does. This does not enable multiline regular-expression literals; the only way that you could support multi-line regular expressions currently would be to use a template literal within the regular-expression constructor, for example. And this is something that’s available in Perl, PCRE, and pretty much every engine I’ve observed, with the exception of ECMAScript.

RBN: [slide 6] Other features that have been investigated are things like possessive quantifiers, which are similar to regular greedy quantifiers, but prevent backtracking into the quantified expression once it has matched. This is useful for performance, because of how poorly certain regular expressions can perform, especially those that might have a significant amount of backtracking. If you look at the discussion on the repository linked below, you can see some examples of a relatively small regular expression that takes exponential amounts of time based on how many characters are in the text that you’re trying to match. One of the advantages of this is that those who are looking to achieve better performance in regular expressions would have the ability to control this behavior. This could be used in current regular expressions regardless of flags, as it introduces the plus character as part of a possessive quantifier, which doesn’t conflict with any existing syntax. And again, this is a feature available in almost every single regular-expression engine that I’ve investigated.
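
The kind of pattern RBN alludes to can be demonstrated in today's ECMAScript. The nested quantifier below is the classic catastrophic-backtracking shape: on a non-matching input the engine retries every way of partitioning the run of "a"s:

```javascript
// (a+)+ followed by a required 'b': on "aaa…a" with no trailing b, the engine
// tries every partition of the a's before failing, and the number of
// partitions doubles with each added 'a'.
const pathological = /^(a+)+b$/;

pathological.test("aaaaab");             // true, and fast
pathological.test("a".repeat(20) + "c"); // false, after ~2^19 retries
```

A possessive `(a++)+b` or an atomic group would fail immediately instead, which is the performance benefit being described.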

RBN: [slide 7] Another feature that we’re looking into is atomic groups. These are non-capturing groups that are matched independent of neighboring patterns, so they prevent backtracking, similar to possessive quantifiers, which again allows you to write regular expressions that have better performance in specific cases. This, again, has no conflict with existing syntax, because ?> is currently considered illegal within a regular expression, as it’s not a valid group.
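
Neither (?>…) nor possessive quantifiers parse in ECMAScript today, but the well-known emulation, a capturing group inside a lookahead followed by a backreference, shows the intended behavior, because ECMAScript lookaheads are not backtracked into once they succeed:

```javascript
// Emulating the atomic group (?>a+) as (?=(a+))\1: the lookahead commits to
// its greedy match, and the backreference then consumes exactly that text.
const greedy = /^a+a$/;         // a+ can give back one 'a', so this matches
const atomic = /^(?=(a+))\1a$/; // the emulated atomic a+ cannot give back

greedy.test("aaa"); // true
atomic.test("aaa"); // false
```

The dedicated syntax would make this intent readable instead of relying on a lookahead-and-backreference trick.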

RBN: [slide 8] Some other features we’ve been looking at are buffer boundaries. These are similar to the ^ and $ anchors, but in this case they’re not affected by the multi-line flag. In the engines that support this, \A matches the start of input and \z matches the end of input. Actually, I should say that all engines that have this that I’ve seen support \z; the \Z assertion differs, in that at least one engine allows any number of optional newlines at the end of input, but most engines currently allow only a single trailing newline.
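
In ECMAScript, ^ and $ without the m flag already act like \A and \z; the new escapes would matter when m is set, where today there is no way to anchor to the absolute end of input in the same pattern:

```javascript
// Without m, $ matches only at the absolute end of input; with m, it also
// matches at every line boundary. \z would give the absolute-end behavior
// even when m is on.
/end$/.test("end\nmore");  // false: $ is the absolute end here
/end$/m.test("end\nmore"); // true: $ also matches before the \n
```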

RBN: [slide 9] Line-ending escapes. This is an escape sequence that is not supported within a character class, but is supported outside of one, and it’s designed to match any line-ending character sequence. So it matches CR+LF, Carriage Return or Line Feed on its own, as well as Unicode line terminators. There is a recent PR against the repository discussing whether or not this should match the UTS #18 specification for \R. This would just be an escape for the capital R, which is usually the case in every engine that’s been tested. This is a feature that, if we consider investigating it, would require something like the Unicode u flag, as it would be breaking for existing regular expressions.
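
An approximation of \R in today's syntax; the alternation must try CR+LF first so that the pair is consumed as a single unit:

```javascript
// Roughly what \R would match: CRLF as a pair, otherwise any single line
// terminator (including NEL and the Unicode line/paragraph separators).
const lineEnding = /\r\n|[\n\r\u0085\u2028\u2029]/;

"a\r\nb\u2028c".split(lineEnding); // ["a", "b", "c"]
```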

RBN: [slide 10] One feature that I’ve definitely been interested in introducing is modifiers. As I mentioned earlier in the motivations, one of the motivating use cases is the ability to support syntax colorization and TextMate grammars within the browser. TextMate grammars use string-based regular expressions, since they’re primarily written either in YAML or JSON, or in the PList format that’s also used. All three of these don’t actually support a literal regular expression, so you can’t actually provide regular-expression flags to control behavior such as case insensitivity, multi-line, etc. Every single regular-expression engine that I have surveyed, with the exception of ECMAScript, has this capability. So it’s definitely one that I think is useful and powerful, and it’s heavily used within TextMate grammars today. Some examples of this are being able to set flags, which is ? and then a series of flags, and then unset, which is a - and then one or more of those flags; that turns those flags on or off for the rest of the pattern, meaning all alternatives up to the closing parenthesis of the enclosing group or the end of the regular expression itself. There’s also a variation of this that supports specifying it with a colon, followed by a subexpression. This has no conflict with existing syntax. Certain flags would not be supported with it: you would not be able to control flags such as global, sticky, or the has-indices flag.
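
Since ECMAScript has no inline modifier syntax today, a string-based grammar that needs case-insensitivity for only part of a pattern has to spell out both cases by hand. The pattern below is purely illustrative of that workaround:

```javascript
// What an inline modifier like (?i:select) would express in one group has to
// be written as explicit character classes when only part of the pattern
// should be case-insensitive.
const query = /^[Ss][Ee][Ll][Ee][Cc][Tt] (\w+)$/;

query.test("SELECT users"); // true
query.test("select users"); // true
query.exec("Select users")[1]; // "users"
```

Applying the i flag to the whole expression is the only alternative today, and that changes the sensitivity of the entire pattern rather than one region of it.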

RBN: I do want to address a couple comments I’ve seen going through here, what I’m looking for as part of this proposal. It is not specifically wholesale adoption of all of these. It’s an investigation into the individual features that we’re discussing and that I’m bringing up as showing disparity and whether we can take some of these—as I believe BT coined it when I was talking with him—as RegExp Buffet v2. Some, we may break out into individual proposals. Some, we may choose not to advance at all. But primarily what I want to do is open up discussion on all of these possibilities and features that are common so that we can determine which ones we want to move forward with.

RBN: [slide 11] Getting back into the presentation: another feature that I’ve found very useful in other engines and other languages is the ability to introduce comments into regular expressions. Regular expressions by nature are very terse and opaque to many users. The syntax is extremely complex, and as a result it can be very difficult to understand exactly what’s going on within a regular expression at times, especially complex ones. Comments, at least in this specific feature, are designed around introducing a comment inline within a regular expression: the (?# sequence indicates the beginning of a comment group, which ends at the next ), and allows you to write text that is not considered part of the pattern. This can be used in a regular-expression literal, or within the RegExp constructor using a multi-line template literal. Again, this is supported in every single regular-expression engine that I’ve tested or investigated, with the exception of ECMAScript, up to this point. This also would have no conflict with existing syntax.
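
A naive way to approximate this today is to strip the comment groups out of a pattern string before constructing the RegExp. The helper below is hypothetical and deliberately simplistic; unlike a real engine feature, it breaks if a comment contains a closing parenthesis or if (?# appears escaped:

```javascript
// Hypothetical pre-processor for (?#…) comments. A real engine feature would
// also handle escapes and nested context, which this ignores for brevity.
function stripPatternComments(source) {
  return source.replace(/\(\?#[^)]*\)/g, "");
}

const datePart = new RegExp(
  stripPatternComments(String.raw`^\d{4}(?#year)-\d{2}(?#month)$`)
);
datePart.test("2021-09"); // true
```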

RBN: [slide 12] Another interesting feature is line comments. This is something that is supported within all engines that support the x-mode flag. It’s not usable within a regular-expression literal (well, it would be, but essentially the rest of the regular-expression literal would be considered a comment, because you can’t have multiple lines). It would be best used with something like a template literal, especially if you’re using String.raw so that you don’t have to double-escape your character escapes, but it does significantly improve readability for complicated expressions. When x mode is on within a regular expression, again, all whitespace is treated as insignificant, and the hash character is considered the beginning of a comment when outside of a character class, which means that inside of x mode the hash character would need to be escaped.
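
The effect RBN describes can be approximated with a small pre-processing helper over a raw template string. This is a hypothetical sketch; a real x-mode implementation must also respect character classes and escape sequences, which this does not:

```javascript
// Hypothetical x-mode emulation: drop # line comments, then drop all
// remaining whitespace as insignificant.
function xMode(source) {
  return source
    .replace(/#.*$/gm, "") // line comments
    .replace(/\s+/g, "");  // insignificant whitespace
}

const semver = new RegExp(xMode(String.raw`
  ^
  (\d+) \. (\d+) \. (\d+)  # major . minor . patch
  $
`));
semver.test("1.2.3"); // true
```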

RBN: [slide 13] Another very useful and commonly used feature, within TextMate grammars for example, is conditional expressions. These check for a specific condition; if that condition is met, they evaluate the first alternative, and if the condition is not met, they evaluate the second alternative, if present. This would not have any conflict with existing syntax, and is supported in quite a few of the engines that I’ve investigated.

RBN: [slide 14] The interesting thing about conditionals is there are a number of ways you can specify a conditional pattern. Essentially, the parentheses around the condition here are part of the condition itself, so you can have conditions such as lookahead conditions that match the first alternative if the lookahead matches; the same for lookbehind assertions, and for both positive and negative lookahead and lookbehind. There are ways of testing whether a specific capture group at an index was matched or failed to match, and ways of testing whether named capture groups were matched or failed to match. There are a couple of other interesting features of conditions based on other features. One is conditions related to subroutines, which I’ll describe a little bit later—which is essentially a condition that is always false, and I’ll explain what that means shortly. And there are conditions related to recursion: ways of determining if you are currently within a recursive part of a regular expression, the recursive part of a specific capture group, or the recursive part of a capture group that is referenced by name. And again, recursion I’ll talk about shortly.
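
A conditional with a lookahead condition can be approximated in today's ECMAScript by guarding the two alternatives with a positive and a negative lookahead on the same condition:

```javascript
// A conditional like (?(?=\d)\d+|[a-z]+) — "if the next character is a digit,
// match digits, otherwise letters" — approximated with guarded alternation:
const conditional = /^(?:(?=\d)\d+|(?!\d)[a-z]+)$/;

conditional.test("12345"); // true
conditional.test("abcde"); // true
conditional.test("1abc");  // false: starts like a number but doesn't stay one
```

The dedicated syntax avoids duplicating the condition and extends to forms this trick cannot express, such as testing whether an earlier group matched.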

RBN: [slide 15] Subroutines are an interesting capability that are available in a number of engines. I have seen them used in some TextMate grammars. I have often seen other ways of doing something similar. For example, the TypeScript TextMate grammar has a variables approach, where we introduce regular expressions, then have a pre-processing step that takes these predefined regular expressions and inserts them into other ones, so that we can reuse them. That’s essentially the feature that’s available here. The idea is that with a subroutine, you can evaluate the pattern that’s defined within a pre-existing capture group at the current location. So it’s not the same as a backreference which matches the exact same substring that was matched by a previous capture group, but instead says re-evaluate this capture group. This gives you the ability to define things—I’m not sure if I have an example in here—

RBN: [slide 16] I do. Here’s an example of matching an ISO 8601 date. You can use the (?(DEFINE)…) conditional to define patterns for what a year, a month, and a day should be, and then specify repeated use of those patterns within another subpattern. So here we match the named capture group ?<Date>, and then we reference the Year, Month, and Day patterns, with or without dashes. Subroutines don’t conflict with existing syntax; they are currently a syntax error in all flag modes of regular expressions in ECMAScript.
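
The pre-processing "variables" approach RBN mentioned earlier looks like this in plain ECMAScript: each subpattern is defined once as a string and interpolated, which is the manual version of what subroutines would do inside the engine (names here are illustrative):

```javascript
// Manual "subroutines": each part is defined once and reused by interpolation.
const Year = String.raw`\d{4}`;
const Month = String.raw`\d{2}`;
const Day = String.raw`\d{2}`;

const isoDate = new RegExp(String.raw`^${Year}-?${Month}-?${Day}$`);

isoDate.test("2021-09-01"); // true, with dashes
isoDate.test("20210901");   // true, without
```

The engine-level feature would remove the need for this string plumbing and, unlike interpolation, would also permit self-reference.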

RBN: [slide 17] One of the other capabilities of subroutines is that they allow recursion, in that you can reference a pattern within the pattern itself. This can be used to do things like balanced-bracket matching. It’s relatively feasible today to match things like balanced single quotes and double quotes for strings, but balancing open and close parentheses or square brackets or angle brackets is difficult or impossible to do. Having subroutines with that capability, and then being able to express that as a recursive operation, can be useful. There are some specific recursion patterns that are often used within these engines as well, such as (?R), which says re-evaluate the entire pattern, and (?0), which is essentially the same: it re-evaluates the entire pattern as a group, since capture 0 is essentially the entire match. This is also something that would not conflict with existing syntax; it is currently a syntax error in all regular-expression patterns, regardless of flags, in ECMAScript today. And again, it’s available within Perl, PCRE, and most of the other engines that have been investigated.
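
Without recursion, balanced brackets can only be matched to a fixed depth by expanding the pattern programmatically, which is exactly the limitation recursive subroutines would remove. A sketch of that workaround:

```javascript
// Build a pattern matching parentheses balanced up to `depth` nesting levels.
// True recursion (e.g. PCRE's (?R)) would handle arbitrary depth instead.
function balancedParens(depth) {
  let inner = "[^()]*";
  for (let i = 0; i < depth; i++) {
    inner = `[^()]*(?:\\(${inner}\\)[^()]*)*`;
  }
  return new RegExp(`^${inner}$`);
}

balancedParens(2).test("f(g(x), y)"); // true
balancedParens(2).test("f(g(x), y");  // false: unbalanced
```

The pattern size grows with the depth bound, and any input nested deeper than the bound is wrongly rejected, so this is a workaround rather than a substitute.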

RBN: [slide 18] What I’m looking to do is request Stage 1 for investigating the feasibility. I had discussed with some others the possibility of creating a RegExp-specific TG; at the time, it seemed like there wasn’t enough interest in that from the folks I was talking with. So what I decided to do was put together some interesting features that I think we should pursue or investigate, based on the research that I’ve done. I expect that some of these features won’t be adopted for Stage 2. Some features might require syntax changes, and some things that we haven’t listed we might consider adding. I also believe we may eventually break this down into more individual features or more individual proposals. But quite a number of these features have specific tie-ins to each other, such as conditionals having cross-cutting concerns with subroutines. The goal in presenting them all together was to ensure that we had the ability to see how they work cohesively. And again, a lot of these features are heavily motivated by the TextMate grammar use case, which was where I started with RegExp match indices: trying to reach a point where editors like VSCode, or other code colorizers or parsers in general that use regular expressions, have more flexibility and more of the capabilities that are currently available in other engines, so that they don’t have to fall back to native bindings or Wasm builds of native engines like Oniguruma.

RBN: [slide 19] At this point, I will go back to the queue and we can discuss any questions that people have.

WH: There are a lot of things here. Some of these I think are fairly reasonable. Some are really experimental. Some of the places where you said that these would not break existing grammar, that’s inaccurate in that they would, and I can give some examples. Some of these are really unmotivated. I don’t see much of a motivation to support multi-line regular expressions if you can’t do it for literals, and there are good reasons why you can’t do it for literals.

RBN: There’s some interesting discussion that we’ve had on the proposal repository for RegExp.escape. RegExp.escape also has a RegExp.tag that allows you to use a tagged template literal for multi-line regular expressions. So you would say RegExp.tag, backtick, and then multiline RegExp. The down side of that is it’s hard to introduce certain flags in those cases, so you’d have to have ways of making that work. You could introduce inline flags, but that wouldn’t allow you to turn on things like global sticky, etc.

RBN: [shows slide 12] So that’s something that the current approach I have on the screen here is using: the String.raw template, to allow you to avoid certain escaping characteristics. Most of the engines that I’ve seen support this through string patterns, with the exception of Perl, which has specific syntax that denotes a pattern. So these patterns can be regular expressions, but engines like the .NET regular-expression engine only support x mode in string literals, because their regexes are all introduced using string literals; there are no regex literals.

RBN: So I don’t—Calling it unmotivated is—I wouldn’t look at it as unmotivated. I’ve looked at plenty of regular expressions that could definitely use documentation and could definitely use whitespace to help make them much easier to read. It’s definitely a benefit for developers on large teams where someone comes into the project and needs to read through. And if a regular expression is introduced that has more than very simple complexity, more than just a couple of characters within the pattern, they become very hard to read. If you look at a lot of popular libraries that heavily use regular expressions such as Marked, for Markdown parsing, they break down all the regular expressions into individual subpatterns and then basically string them all together or concat them all together into a RegExp constructor so that they can have them better documented. These are things that wouldn’t be necessary with things like multiline regular expressions because they could again introduce the documentation within the pattern itself.

WH: You’re introducing a strawman which I’m not making.

RBN: I apologize. You said that multiline regular Expressions were unmotivated.

WH: Yes. This proposal only allows multiline regular expressions on strings, and so the question now becomes, how do you add comments to strings? There are places in the language where you assemble complex strings, and there are existing mechanisms for commenting on strings. You can concatenate a string together from smaller strings and add comments in between them and things like that.

WH: The other problem I have with # comments and the x mode is that they break existing regular expressions. I gave some examples in the title of my queue item here (/abc#\/x;”/ and /(?#3/4)/) where # and (?# comments are incompatible with existing literal syntax.

RBN: Yes. There are some incompatibilities with regular-expression literals, but not with the regular-expression constructor. It’s possible that x-mode just wouldn’t be viable within a RegExp literal. Or it’s possible that the syntax that we would need is more complex. As I mentioned at the end, there might be syntax changes, for example, escaping might still be necessary within a regex literal. Whereas that same escaping isn’t necessary within a regex string passed to the constructor. This is not new to how we have regular expressions today. There are certain things that have to be escaped in a literal that don’t need to be escaped within the RegExp constructor.

WH: I think I’ve seen somewhere that Perl actually guesses where the end of a literal using x mode is. I don’t want to get into that into ECMAScript. So, yeah, x mode is less harmful if it’s not in literals, but that then introduces divergence between literals and regex constructor which seems like not the ideal situation. Anyway, let’s go on to the next person.

PFC: I wanted to point out that multiline for documentation is, I think, a feature that people would make use of very thankfully.

MF: I’ve gotten a bit of mixed messaging here from you about exactly how you plan the process to work around this, assuming that you’re looking for a proposal to reach Stage 1 today, which is what it says on your final slide. I’m not really comfortable with calling this entire set of things a single proposal that goes to Stage 1. With a few exceptions, we typically have proposals targeted at individual features. And I would think that, especially as you’ve split them up across slides, you also consider these individual features, even though they do have cross-cutting concerns, and maybe dependencies between them. That’s not uncommon for other proposals we work on in this committee. We can treat those as individual proposals. So, is that what you’re looking for today? Just, as a committee, investigating each of these features as individual proposals?

RBN: We’ve had this discussion a couple times with respect to different proposals and where they stand on syntax and where we stand on Stage 1. Stage 1 is investigating the space. The reason that this presentation was introduced as “RegExp parity”, it’s trying to investigate feature areas where ECMAScript regular expressions are lacking. We had this discussion yesterday about things like how the \w, \d, and \s escapes differ between various engines, and that a lot of these engines only support ASCII mode in what they call “ECMAScript compatibility mode”, because we’ve been so far behind.

RBN: So, my goal was to open a discussion and start an investigation into the space of achieving better parity with regular expressions in other languages, primarily focused on the features that I’ve presented today. If it’s something where we feel that this is too broad in scope and we need to break it down into individual proposals, I’m perfectly fine with that. In which case, I could almost go back and say, “Do we want Stage 1 for each one of these individual slides?” But it’s more a matter of: I’d like to investigate the space. This is similar to, in my opinion, investigating decorators because decorators are a feature, but they decorate classes, they decorate methods, they decorate fields, they decorate accessors. There’s a lot of different things that they cover and have cross-cutting concerns and the implementations are different. So, it’s a very large proposal and it has taken time and parts of it have been broken off into other proposals over time.

RBN: But again, what I’m looking for for Stage 1 is investigating these features and other things for feature parity. This specific proposal might never move past Stage 1 because it might then be cut down into individual, more focused proposals. But what I’m looking for is the committee to agree that we should be investigating improving the syntax space within regular expressions to try to achieve better language parity, so that more things that people are already doing in other languages are portable to ECMAScript, so that people coming from other languages have a little bit of comfort with keeping those features and capabilities.

MF: Okay, I think it’s possibly a slightly inappropriate use of the stage process here. I agree with you that this as a whole would probably never advance past Stage 1, but I do see your overarching goal as saying, in typical Stage 1 fashion, that we will commit committee resources to investigating this. I think that is appropriate. Whether we actually call that a proposal or not is up to the chairs.

RBN: I have had offline conversations with a number of individuals about whether or not we should consider chartering a technical group to specifically focus on the regular expression sublanguage. Most of the feedback that I received was, I would say, either disinterested or negative about that. But if the committee is more interested in having a specific TG chartered for this, I’m not sure what the process is to do that, but I can also investigate that as well.

DE: I want to make a process suggestion. I think a formal TG would be a little bit too heavyweight because this is an effort that we’re ramping up, then it will eventually reach a point that we’re happy with—rather than having a standing set of responsibilities forever. What if we made a regular call on the TC39 calendar? We could think of it as an ad-hoc subgroup of TC39 people who are interested in regular expression features. Then this group can propose things for Stage 1. I agree with Michael that this is a little bit of a funny—It’s more like a work area than a proposal. Maybe we could record, in the proposal’s repository, the kind of calls and work areas that we have.

DE: I’m not really on board with the decorators comparison. The decorators proposal is quite concrete in being about class decorators and decorators for class elements. Decorators for functions are not part of that proposal. So I don’t see what kind of precedent we have for calling a work stream a proposal. I think there are simple ways that we can deal with this, and we could formalize it; make it more discoverable for people to understand their work streams.

RBN: I would like to clarify that decorators were more than class decorators when we first proposed them. When JT started as champion, he and YK included function decorators, and I think even parameter decorators, which had to be cut due to complexity and postponed. That is why I was specifically drawing that comparison.

DE: The thing that advanced ended up not being that. But I think it makes sense that it was one. I don’t know, I wasn’t there for the history. Sorry for my inaccurate representation. But what would you think of that kind of strategy?

RBN: I don’t have an issue with it. I do want to get to some of the other replies on the queue related to this as well.

DE: Okay, one last point. The goal would be to work towards parity, and I don’t think that’s a goal that the Committee should adopt. I think we should work on adopting useful regular-expression features, but I don’t think porting things from one language to another, and expecting the same features to be there, should itself be a goal of the committee.

RBN: I can agree with that again. My goal isn’t specifically 100% parity; it’s parity on common features. There are things that I have researched, and here is a website that I’ve been putting together for a while. This originally started as an Excel spreadsheet, and it is a comparison of common features between engines and the differences that each engine maintains. It’s not 100% accurate; I’ve been going in and filling in what I can, and I have probably about twelve more engines on my list to eventually go through to add in a lot of the features that I’ve been looking at. For example, there are features like call-outs, the ability to execute code in the middle of a regular expression, which I’m definitely not proposing, and backtracking control verbs.

RBN: There’s a lot of these features that I’m not looking for that are in a number of engines, but I’m definitely looking for features that have support across a significant number of engines and are commonly used in practice and would definitely improve developers’ lives. So again, not looking for 100% parity, but I am looking for improving the support we have within our regular expression language so that we can get the same types of, in some cases, brevity, or in some cases, additional power that a lot of other engines employ and are commonly used in the motivating use cases I had around: specifically things like TextMate grammar support, balanced bracket parsing, and improving documentation and readability. And improving performance.

RBN: I’ll make a quick note that recently on TypeScript, one of my co-workers has a peer from university who had built a tool to analyze the complexity of regular expressions. We were using it on our engine source code to find patterns we had that were poorly performing, and as a result we have been making changes and fixes. A lot of the issues that we found would have been addressed through things like possessive quantifiers, because backtracking was a significant performance problem. Instead, because these don’t exist, we’ve had to rewrite regular expressions and change how we parse a number of things in the compiler to improve performance.
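The backtracking problem described above can be illustrated in today’s JS, along with the standard workaround for the missing atomic-group and possessive-quantifier syntax: the capturing-lookahead trick. This is a generic sketch, not a pattern from the TypeScript codebase.

```javascript
// Catastrophic backtracking: nested quantifiers force the engine to try
// exponentially many ways to split the input before giving up.
const slow = /^(a+)+b$/;

// JS has no (?>...) atomic groups or a++ possessive quantifiers, but an
// atomic group can be emulated with a capturing lookahead plus a
// backreference: the lookahead matches once (lookaheads are atomic in
// ECMAScript), and \1 then consumes exactly what it captured, so the
// engine can never backtrack into the group.
const fast = /^(?=(a+))\1b$/;

const input = "a".repeat(28) + "c"; // no trailing "b", so a match must fail

// slow.test(input) would take on the order of 2^28 steps here;
// fast fails immediately.
console.log(fast.test(input)); // false, returned instantly
console.log(fast.test("aaab")); // true
```

The emulation works because ECMAScript lookaheads discard their internal backtracking states on success, which is exactly the behavior an atomic group provides natively in engines like Oniguruma or PCRE.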

RBN: So a lot of these things are, I feel, heavily motivated. There are some that have somewhat less motivation. For example, buffer boundaries I found was worth introducing because it’s common across every single engine that I’ve looked at with one minor difference in semantics and is valuable; but also in many cases you can achieve the same thing if you have modifiers and can turn off multi-line mode, and then do an anchor. So there’s ways of being able to achieve this without, but it seemed worth adding. I do want to get to some more of the replies.

JHD: It turns out—I said 2017 on the queue—but it turns out, as far back as 2015, BT presented a RegExp Buffet. It was a bunch of things, most of which I think became separate proposals—but essentially that was already the committee agreeing to explore regex changes. That’s why we’ve had many regex changes over the years. So while a TG is fine, I agree that it doesn’t really make sense to have a proposal for this, when we’ve already as a Committee agreed, and held that agreement, to investigate this stuff. I would be much more interested in seeing each of these things independently presented, because I think they’re going to have varying levels of controversy and acceptance, and I think that’s more productive than the potentially scary thing of “let’s just blow up the complexity of RegExp”.

CM: First of all, what I’m about to say might sound snarky and I want to apologize in advance as I really don’t intend it that way. And I’m not sure if a separate TG is the right organizational vehicle. But I would very strongly favor some way to delegate all of the regular expression stuff to a subgroup that is concerned with that stuff. Specifically so that I and those of us who really don’t care about regular expressions at all don’t ever have to deal with this. It seems like a huge fraction of the plenary’s available meeting time in the past year has been consumed by regular-expression issues of one sort or another, which seemed to me to be somewhat tangential to the fundamental issues of the JavaScript language. Now, I understand these things do need to get resolved and sorted out, and I know there is a large and active community of people who are deeply concerned with these things, but if there were some way that we could delegate that to the people who are concerned with that stuff, I think that would be an overall improvement to our process.

BSH: So, for sort of the opposite reasons to what CM said, I think a TG might be a good idea. Mainly because I think there’s not a lot of clarity on what the bar is for what we want to change in the regular-expression language [?], which I think we need. It’d be good to have some sort of consensus from interested parties on what kinds of features we are interested in adding, and which ones we are not. There’s sort of a consensus for the language as a whole—but for regular expressions, not so much. And yeah, I agree with what was said earlier; I almost made the same statement. I don’t think parity with other languages is a tractable design goal, and we don’t have any clear design goals. So, defining those would be great.

WH: We already have a subgroup working on regular expressions. Thus I am baffled by the calls to create a subgroup which already exists. The problem with the existing subgroup is that it meets so frequently. It meets every week, which makes it hard to follow.

RBN: I wasn’t aware there was an existing subgroup discussing regular expressions outside of the group discussing the RegExp Set Notation proposal.

WH: Yeah, that’s what I was referring to.

RBN: I looked at that as more of a specific feature. As a matter of fact, in researching this, I had been planning to put something on the agenda once I’d finished the majority of the research I was doing; right around that time, that proposal was added. I’ve looked at it as being a very specific, scoped proposal, and if we were considering breaking these down into more specific and scoped proposals, then it feels like that group would wind up expanding its charter. Or, even if it’s not formally chartered, expanding its scope, which might not be in the interest of the champions of that proposal. I’d have to let them speak to that.

WH: They do interact very strongly. Some of the examples you gave in the slideshow would break under the proposed modernized Unicode semantics.

MLS: So there’s the other-languages, “if they build it, they will come” kind of thing. I think we need to consider syntax for regular expressions the way we consider syntax for the language itself: the feature must pay for the syntax it uses. Regular expressions are no longer regular; they haven’t been for a long time. They’re approaching Turing completeness.

MLS: I think this should be driven by developer desire. And you see this in some other languages, where they’ve added features to regular expressions and then deprecated them, or added other features when the original ones turned out to be broken or not useful. So I think we need to be very careful and drive this based upon developer demand.

RBN: And I definitely agree. My motivations, again—the majority of what was presented in these slides—are based on needs that I’ve seen in things like the Visual Studio Code editor, or Atom, or any of the other editors that use Electron, that have web-based editors that currently rely on TextMate-style grammars using syntax that JavaScript regular expressions can’t parse. And while a lot of these are based on the common denominator of the engine being used—in most cases that’s Oniguruma—a lot of these features are very heavily used in other languages, and I found myself constantly having to work around the fact that they don’t exist in JavaScript regular expressions. I know I don’t have a precise set of numbers of individuals with specific developer asks, but I know that a number of these features are very useful in day-to-day things. Atomic groups and possessive quantifiers aren’t going to be used by the majority of developers, but they’re going to be used by the people that need them and have no other option. Things like conditional expressions and modifiers are extremely powerful features. Not having them means that many expressions become more complicated, which means that certain expressions can’t be implemented as a single regular expression: they have to be implemented as three or four regular expressions with a lot of complicated logic around them. The goal of any language is to improve productivity and be terse; well, there are multiple other goals. So a lot of these are heavily based on features that are heavily used in other languages that we don’t have.
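As a concrete illustration of the modifiers point: inline flags such as `(?i:…)` exist in engines like PCRE and .NET but not in today’s JS, so partial case-insensitivity must be spelled out by hand. The pattern below is a hypothetical example, not one from the slides.

```javascript
// In PCRE or .NET one could write /(?i:abc)-def/ to make only the "abc"
// part case-insensitive. In JS, the `i` flag applies to the whole pattern:
const whole = /abc-def/i;           // also matches "abc-DEF", which may be unwanted

// The usual JS workaround: enumerate the case variants manually, which is
// exactly the kind of verbosity the missing feature causes.
const partial = /[Aa][Bb][Cc]-def/; // only the "abc" part is case-insensitive

console.log(partial.test("ABC-def")); // true
console.log(partial.test("ABC-DEF")); // false, "def" must stay lowercase
```

For longer subpatterns this manual expansion quickly becomes unreadable, which is the motivation for inline modifiers.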

RBN: But I definitely agree that we should focus specifically on the features that are useful. The line-ending escape is one that I’ve seen come up quite a bit, and it was the first thing that somebody had a PR to improve documentation around, because they wanted to make sure it was UTS #18 compatible. Of all the things I presented here, the one that I find the least useful, which might not make the cut but is also the simplest, is buffer boundaries. I definitely agree that we want to make sure that whatever we’re building is based on things that developers need, and not just anything anybody asks for.

MLS: Well, we can always find somebody who wants something, but we’re introducing complexity to regular expressions in the language, and in a lot of cases we’re also introducing complications for regular-expression processing and performance. Some regular expressions in applications are used all the time; others are used infrequently. For an expression that’s used once or very few times, we’ve found that the parsing time of the regular expression itself figures into the performance of that expression. So even added syntax that doesn’t add complexity to the execution of the regular expression needs to be figured in. And as we saw with the indices proposal, we didn’t think there would be performance implications, but there were, and we had to go back and modify it. For many of these, I have concerns that we will impact the performance of existing applications that are using the current features.

SYG: I have a pretty naïve question, because I’m not very familiar with the TextMate grammar, and it kind of speaks [?] to what MLS was saying about RegExp almost becoming Turing complete. What I’ve heard so far is that a large driver of new regular-expression features is dealing with TextMate grammars. This has been the case with TextMate matching the season [?], and it sounds like it remains the case for all these new features. How much of this is TextMate running up against the expressivity limits of regular expressions, and perhaps that should change? And how much of it is “we actually should change regular-expression support in JS”?

RBN: So I’m bringing up TextMate as a common use case because it’s not a standard, but it is a de facto standard in that it is supported by, I think, six or seven different editors off the top of my head. It’s supported by VS Code. It’s supported by Atom. It’s supported by TextMate itself. It’s supported by Sublime Text. And the list goes on and on. And there’s a consistent set of regular-expression patterns that are used within those grammars. And the reason for this consistency is the fact that developers really like to be able to style their editor in a way that they feel comfortable with. And if they switch from one editor to another, then it behooves that editor to support those types of grammars, and there’s very little you can do as far as changing the TextMate grammar without breaking the ecosystem that uses it.

RBN: But these features aren’t specifically geared towards TextMate support; it’s just a very common use case that I see them in. A lot of these other features are very powerful for doing other types of regular-expression parsing that, again, we can’t do today. TypeScript itself doesn’t worry about TextMate. We have a TextMate grammar for VS Code, but we also heavily use regular expressions in a number of cases, and again, we suffer from poor performance in regular expressions because of excess backtracking, and have had to deep-dive into what we’ve written to find better ways of doing this, given that we don’t have these capabilities in the language.

RBN: So, all of these features are designed for more use cases than just the TextMate case. It’s just the easy go-to because it’s one that I see very often, and, well, most developers don’t look at the TextMate grammars. The developers that I’ve talked to are usually very passionate about the themes that they use in their editors, and having support for this in the language, without essentially requiring shelling out to another language because we can’t support these features—at least for the common-denominator features, like modifiers and conditionals—would be extremely useful.

SYG: So I’m not saying that I discount the use case of people who want syntax highlighting. I understand that perfectly. Well, what I’m just asking is the usual PM-ey question of how much of this is a problem with TextMate, if that is what remains the main motivating use case. I also believe that these features are very well designed to be amenable to more use cases, but I want this to be use-case driven, like MLS is saying, and if the use case remains TextMate, is it the problem? Not the “problem”, I guess…but is it more productive or easier to change TextMate? I mean we’re a standards body. TextMate is a de-facto standard, that’s something else to work with. But anyway, I think you’ve adequately answered my question. Thank you.

RBN: So, while I was additionally interested in requesting Stage 1, it sounds like that might not be something we’re going to address; instead, we might want to break this down further. I’m not convinced that a TG is necessary. The regular-expression language doesn’t change that often; it’s usually a feature introduced in something like Perl or Python or Java that another engine finds useful and adopts. But the majority of features that I’m proposing have been stable within most regular-expression engines for four or five years or more, and there haven’t been a lot of other changes in that space. So a long-running TG wouldn’t be very useful, I don’t think.

RBN: I am interested in getting more people involved in this discussion. I definitely don’t want to end up having this overwhelm the RegExp Set Notation proposal, because I think that could bog down that proposal, and I don’t want that to happen. So I do think we need a way to have a space to discuss regular expressions for those that are interested in it, so that we can look to advance these types of features.

RBN: I feel the scope of what I’m looking at is equivalent to Temporal or to decorators—proposals that have taken multiple years to get where they’re at. This has the benefit that not all of the features are directly tied to each other, so they can be broken down into individual proposals. But I don’t look at the scope of this as being the scope of Intl, where there’s a lot more that will need to be done; once these goals are met, I’m not really sure what direction to take with this. If a subgroup is what we want to do—again, I don’t want to take over the Set Notation group—but if we can set up a regular call, get people involved, and make people aware of what we’re doing, so that we can talk about these features on an issue-tracking repository, I would appreciate it.

RBN: Whether that’s advancing this to Stage 1, or just having the repo adopted into the TC39 org as-is so that we have a place to discuss this, and then breaking these out into individual features again…I’m not sure what the best approach is. So I’m honestly not sure what to do at this point, if anyone else has a suggestion.

WH: At some point you will want to make decisions. And the question is which group do you solicit feedback from to make those decisions.

RPR: Would you like to start by talking to the Set Notation folk? And then see what between you that you think is the most appropriate: either start your own group or expand that group?

RBN: I can do that. And at the very least, the repo will live where it is, and, if necessary, I’ll break this down into others [?]. And this is more of a personal reason for presenting this all at once: not having to maintain fifteen individual proposal repositories. I think I’ll leave it there, and I’ll talk with some folks offline in the Set Notation group, and if anyone else is interested, they can provide feedback on the repository where it’s at.

Conclusion/Resolution

  • more discussion offline

Fixed layout objects

Presenter: Shu-yu Guo (SYG)

SYG: This is a proposal to ask for fixed layout objects, and I will explain what that means. (We’re calling them structs for now, but don’t get hung up on the naming.) This is mainly championed by me currently, but I’m planning to work together with Igalia and Microsoft as well; we’ve had some discussions there. A while ago, we also talked with Apple, and KM was interested—please correct me if I’m wrong; that’s why I put a question mark there. Hopefully, Apple remains interested; we can confirm that after the discussion.

SYG: [slide 2] Getting started, the big idea here is that what we’re calling “structs”, for now, are objects whose instance layout is fixed at construction time. These are closed, instead of open, objects: you cannot add new properties to them. Those kinds of objects are attractive for a bunch of different things. They are attractive for enabling shared-memory concurrency: without the ability to add new things, as plain JavaScript objects allow you to do, you can build other restrictions on top such that these objects can actually be shared across threads.
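Today’s closest approximation to this “closed object” idea is `Object.seal`. A minimal sketch, assuming a hypothetical `Point` class; unlike the proposed structs, this gives no one-shot layout guarantee and no cross-thread sharing:

```javascript
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
    Object.seal(this); // from here on, no properties can be added or removed
  }
}

const p = new Point(1, 2);
p.x = 10;                      // existing fields stay writable
try { p.z = 3; } catch (e) {}  // ignored in sloppy mode, TypeError in strict mode
console.log("z" in p);         // false: the object's shape did not change
```

The struct proposal would additionally guarantee that all fields exist before the constructor returns, so an engine can pick a fixed memory layout up front instead of tracking shape changes.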

SYG: It is attractive for WasmGC interop, and I will go into more detail for what that means. It’s also attractive for a bunch of these other use cases, like marshalling for FFIs.

SYG: Maybe we want to pack memory layout better, because we want more guarantees on how exactly the fields are laid out in the memory representation, especially if we give the fields sizes in the representation. Like, we can say this is an int16 [?], or it’s a pointer, or whatever.

SYG: Maybe it’ll give you more predictable performance, because the engine no longer has to continuously learn, as the program executes, what the layout of these objects is as you add and remove properties from them. It may help userland data types, like Complex and other stuff, maybe together with operators.

SYG: This is a big scope that is possible to explore, but the interest of this proposal is limited to considering the first two use cases, which I already considered to be quite large. But in particular, this proposal considers the first two to be requirements and at the same time seeks to not preclude the other use cases that folks might be interested in. And for this reason, as you’ll see, when I actually get to the presentation of the actual technical parts, this proposal is intended to be pretty minimal, with a bunch of future-proofing added in. Hopefully so that we can move incrementally to enable some new expressivity sooner than later and build on it as a building block.

SYG: [slide 3] So, to motivate it better: the first use case is shared-memory concurrency, and I’ve given a vision talk in the past about why that is important to me, and hopefully to the ecosystem. The basic idea is, as always: let’s use more cores. But why should we do it via shared memory versus something more principled that doesn’t have data races by construction? Well, the mega-apps—like GSuite, MS Office, maybe the TypeScript compiler—are running into a performance wall today, and a possible way out could be to give them concurrency sooner rather than later. These mega-apps and experts will need the expressivity of shared memory, even in something with more guardrails built in. I think this fits with JavaScript’s general approach to concurrency from the beginning.

SYG: And we have SharedArrayBuffers. They are very expert-level, and they enable expressivity that is not possible otherwise, but most of the time there are other things available. In this case, I think this will be an extension of that kind of expert-level feature: it will be needed, and there’s really no way to get around it when it is needed. But I’m not necessarily endorsing shared memory as the best way for the average web page or JavaScript program to get concurrency. This kind of shared-memory concurrency is probably going to be here anyway.

SYG: Wasm last time [?] already has multi-threading. That’s the main reason; they want multi-threading to enable their use cases, and that’s a big reason [?]. We did show a revolver [?], and we have a common foundation now, and that’s great. And WasmGC is going to add the ability for Wasm to create structured data, which they also call structs. And in n years—I don’t really want to give any predictions here…but I think the writing is on the wall that, in some number of years, once WasmGC gets through its standardization process and its implementations ship, the next thing they will turn their attention to is multi-threading for WasmGC objects. You know, they need to be able to compile Java. So this kind of concurrency is probably going to be here by way of WasmGC anyway, sometime in the future.

SYG: [slide 4] Why not just SAB and Atomics? So, you want to share structured data? You need wrappers, and wrappers have some pretty fundamental issues around performance. The only reason you want to share any data at all is because you’re having performance problems; otherwise, you should just do everything in a single thread, because it’s much easier to think about. But if you want to share data, you can technically do that today with SharedArrayBuffers and synchronize with Atomics. With wrappers, though, you have to create wrappers, and that cost is proportional to the number of objects in your object graph. And you recreate these wrappers per thread, because the wrappers have to be thread-local; the only thing that can be shared is the payload in the SharedArrayBuffer. If you’re recreating a linear number of wrappers across threads, you’re going to have a bad time with performance. So, again, it kind of defeats the purpose of why you’re sharing stuff.

SYG: You have to manually choose a representation of the objects, right? You have to manually lay out your objects in the SharedArrayBuffer, because it’s just bytes. And you have to manually sync up the management of those bytes in the buffer with the lifetimes of your wrappers, and that’s also bad.
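The wrapper pattern being criticized here can be sketched with today’s primitives. The `PointView` class and its byte layout are purely illustrative: every thread that touches the data would need its own copy of this wrapper, which is the linear cost SYG describes.

```javascript
// Room for one "point": two Int32 fields, laid out by hand.
const sab = new SharedArrayBuffer(8);

class PointView {
  constructor(buffer, byteOffset) {
    // Manual layout choice: x at offset 0, y at offset 4.
    this.i32 = new Int32Array(buffer, byteOffset, 2);
  }
  get x() { return Atomics.load(this.i32, 0); }
  set x(v) { Atomics.store(this.i32, 0, v); }
  get y() { return Atomics.load(this.i32, 1); }
  set y(v) { Atomics.store(this.i32, 1, v); }
}

// Only the bytes in `sab` are shareable; this wrapper is thread-local and
// must be recreated in every worker that wants to read the point.
const point = new PointView(sab, 0);
point.x = 3;
point.y = 4;
console.log(point.x + point.y); // 7
```

A shared struct would let the object itself cross the thread boundary, removing both the hand-written layout and the per-thread wrapper allocation.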

SYG: [slide 5] So I see this as a continuation of the SharedArrayBuffer story. Back when we were doing SharedArrayBuffers, a big motivation was often, “Well, game engines are going to do this.” The players have changed: they were game engines back then, and now it’s these mega-apps and whatnot [?] that need this higher-level shared-memory concurrency [?]. I think it is a power we need for better products. And just like how we drove SharedArrayBuffers to be the shared foundation for shared memory between JS and Wasm—and succeeded there—I think we can do it again at a higher abstraction level.

SYG: [slide 6] So that segues into WasmGC. What is WasmGC at a very high level? I’m not going to go into it here, but at a very, very high level, it’s the idea that Wasm can hook into the host’s garbage collector and allocate fixed-layout objects that it can use within Wasm. These objects have opaque memory, meaning that you cannot access them through linear memory; WasmGC objects are just objects, like JS objects. Unlike C, we can’t take the address of some JS object and look at the bytes, and similarly we cannot take the address of some WasmGC object and look at the bytes in Wasm’s linear memory. It’s a pretty big proposal, and it’s in flux—we’re still designing it—and I think we have a responsibility to the ecosystem to ensure that WasmGC and JS work well together. It’d be a shame if we ended up with similar but different features. And it’d be the worst shame if we ended up where, to use these classes [?] from within JS, you have to create wrappers around some kind of background story [?], for the exact same reasons why wrappers are a performance problem.

SYG: [slide 7] So there are shades of what I mean by interop. I intend “interoperability” to mean “works with”, not “we have the exact same feature set that is expressible in both” [?]. The goal of this proposal is just the upper-left square in that table: I want to build common machinery so that instances of Wasm structs are usable and explainable as JS structs. There are other squares to fill out there for fully fleshed-out interop, but I want those to be out of scope for now.

SYG: [slide 8] A pretty legitimate question is, “Well, WasmGC is not done being designed. Is it too early to talk about the JS feature, the JS side of WasmGC [?]?” And I think the answer is “no” in the scoped sense of this proposal: it is a minimal proposal that seeks to add common machinery to explain stuff, and seeks to be compatible with all possible WasmGC futures. The flip side is that, in the general sense, it is too early to talk about WasmGC, because a lot of the stuff is subtle, like the type system [?], and we will need other companion proposals in the future. But at the very least, we need this common machinery, which is what this proposal seeks to provide.

SYG: [slide 9] So again, other use cases should not be precluded, but I do want to explicitly call them out as being out of scope for now.

SYG: [slide 10] So now we’re at the point of this presentation where I’m going to go into the actual strawperson, or the conceptual part of some of the technical bits of what I’m thinking. And if you find yourself having questions about why some of the design choices are what they are, they might be explained by some of these themes. The first theme is that the MVP proposal here is all about the semantics of instances, not about the semantics of constructors or prototypes. The second theme is that, when we get to the sharing part, the invariant of the design here is that non-shared stuff that’s local to your thread can reference shared stuff, but not the other way around. And this is an important invariant to make proposals in this space more easily implementable. If shared and non-shared stuff could freely reference each other, you would have cycles spanning the shared and non-shared heaps, and that precludes certain implementation strategies where you might want to separate your shared and non-shared heaps in the implementation. And the reason why you might want to separate them is probably just because JS GC has been single-threaded forever, and that is how your GC works. If the task here is for all the engines to “please completely rewrite your GC”, it’s not going to happen. And the third theme is that this is very minimal, which is why it might not look very ergonomic. And I have a request to hold off on syntax discussions. I actually think the technical bits of this proposal could be done with or without novel syntax, but it’s easier to present with syntax. So there’s some very rough strawperson syntax here, used just for this presentation.

SYG: [slide 11] All right, so the technical proposal here is that we plan to add fixed-layout objects called structs, and there will be two kinds of structs. You have plain structs, which are just like sealed objects, except they have the additional semantics that all of their instance fields are initialized at construction time in one shot. And then we have shared structs, which are new expressivity: they share the one-shot-initialization and sealed semantics, but have additional restrictions so that they are able to be shared across threads. For ease of presentation, the syntax here is a contextual keyword, `struct`, in front of class expressions.

SYG: [slide 12] So the idea is that these basically look like class expressions, except with this extra contextual keyword. When things that are declared as structs are constructed, all of their declared instance fields are initialized to undefined in one shot at the beginning. And after that, their instances are sealed: you cannot add new properties to them, and you cannot change their prototype. They can have inheritance, but everything in their superclass chain must basically be a struct. Otherwise, they’re like plain objects.
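As a rough illustration, the strawperson from the slides might look something like this sketch (the `struct` contextual keyword is hypothetical and not valid JavaScript today):

```js
// Hypothetical strawperson syntax — not executable in any current engine.
struct class Box {
  x; // declared instance fields are initialized to undefined in one shot

  constructor(x) {
    this.x = x; // writing a declared field is fine
  }
}

const box = new Box(1);
box.x = 2;                        // OK: x is a declared field
// box.y = 3;                     // would throw: instances are sealed
// Object.setPrototypeOf(box, {}) // would throw: the prototype is fixed
```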

SYG: [slide 13] And this is an inheritance example. You can have structs that inherit from other structs, and they inherit this one-shottedness. When you create a D in this case, you get f1 through f5 at the same time, in one shot. Unlike with current classes, you cannot observe a partially initialized instance. Structs are pretty easy.
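The inheritance example on the slide might be sketched like this (same hypothetical syntax; the field names f1–f5 come from the slide):

```js
// Hypothetical strawperson syntax.
struct class C {
  f1; f2; f3;
}

struct class D extends C {
  f4; f5;
}

// Constructing a D initializes f1 through f5 to undefined in one shot,
// before any constructor body runs — unlike ordinary class fields, a
// partially initialized instance is never observable.
const d = new D();
```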

SYG: [slide 14] Shared structs are the new expressivity. I think a good way to think about them is that they’re like plain structs with more restrictions, so that they can actually be shareable. Like plain structs, they are sealed and one-shot initialized, but they are very restricted in what you can declare in them. Currently you can only declare a constructor and instance fields. You cannot declare instance methods or instance accessors, because sharing functions—sharing code—is a harder problem that will need to be solved, but one that I would like to see solved in a companion proposal, because this is challenging enough as it is; I don’t want scope creep here. Static fields and methods are probably okay. The constructor itself isn’t shared.

SYG: [slide 15] Additionally, the kinds of things you can put into the instance fields are restricted so that you can actually share these instances. They can reference all primitives except symbols. They can reference other shared structs, but they cannot reference local objects. So this is the invariant: shared things cannot point to non-shared things. You cannot put the array [1,2,3] in there, because that’s a thread-local array. This may be too un-ergonomic to actually be readily usable, and something like SharedArray, or shared fixed-length arrays or fixed-length lists of things, might need to be brought into scope to make this usable even for the expert-level mega-apps that I was talking about earlier—but that is TBD currently. It’s easiest to say we cannot share local objects. And, as another point of future-proofing, if you touch the [[Prototype]] slot at all, it throws. We have ambitions to give these things prototype chains and to share those chains once we know how to share functions. Until then, the high-order bit here is to future-proof for that work, and one way to future-proof is to make touching the prototype throw. We could also say in the beginning that these prototypes are always null or something, but that can be discussed later; I think that’s a technical detail. The point is to future-proof for a world where we have prototype chains on these things—just not now.
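A sketch of those restrictions, using the same hypothetical syntax:

```js
// Hypothetical strawperson syntax.
shared struct class SharedPoint {
  x; y;
}

const p = new SharedPoint();
p.x = 1;                    // OK: primitives (except symbols) are shareable
p.y = new SharedPoint();    // OK: shared structs can reference shared structs
// p.x = [1, 2, 3];         // would throw: a thread-local array can't be shared
// Object.getPrototypeOf(p) // would throw: [[Prototype]] is off-limits for now
```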

SYG: [slides 16–17] And the main expressivity is that you can share these across worker boundaries (or “agent boundaries” in Ecma 262-ese). When you share them across the worker boundary, they keep their identity. You are not sharing their backing store and getting a new wrapper object on the other side, unlike with SharedArrayBuffers—you’re sharing the actual instance. And you can easily access their fields, such that if you write to them concurrently from multiple threads, you don’t necessarily know which write will win, but there is a base-level guarantee that your write won’t tear. You’re not going to write half of a value, like half of a number or half of a string. So, in this short example, the idea is that the program can print any of several interleavings—the main thread’s write or the worker’s write may win—but each read sees a whole value.
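The example on the slides might be sketched like this (hypothetical syntax; assumes shared struct instances can be posted to a worker with identity preserved):

```js
// main.js — hypothetical sketch of slides 16–17
shared struct class S { x; }

const s = new S();
s.x = "main";
worker.postMessage(s);  // shares the instance itself — no copy, no wrapper
console.log(s.x);

// worker.js
onmessage = ({ data: s }) => {
  s.x = "worker";       // races with main's write, but never tears:
  console.log(s.x);     // each read sees a whole value, "main" or "worker"
};
```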

SYG: [slide 18] So, if the main interesting things are shared structs, what are plain structs for? I think there is some value in having a declarative, sealed, one-shot-initialized object, and maybe somewhat more predictable performance. But if I’m being honest with myself, the main thing we’re enabling by carving out plain structs now is to future-proof for the other use cases that I listed at the very beginning.

SYG: [slide 19] If you have been around JavaScript language evolution for a while, you might remember Typed Objects. This is a departure from Typed Objects in that Typed Objects had sized-field requirements: when you declared a Typed Object, you had to say this field is an int8 or an int32, to the point that fields had typed representations. I don’t mean types as in complex, sophisticated type systems—I really mean representation, size, and alignment. The realization was that for the core use cases here—shared-memory concurrency and WasmGC—we don’t have to have representations and types on the fields. For concurrency, all you need is alignment and width of fields; you can just say they need to be pointer-aligned or something. For WasmGC, even machine types aren’t strictly needed either. Why? WasmGC structs that are reflected into JS are free to perform whatever additional type checks they need, without having to reflect what kind of type they are into JS, just like how Wasm-originated ArrayBuffers have additional behaviors when you try to use them. So, to keep it simple, we don’t have types, and it’s also a smaller delta from plain objects that way.

SYG: [slide 20] Some implementation guidance: how you might implement all this stuff. The hard work is probably not the semantics—which I think are hopefully fairly simple—but I want to take time to try to convince you and the other implementers that it is in fact implementable to begin with. The first bullet point is about the invariant that shared things cannot reference local things; only local things can reference shared things. That is so that this can be designed to be implementable with either separate heaps or a unified heap. If your engine already has a garbage collector that is parallel and concurrent with the mutator, and capable of doing all that with a unified heap, that’s awesome—you’re basically done. But most of us probably don’t have that; we have separate heaps. V8 has isolates. SpiderMonkey has something I’m no longer up to date on, but last I checked it was not this kind of unified-heap design. I cannot speak for JSC, but I know Filip Pizlo [FP] and co. are super GC hackers, so they probably have something very advanced already. The second point: because these structs are fixed-layout, their hidden classes can be immutable and shared among the different agents. Normal objects are open—you can keep adding stuff to them—so you cannot just lock down their shape the way you can with structs. And for shared structs, the fields need to be at least pointer-width and pointer-width aligned, so that when you access them at the machine level, you have some base atomicity guarantees.

SYG: [slide 21] The stuff that’s going to be hard is obviously the garbage collector, and there are trade-offs there depending on whether you choose separate or unified heaps. I’m not going to go into detail here. If you’re a GC implementer, you’re probably well aware of these issues.

SYG: [slide 22] The stuff that’s really hard is strings. All the engines have these very complex menageries of string types and string optimizations, such that the string representations mutate in place depending on when things happen. When you concat strings, for example, they get put into these rope structures where you don’t actually copy them—you hook them up into a DAG—but sometimes you need to access the character buffer, and when you do, you flatten them: the representation transitions in place to a flat string. Sometimes you canonicalize them, AKA intern them, where you deduplicate them so that you can compare duplicated strings by pointer equality; they get inserted into a table, and that table now needs to be thread-safe when that representation change happens in place. Sometimes you even externalize strings, where you move the ownership of the character buffer out of the JS engine into the host, like the HTML engine or something. It’s pretty hard to make all of this thread-safe and performant. It’s a major challenge. I’ve been working on it for a few months. It’s kind of fun, but it’s actually really hard. This is just to call that out.

SYG: [slides 23–24] And yeah, that’s basically it for the motivation and very rough idea of what the technical solution might look like. And I would like to go through the queue and then ask for Stage 1 with details of what exactly I’m asking for on the right-hand side here.

KM: I’m still on board with this. I don’t know what happened—sorry I didn’t get back to you in time. But yeah, I’m still a fan of the idea, and I’m happy to co-champion.

SYG: Awesome. Thank you.

YSV: So, I’m very much in favor of this work taking place. However, we did have some concerns on the SpiderMonkey side—in particular, the effect of fields effectively having an “any” type rather than being typed. And you have a slide here that actually addresses this. But this “any” type effectively means that it’s more of an analog to what would be coming from WebAssembly than what we would actually represent. Now, I believe that ATA knows more about this. He’s been working on the SpiderMonkey [unable to transcribe] and is familiar with it. Would you be able to speak to this?

ATA: Yeah, it is concerning. Two slides back, there was a slide about what’s in scope in terms of the direction of the interop here. I think at this point, with a very minimal proposal, the intention is that the interop would mainly be about explaining how these WasmGC structs get reflected as JS objects, and I think where a lot of the type-system issues come in is where we want to do the import from JS [unable to transcribe]. In particular, I think there’s also a difference when we talk about round-tripping: if you have a Wasm struct that goes to JS and back, that’s probably possible, but if you were to create something entirely in JS and then import it into Wasm, then you have to have some type information for this to be possible. And so I think that’s where these future extensions would fit in, and that, I think, is where this issue of having types comes in, if I understand your concern. Does that answer it?

YSV: Yeah, that does more or less answer it. Now I can’t speak for our Wasm team, they have very limited time in terms of giving this proposal the amount of time that it needs. So they haven’t been able to come back to me with regards to any further specific issues they have here. But what I would like to say is I’m perfectly fine with going to Stage 1. I want to spend some time working with a few people not only on the direct Wasm team, but also adjacent to it. So ATA would be one person, but I also want to speak with Luke Wagner and a couple of other folks to get a bit more feedback here before we would look at something like Stage 2. The WebAssembly team on Mozilla’s side, on SpiderMonkey, is a little uncomfortable with how quickly this is moving forward. And I will try to get some information soon, but I can’t promise how soon that will be. So if you’re looking to move this quickly to Stage 2, I would ask that we work closely together on the pacing of it.

SYG: I hear you, and I intend to work closely with you. On the urgency question and the speed that I’m envisioning: this slide [slide 22, about strings] is, I think, the actual thing that might block implementations for a significant amount of time, and that is what I’m working on, and that is not blocked by standards progress. I want to get the ball rolling here—there are many interested parties—and try to nail down some design that is amenable to everybody while this part, which I think is the hard part, is happening.

YSV: Yeah, for sure. For this, I’m going to have our GC folks take a look and work with you on that once they’ve got some cycles to do that.

JWK: Currently on the web, the concurrent programming model is based on message passing (postMessage) instead of memory sharing. Adding a high-level abstraction like shared structs means we are encouraging a concurrent programming model based on memory sharing.

SYG: I will say no. One answer there is that SharedArrayBuffers exist, so we already have shared memory. And the other part of the answer is that—

JWK: SharedArrayBuffer is a low-level API and it’s hard to use.

SYG: I think, at least right now, this is also fairly hard to use, even with all the bells and whistles. I imagine we will need more here—like function sharing—for these to be more ergonomic even for power users. Opting into this kind of programming is just hard to get right. I’m not seeing the encouragement: even if the syntax encouraged people with “now you can make these objects”, they would run into issues pretty quickly. It is a risk that we might be encouraging a dangerous style of programming, but escape hatches exist, and I remain very convinced—I feel strongly about this—that escape hatches are needed for these kinds of power-app experts. Pressure will remain on that front, and this is for them. If I can refer back to the vision talk I gave about concurrency in general a year ago: I think for the future of concurrency on the web we need to own up to having just these two concurrency models—message passing, which is mostly race-free by construction, and shared memory. We’re doing shared memory first, but my longer-term vision is not for this to be the primary way to get concurrency on the web; it is a building block with which I imagine we can explain other kinds of objects that can be shared among threads in a safer manner.
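For comparison, the shared memory that exists today is SharedArrayBuffer plus Atomics: a flat bag of numbers onto which any object structure must be hand-encoded, which is a large part of why it is considered low-level and hard to use. A minimal sketch:

```javascript
// Today's shared memory: a typed-array view over a SharedArrayBuffer.
// "Fields" are just numeric slots that every thread must agree on by convention.
const sab = new SharedArrayBuffer(8); // room for two 32-bit slots
const fields = new Int32Array(sab);   // slot 0 ~ "x", slot 1 ~ "y"

// Sequentially consistent accesses via Atomics (safe across threads):
Atomics.store(fields, 0, 42);
Atomics.store(fields, 1, -7);
console.log(Atomics.load(fields, 0)); // 42
console.log(Atomics.load(fields, 1)); // -7
```

Posting `sab` to a worker gives that worker a new wrapper object over the same backing store, in contrast to the identity-preserving sharing proposed for shared structs.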

JWK: Okay, I think that’s fair. JS should be able to support multiple paradigms (like FP and OOP).

MM: I’m very, very skeptical of this entire direction. The non-shared ones, the struct classes: those actually look very nice for reasons that you didn’t go into at all and seem to be completely outside of your motivations. They actually share a lot with what I was trying to accomplish with defensible classes, and I think you’re succeeding where I wasn’t able to figure out how to succeed because you actually got more restrictive than occurred to me, like the fact that they can only inherit from struct classes and that they're initialized all at once. There’s no partially initialized state that’s visible. So, that’s all great.

MM: On the concurrency, on the shared things: I think that this is really about the soul of JavaScript, as a character of the language, and what makes it something that lots of regular application developers are able to use successfully, including using JavaScript’s concurrency successfully. The concurrency, like JWK was mentioning, is the message passing concurrency.

MM: I remember when Java first came on the scene and lots and lots and lots of application programmers thought: “Oh, okay. Now I’ve got shared memory in a language, I can use it and oh, yeah, it’s tricky. There’s all this computer science stuff, but I’m clever. I know how to use it!” And people just made a terrible mess and that’s happened over and over again.

MM: With Go—Go was largely seen as an elegant approach to concurrency because of its support for CSP-style message passing. However, they did goroutines in such a way that you could share memory, and people used it. And once people start using it, and it’s higher performance, you can’t quarantine it anymore. What happens is you accumulate libraries that use it, and then other code is in bed with those libraries and is dealing with the non-local after-effects of shared memory: the compositionality problems of either being racy or deadlocking.

MM: There are no good solutions to those things. Shared-memory multithreading is a really horrible concurrency paradigm. Message passing can be made better. Many of the things that have been raised in committee are very nice directions for preserving the safety of message passing while getting higher performance out of it: some of the things that Moddable has done, where you can share transitively immutable objects, effectively within one realm, across threads, while still having complete partitioning of mutable state; and some of the things that we’ve explored in committee previously having to do with data parallelism. There are lots and lots of ways to get speedup without destroying the safety that shared memory destroys.

MM: And the argument that experts will use this and regular users can choose not to just doesn’t hold once there’s an ecosystem, and people are trying to use high-performance libraries that were constructed by experts to use these features. There is a contagiousness of complexity in the code that just tries to make use of those libraries. So none of this is an argument against Stage 1; certainly, for Stage 1, I’m fine. But I want to make it very, very, very clear: I really hope we don’t introduce this level of hazard and footgun into the JavaScript language, because it would really destroy the character of JavaScript application programming.

SYG: Thank you, Mark, for your perspective. It’s somewhat of a philosophical disagreement, but we’re perhaps less misaligned than you might think. I think I want the same future you want, except I don’t see a way around escape hatches, and we can discuss offline how we can further restrict these. I’m also operating under the design principle that shared-memory stuff must be very explicitly opted into. And I share the concern about contagion, but I also feel this contagion will arrive in an even worse form via WasmGC if we do not get ahead of it, in the sense that we did with SharedArrayBuffers.

MM: I was reluctant to approve SharedArrayBuffers, and the reason I approved it is that the pressure from games made it seem inevitable: whether TC39 approved it or not, all the browsers were going to implement it and games were going to use it. And so far, the reason that we’re still in a good place is basically because SharedArrayBuffer has been a resounding adoption disaster. People don’t use it. Hopefully it will continue to be an adoption disaster, and anything that makes shared-memory multithreading usable will make it more adoptable—which would be a strict backward motion from the current state, where people could use it and destroy safety properties in theory, but right now, at least, they’re not.

JWK: One of the primary use cases for shared memory and shared structs is WebAssembly: WebAssembly needs shared structs because it needs to handle code compiled from C++ or other languages. I think it’s acceptable to keep shared structs inside multiple Wasm threads, but not let them leak into the JavaScript side. Multiple Wasm threads can program with shared memory, and if they want to send results to JavaScript, they have to go through message passing. I think that would be better.

SYG: I disagree and I think real products would as well.

KM: I don’t know—that seems very unlikely, just because WebAssembly as a standards body could just define an object that has get and set operations that do whatever they want; it could access anything. It wouldn’t even necessarily be in TC39’s control if they decided they wanted to ship something like that. It seems to me like this could just happen outside of this standards body. I guess we could say in some note that it’s not allowed, but then you’d have a conflict where someone else ships some standard that says it’s required. I don’t know how that would work.

SYG: We’re out of time: two minutes. Unfortunately, the memory model question, if I can anticipate it, WH, might take longer than two minutes. Could you ping me personally and we can try to hash it out?

WH: I’d like to ask a question: In your slides when you access that x field, are all of those accesses atomic or not?

SYG: Do you mean sequentially consistent? Or do you mean memory ordering?

WH: Memory ordering.

SYG: Yes, they are atomic in that they won’t tear, but they are unordered. The current intention is to also extend Atomics so that, if you need sequentially consistent access, you can get it. I didn’t show that in the slides.

WH: Okay, in this case, I do not believe that this is safe.

SYG: I think it is, but let’s check.

WH: Okay, then the problem is—the reason this works with SharedArrayBuffers is that, when you get paradoxes caused by relaxed ordering, you just read or write some bad integer. With structs, on the other hand, you don’t just get bad integers here and there when you hit these hazards—you might get access to an object which has not been initialized yet. So this is unsafe unless you make all accesses sequentially consistent.
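The hazard WH describes might be sketched like this (hypothetical shared-struct syntax; the point is the publication race under relaxed ordering):

```js
// Hypothetical sketch. Assume `container` is a shared struct visible to both
// threads and all field accesses are unordered (relaxed), per the strawperson.

// Thread A:
const inner = new SharedThing(); // allocate and initialize a shared struct
inner.value = 42;
container.ref = inner;           // publish the reference with a plain store

// Thread B:
const seen = container.ref;      // a plain load may observe the reference...
if (seen !== undefined) {
  seen.value;                    // ...before inner's initializing writes are
}                                // visible. With a SAB, a race yields at worst
                                 // a stale integer; here it could expose an
                                 // object that is not yet fully initialized.
```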

SYG: If that’s what needs to be done, then that is a direction. I had been thinking about it; I thought unordered by default might be okay, but I admit I have not fully worked out the memory model.

WH: You could make all accesses sequentially consistent, but then it would be too slow.

SYG: Okay—sorry, where was I? Let’s continue this chat; this needs to be worked out, but not before Stage 1. I would like to ask for Stage 1 explicitly: to explore the space of fixed-layout objects for the two use cases of shared memory and WasmGC. Any objections to Stage 1?

JWK: I like the non-shared parts, but I’m skeptical of the shared parts. I think Stage 1 is okay though.

WH: Yeah, I do not believe this can be implemented efficiently for the reasons I stated, but you’re welcome to explore it.

MM: Yeah, I reluctantly do not object.

RPR: Okay, I did hear one positive from DE there and another positive from LEO and a few skeptics that are not blocking. So I’ll conclude that we have Stage 1, congratulations.

Conclusion/Resolution

  • Stage 1

Resizable buffers

Presenter: Shu-yu Guo (SYG)

SYG: It's just an FYI of a normative bug that we fixed in the Resizable Buffers proposal that my teammate Mario found during implementation. Resizable buffers allow buffers to be resized, so it is possible to resize a buffer such that a typed-array view on top of it sits exactly at the new bounds.

SYG: So the normative issue we found is that, when you resize an underlying buffer such that the view becomes zero-length—where the bounds of the view on top sit exactly at the bounds of the underlying buffer—the spec draft treated the view as out of bounds and threw. For a variety of reasons you can read on the issue, this didn't make as much sense as I had thought: we already allowed zero-length views to begin with. So this is a very small change—basically changing a “≥” to a “>”—such that these particular typed arrays are considered in bounds, even though they have a length of 0, and they don't throw when you access them. The current idea is that out-of-bounds typed arrays on top of resizable buffers behave like typed arrays with detached buffers, and making these zero-length typed arrays behave like detached buffers is undesirable. Any concerns with this change?

RPR: You have consensus.

Conclusion/Resolution

  • Consensus reached

Incubation call chartering

Presenter: Shu-yu Guo (SYG)

SYG: We actually worked through the backlog of chartered incubation calls from past meetings, so we have an empty charter right now. Before I nominate some early-stage proposals: does anyone with an early-stage proposal want an incubator call? For the newcomers, incubator calls happen bi-weekly, at times scheduled according to the stakeholders, where we try to get a faster feedback loop between the champions and stakeholders within TC39. In these calls, the champions preferably ask for feedback on specific design concerns of the proposal, and you hash them out in a high-bandwidth setting outside of plenary. The idea is that we give some sanctioned time so we free up plenary time for more important stuff. Any interested parties?

LEO: I want to add something that is not exactly a proposal but is probably interesting to talk to TC39 about, especially implementers: proxies. I think an incubator call is actually better than coming to plenary with all of that. It would be about proxy performance.

SYG: I would like to prioritize proposals, if there are any. I don’t know if GB and—

DE: Well, I had another non-proposal that I was going to suggest, which is the WebAssembly–JS interaction topic. I had an agenda item that we did not have time for this meeting, so I took it off the agenda. I’d really like to get broader involvement from TC39 and input into the WebAssembly–JavaScript API. It’s getting somewhat lower priority, or just lower attention, from the WebAssembly CG, just based on the makeup of that group. They definitely care about it, but I think they want input from JavaScript experts here. So we could have a one-hour incubator call to discuss that.

SYG: Sounds like a good topic. And the one I was planning to call out, if GB and BFS are here, is the well-formedness of strings. It seemed like there were some possible design directions that you might want feedback on. Would you want to do that?

GB: I can speak briefly to that. It could certainly be worthwhile if there are things that can be fleshed out—I’d certainly be open to that—and also, to DE’s point, to the WebAssembly-and-JavaScript API call. We’d be really grateful for that.

SYG: Thanks GB. I think, given our faster cadence, we realistically have time for just two calls, so proxy performance and Wasm–JS interaction should fill out the time until the next plenary, at which point we can charter the well-formedness-of-strings call if folks are interested. Thank you. Look out for the new charter on the Reflector for scheduling the calls.

RPR: Thank you for running these incubation calls. I think they’ve been very successful at lightening the load on the plenary, which has been really good this year.

Conclusion

RPR: We are complete. We got through more items than we originally had planned. Thank you to everyone who got through things earlier than their time box. It’s the end of the meeting.

[chat]

RPR: I will also apologize: I was due to provide an update on scheduling next year, but I didn’t get time to prepare the slides. I will say that we’ve taken the feedback into account, and the one thing we are looking to do for next year’s schedule is to reduce our eight meetings to six. You can see all the feedback on that in the spreadsheet.

[chat]