Template objects are eternal when put into WeakMap #840

syg · 2017-03-07T18:13:24Z

As with all fun things this came out of a discussion with @anba and @arai-a. We ran into it as I was fixing template object canonicalization in Firefox.

So GetTemplateObject canonicalizes template objects using a realm-wide template registry. I assume the intention is to allow this registry to be implemented internally as a weak map: if the template object itself is GCed, then the identity of the canonical object becomes unobservable, and thus can be removed from the template registry.

Funnily enough, using the template object as a JS WeakMap key makes it observable forever. Since from the spec's point of view the template objects are eternal anyways, the spec calls for the following behavior (test case courtesy of @arai-a):

let map = new WeakMap();
{
  let obj = (x => x)`foo`; // new template object is created
  map.set(obj, 10);
}
gc(); // template object will be collected if it's weak
{
  let obj = (x => x)`foo`; // new template object is created
  assertEq(map.get(obj), 10); // fail
}

In other words, putting a template object as a key in a WeakMap actually makes it live forever and unable to be GCed.

@ajklein @bterlson @msaboff How do your respective engines implement this?

allenwb · 2017-03-07T20:01:41Z

template objects are canonicalize so that template handlers can use them as cache keys (possibly weak) to memoize any preprocessing of the template object.

I assume the intention is to allow this registry to be implemented internally as a weak map: if the template object itself is GCed, then the identity of the canonical object becomes unobservable, and thus can be removed from the template registry.

Not necessarily. The original idea was that template objects corresponded to a specific "site" in the source code and that any evaluation (for the duration of its realm) of the template literal at that site would always produce the same template objct. That's why the metavariables that holds the result of calls to GetTemplateObject are named siteObj. GetTemplateObject and the registry it uses were just a specification mechanism for saying that each evaluation of template literal at particular source code site must yield the same template site object.

At some point we (presumably @erights and me, but I don't recall) must have realized that because the "site" template objects were immutable, the same object could be used by different template literal sites within the same realm (same realm because they reference %ArrayPrototype%) and so the registry doesn't include the parse node ("the site") as part of the key. Unfortunately, that in combination with the ESs ability to dynamically load code implies that once a template object is created in a realm it must exist for the remaining lifetime of the realm. I'm not sure whether that was in fact the original intent when we were thinking of template objects as being site specific.

erights · 2017-03-07T20:36:41Z

Yes. Given evaluators, there were ambiguities regarding the definition of "site". Rather than resolve these, we decided that all identical template object contents within the same realm be made identical through a realm-specific registry. Had the identity been specific to "site", the identity needs to last as long as the possibility that the "same" "site" be able to be mentioned again, for whatever the concept of "site" is. This emphasizes why the "site" concept is tricky to get right.

The realm is a good replacement for site. Since the template object inherits from some Array.prototype intrinsic, clearly nothing coarser-grained than realm is possible.

I'm sorry if the issue of what-is-weak got confused in the process. I may indeed have been confused about this during the transition. Here's how I see it now:

The per-realm registry must indeed hold onto template objects strongly. It is a strong interning table. The important weak maps we were always trying to enable, and still are, are the weak maps that the template tags can choose to use in order to memoize any preprocessing work it does on a particular template object, so it can amortize that work over all the substitution values that template object is reused with.

My own https://github.com/erights/quasiParserGenerator is a good example: The bnf is part of the literal template text. The action rules are represented by first class functions as substitution values. From one evaluation of a bnf`...` expression to the next, the literal bnf grammar is always the same but the action rules can differ. All the work generating a parser from the bnf text is reused even though the resulting parser may be used with different action rules.

For present purposes, a strong per-realm interning registry does enable working weakmaps per tag for memoizing any per-template preprocessing, for amortizing effort per template expression evaluation of the "same" expression, for a workable definition of "same".

However, the quasi-parser generator also demonstrates that per-realm interning is too coarse for a completely different reason I'll explain in another post. That reason does not affect the conclusion that the per-realm registry should be strong.

ajklein · 2017-03-07T20:50:29Z

@caitp wrote the v8 implementation; a cursory examination suggests that we hold onto the objects strongly (we use a JS Map as the storage).

erights · 2017-03-07T20:52:56Z

Why is a per-realm interning based on only the currently specified contents of a template object too coarse?

Image that a template tag function is a quasi-parser for an interesting dsl. The https://github.com/erights/quasiParserGenerator is a fine example, where the dsl is bnf[1]. Now, let's say that a given bnf`...` expression itself has a syntax error -- it is not a syntactically valid bnf specification. The bnf tag must certainly reject it, and do so with an error message for helping the programmer to find and fix the syntax error. Of all the parts of a syntax-error-diagnostic, by far the most useful part is the source location.

Given the current template literal specification, it is IMPOSSIBLE for a template tag function to figure out what source location to tell the user about.

Instead, we need, as another immutable part of the template object, a source map mapping from locations in the template back to source locations. But what is a source location? Both the de-facto source map draft standard and the error stacks proposal traffics in source locations, but neither tries to pin down what they are. Whatever they are, this means that the per-registry strong interning table should take these into account as part of the interning. This will make finer grain distinctions among templates in the same realm than can be made within the current spec.

Attn @disnet

[1] Actually, parsing expression grammars rather than bnf. The difference is irrelevant to the present point.

syg · 2017-03-07T22:16:04Z

I'm not particularly please about speccing something that in practice un-GC-able. Nevertheless, thanks for verifying that the registry must be strong.

erights · 2017-03-07T22:29:39Z

A realm as a whole is gc-able. But yes, so long as anything retains the realm, this is not gc-able.

WebReflection · 2017-03-08T00:37:45Z

Chiming in, since I've based part of my recent work trusting that the following would be true:

(a=>a)`a${1}b${2}` === (a=>a)`a${3}b${4}`

I also think template literals are widely used as one-off so, beside those cases where these are held, it might make more sense to have them weakly referenced?

I understand that "one-off" situation could be repeated in the future indirectly through runtime loaded code, but is it really worth to keep all of them always in memory, considering Mobile devices and IoT implications?

Just speaking as a developer, also grateful Firefox eventually decided to face this issue.

Thank you all!

littledan · 2017-04-12T14:04:31Z

To summarize, this thread raises a number of significant problems:

(from @syg) Template strings may live forever due to putting them in a WeakMap
(from @ajklein) Because of this logic, it's rather difficult to implement any GC at all of template objects
(from @erights) The template string facility cannot be extended to expose anything about the site, as it's site-independent

At this point, it sounds like it could be reasonable to return to the original "site"-based definition. There's a chance that this could lead to compat issues, but these are reduced a bit if Firefox hasn't shipped site-independent caching yet.

Spec-wise, would it be acceptable to just put the Parse Node in the Realm's [[TemplateMap]] in place of the [[Strings]] field? We could compare them with language like, "If they are the same Parse Node...".

@WebReflection What are you using that invariant for? From the above discussion, it doesn't sound like it was core to the design of template strings.

As far as Mobile/IoT, ECMAScript already forces implementations to keep the much of the source code in memory (due to Function.prototype.toString); do you expect cached template literals to be a big proportion of that? Of course it's possible to create many more template literals than functions using eval, but such a specialized case could probably reduce its memory usage with manual usage of a Map. This is just the same as if the user is creating many Arrays that all have the same contents.

@syg

The previous definition of template caching had a few issue: - (from @syg) Template strings may live forever due to putting them in a WeakMap - (from @ajklein) Because of this logic, it's rather difficult to implement any GC at all of template objects - (from @erights) The template string facility cannot be extended to expose anything about the site, as it's site-independent This patch makes template caching key off the Parse Node where the template occurs in source, rather than the List of Strings that the template evaluates into. These semantics seem to match SpiderMonkey's implementation of templates. V8, ChakraCore and JSC, on the other hand, implement the prior semantics. Resolves tc39#840

These tests are against a specification change based on discussion in tc39/ecma262#840 The tests here passed on SpiderMonkey but failed on other implementations, which implement the current specification.

littledan · 2017-04-12T15:05:58Z

I wrote up a strawman PR to go back to a site-based definition at #890 . If anyone has advice about what's the proper way to write this sort of thing, that would be appreciated; I don't know whether it's OK to just put a Parse Node inside a List of Records (but I don't see why not) and talk about whether it's "the same one", though I don't see why not.

@syg Are these the semantics that Firefox is currently shipping? The new tests pass on my build of SpiderMonkey, but I don't know if there are some additional subtleties.

anba · 2017-04-12T15:25:16Z

If anyone has advice about what's the proper way to write this sort of thing, that would be appreciated; I don't know whether it's OK to just put a Parse Node inside a List of Records (but I don't see why not) and talk about whether it's "the same one", though I don't see why not.

ES6 rev28 used for GetTemplateObject (called GetTemplateCallSite back then):

If a call site object for the source code corresponding to templateLiteral has already been created (see step 12 below) by a previous call to this abstract operation, then
1. Return that call site object.
...
Remember an association between the source code corresponding to templateLiteral and siteObj such that siteObj can be retrieve in subsequent calls to this abstract operation.
Return siteObj.

syg · 2017-04-12T16:19:27Z

@littledan SM Nightly is supposed to have per-content caching. Which version of the repo are you building the shell from?

littledan · 2017-04-12T16:39:14Z

@syg I used a checkout from two months ago.

Jamesernator · 2017-07-10T02:36:58Z

I feel like this has some bad implications if weak references happens, not only would your own weak reference be essentially useless if another bit of code happened to have a strong reference to the same template object (as you can't even control your own object), but you can even check if anything else has a reference to that template object.

erights · 2017-07-11T00:13:57Z

Attn @dtribble

Hi @Jamesernator

not only would your own weak reference be essentially useless if another bit of code happened to have a strong reference to the same template object (as you can't even control your own object),

I don't understand this objection. In general, if you weakly point at something that is still strongly reachable from roots, it sticks around. This might seem to be the only case (and therefore a new case) where this sharing can be brought about among objects that share only code, but it is not. This sharing only happens within a realm. Objects sharing code in a realm also share intrinsics, and from the intrinsics can navigate to other objects. These non-intrinsic objects may be differentially retained or gc'ed in the same manner.

but you can even check if anything else has a reference to that template object.

This is a known side channel not specific to template objects. The proposal calls attention to this side channel and puts the makeWeakRef function (or WeakRef constructor) on the System object (or the import.meta object) for exactly this reason.

WebReflection · 2017-09-06T18:05:58Z

@littledan

What are you using that invariant for? From the above discussion, it doesn't sound like it was core to the design of template strings.

Sorry I didn't receive any notification about your question.

Both me and recently Polymer Labs too, use template literals as unique reference to understand if same static content (within the template) needs to be analyzed.

Once you are sure one static content is identical to any version of itself, parsing it, analyzing it, procesing it as DOM tree, makes no sense because the only changes you want to consider are interpolations and nothing else.

Accordingly, having the first argument of any function tag as in this scenario:

(a=>a)`a${1}b${2}` === (a=>a)`a${3}b${4}`

is essential to figure out if the static parts of the template need to be parsed or not.

AFAIK neither my project nor Polymer Labs one use a WeakMap to hold those templates ... but that's because we both use a Map, which would retain such static content forever regardless.

I wouldn't care if WeakMap could discard them, as long as a Map works and the template literal remains unique (per realm, I wouldn't care at all if it doesn't cross realm).

Hopefully I've explained the invariant.

littledan · 2017-09-06T21:56:58Z

@WebReflection With this proposed change, your caching system would still work. It would also still successfully get cache hits, much of the time. The cache misses would only come when a distinct piece of source code is loaded which happens to have the same template literal. Then, some duplicate work would be done, but only the first time for that second piece of code. Would this be acceptable for you?

@syg

The previous definition of template caching had a few issue: - (from @syg) Template strings may live forever due to putting them in a WeakMap - (from @ajklein) Because of this logic, it's rather difficult to implement any GC at all of template objects - (from @erights) The template string facility cannot be extended to expose anything about the site, as it's site-independent This patch makes template caching key off the Parse Node where the template occurs in source, rather than the List of Strings that the template evaluates into. These semantics seem to match SpiderMonkey's implementation of templates. V8, ChakraCore and JSC, on the other hand, implement the prior semantics. Resolves tc39#840

evilpie · 2017-11-23T20:28:28Z

Is this outstanding census?

WebReflection · 2017-11-24T12:30:42Z

@littledan gosh I still don't receive notifications here ...

The cache misses would only come when a distinct piece of source code is loaded which happens to have the same template literal.

if this includes dynamic import it might be an issue. is that the case?

littledan · 2017-11-24T14:00:38Z

We got consensus on the general idea of the change in the September 2017 meeting, but some more editorial changes are needed on the spec patch before it can land.

littledan · 2017-11-24T14:01:07Z

I apologize for the delay here!

@syg

The previous definition of template caching had a few issue: - (from @syg) Template strings may live forever due to putting them in a WeakMap - (from @ajklein) Because of this logic, it's rather difficult to implement any GC at all of template objects - (from @erights) The template string facility cannot be extended to expose anything about the site, as it's site-independent This patch makes template caching key off the Parse Node where the template occurs in source, rather than the List of Strings that the template evaluates into. These semantics seem to match SpiderMonkey's implementation of templates. V8, ChakraCore and JSC, on the other hand, implement the prior semantics. Resolves tc39#840

These tests are against a specification change based on discussion in tc39/ecma262#840 The tests here passed on SpiderMonkey but failed on other implementations, which implement the current specification.

* Test for change to cache templates by site, not contents These tests are against a specification change based on discussion in tc39/ecma262#840 The tests here passed on SpiderMonkey but failed on other implementations, which implement the current specification. * Add a test that caching is by source location, not function identity * Update existing tests to reference the spec properly

@syg

* Normative: Cache templates per site, rather than by contents The previous definition of template caching had a few issue: - (from @syg) Template strings may live forever due to putting them in a WeakMap - (from @ajklein) Because of this logic, it's rather difficult to implement any GC at all of template objects - (from @erights) The template string facility cannot be extended to expose anything about the site, as it's site-independent This patch makes template caching key off the Parse Node where the template occurs in source, rather than the List of Strings that the template evaluates into. These semantics seem to match SpiderMonkey's implementation of templates. V8, ChakraCore and JSC, on the other hand, implement the prior semantics. Resolves #840

This works around a shortcoming in template tags noted in tc39/ecma262#840 The static tag cache now has bounded memory usage assuming constant cache entry size.

This was referenced Apr 12, 2017

Normative: Cache templates per site, rather than by contents #890

Merged

Test for change to cache templates by site, not contents tc39/test262#972

Merged

DanielRosenwasser mentioned this issue Aug 26, 2017

Tagged template literals compiled incorrectly, TemplateStringsArray not cached microsoft/TypeScript#17956

Closed

mikesamuel mentioned this issue Feb 6, 2018

Contextual string tags to prevent SQL injection mysqljs/sqlstring#29

Open

bterlson closed this as completed in #890 Feb 7, 2018

luwes mentioned this issue Dec 10, 2022

Bug: Mux player duplicate render / frozen TemplateObject array not unique per call site muxinc/elements#517

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Template objects are eternal when put into WeakMap #840

Template objects are eternal when put into WeakMap #840

syg commented Mar 7, 2017

allenwb commented Mar 7, 2017

erights commented Mar 7, 2017 •

edited

Loading

ajklein commented Mar 7, 2017

erights commented Mar 7, 2017 •

edited

Loading

syg commented Mar 7, 2017

erights commented Mar 7, 2017

WebReflection commented Mar 8, 2017

littledan commented Apr 12, 2017

littledan commented Apr 12, 2017

anba commented Apr 12, 2017

syg commented Apr 12, 2017

littledan commented Apr 12, 2017

Jamesernator commented Jul 10, 2017 •

edited

Loading

erights commented Jul 11, 2017

WebReflection commented Sep 6, 2017

littledan commented Sep 6, 2017

evilpie commented Nov 23, 2017

WebReflection commented Nov 24, 2017

littledan commented Nov 24, 2017

littledan commented Nov 24, 2017

Template objects are eternal when put into WeakMap #840

Template objects are eternal when put into WeakMap #840

Comments

syg commented Mar 7, 2017

allenwb commented Mar 7, 2017

erights commented Mar 7, 2017 • edited Loading

ajklein commented Mar 7, 2017

erights commented Mar 7, 2017 • edited Loading

syg commented Mar 7, 2017

erights commented Mar 7, 2017

WebReflection commented Mar 8, 2017

littledan commented Apr 12, 2017

littledan commented Apr 12, 2017

anba commented Apr 12, 2017

syg commented Apr 12, 2017

littledan commented Apr 12, 2017

Jamesernator commented Jul 10, 2017 • edited Loading

erights commented Jul 11, 2017

WebReflection commented Sep 6, 2017

littledan commented Sep 6, 2017

evilpie commented Nov 23, 2017

WebReflection commented Nov 24, 2017

littledan commented Nov 24, 2017

littledan commented Nov 24, 2017

erights commented Mar 7, 2017 •

edited

Loading

erights commented Mar 7, 2017 •

edited

Loading

Jamesernator commented Jul 10, 2017 •

edited

Loading