Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Template objects are eternal when put into WeakMap #840

Closed
syg opened this issue Mar 7, 2017 · 20 comments · Fixed by #890
Closed

Template objects are eternal when put into WeakMap #840

syg opened this issue Mar 7, 2017 · 20 comments · Fixed by #890

Comments

@syg
Copy link
Contributor

syg commented Mar 7, 2017

As with all fun things this came out of a discussion with @anba and @arai-a. We ran into it as I was fixing template object canonicalization in Firefox.

So GetTemplateObject canonicalizes template objects using a realm-wide template registry. I assume the intention is to allow this registry to be implemented internally as a weak map: if the template object itself is GCed, then the identity of the canonical object becomes unobservable, and thus can be removed from the template registry.

Funnily enough, using the template object as a JS WeakMap key makes it observable forever. Since from the spec's point of view the template objects are eternal anyways, the spec calls for the following behavior (test case courtesy of @arai-a):

let map = new WeakMap();
{
  let obj = (x => x)`foo`; // new template object is created
  map.set(obj, 10);
}
gc(); // template object will be collected if it's weak
{
  let obj = (x => x)`foo`; // new template object is created
  assertEq(map.get(obj), 10); // fail
}

In other words, putting a template object as a key in a WeakMap actually makes it live forever and unable to be GCed.

@ajklein @bterlson @msaboff How do your respective engines implement this?

@allenwb
Copy link
Member

allenwb commented Mar 7, 2017

template objects are canonicalize so that template handlers can use them as cache keys (possibly weak) to memoize any preprocessing of the template object.

I assume the intention is to allow this registry to be implemented internally as a weak map: if the template object itself is GCed, then the identity of the canonical object becomes unobservable, and thus can be removed from the template registry.

Not necessarily. The original idea was that template objects corresponded to a specific "site" in the source code and that any evaluation (for the duration of its realm) of the template literal at that site would always produce the same template objct. That's why the metavariables that holds the result of calls to GetTemplateObject are named siteObj. GetTemplateObject and the registry it uses were just a specification mechanism for saying that each evaluation of template literal at particular source code site must yield the same template site object.

At some point we (presumably @erights and me, but I don't recall) must have realized that because the "site" template objects were immutable, the same object could be used by different template literal sites within the same realm (same realm because they reference %ArrayPrototype%) and so the registry doesn't include the parse node ("the site") as part of the key. Unfortunately, that in combination with the ESs ability to dynamically load code implies that once a template object is created in a realm it must exist for the remaining lifetime of the realm. I'm not sure whether that was in fact the original intent when we were thinking of template objects as being site specific.

@erights
Copy link

erights commented Mar 7, 2017

Yes. Given evaluators, there were ambiguities regarding the definition of "site". Rather than resolve these, we decided that all identical template object contents within the same realm be made identical through a realm-specific registry. Had the identity been specific to "site", the identity needs to last as long as the possibility that the "same" "site" be able to be mentioned again, for whatever the concept of "site" is. This emphasizes why the "site" concept is tricky to get right.

The realm is a good replacement for site. Since the template object inherits from some Array.prototype intrinsic, clearly nothing coarser-grained than realm is possible.

I'm sorry if the issue of what-is-weak got confused in the process. I may indeed have been confused about this during the transition. Here's how I see it now:

The per-realm registry must indeed hold onto template objects strongly. It is a strong interning table. The important weak maps we were always trying to enable, and still are, are the weak maps that the template tags can choose to use in order to memoize any preprocessing work it does on a particular template object, so it can amortize that work over all the substitution values that template object is reused with.

My own https://github.com/erights/quasiParserGenerator is a good example: The bnf is part of the literal template text. The action rules are represented by first class functions as substitution values. From one evaluation of a bnf`...` expression to the next, the literal bnf grammar is always the same but the action rules can differ. All the work generating a parser from the bnf text is reused even though the resulting parser may be used with different action rules.

For present purposes, a strong per-realm interning registry does enable working weakmaps per tag for memoizing any per-template preprocessing, for amortizing effort per template expression evaluation of the "same" expression, for a workable definition of "same".

However, the quasi-parser generator also demonstrates that per-realm interning is too coarse for a completely different reason I'll explain in another post. That reason does not affect the conclusion that the per-realm registry should be strong.

@ajklein
Copy link
Contributor

ajklein commented Mar 7, 2017

@caitp wrote the v8 implementation; a cursory examination suggests that we hold onto the objects strongly (we use a JS Map as the storage).

@erights
Copy link

erights commented Mar 7, 2017

Why is a per-realm interning based on only the currently specified contents of a template object too coarse?

Image that a template tag function is a quasi-parser for an interesting dsl. The https://github.com/erights/quasiParserGenerator is a fine example, where the dsl is bnf[1]. Now, let's say that a given bnf`...` expression itself has a syntax error -- it is not a syntactically valid bnf specification. The bnf tag must certainly reject it, and do so with an error message for helping the programmer to find and fix the syntax error. Of all the parts of a syntax-error-diagnostic, by far the most useful part is the source location.

Given the current template literal specification, it is IMPOSSIBLE for a template tag function to figure out what source location to tell the user about.

Instead, we need, as another immutable part of the template object, a source map mapping from locations in the template back to source locations. But what is a source location? Both the de-facto source map draft standard and the error stacks proposal traffics in source locations, but neither tries to pin down what they are. Whatever they are, this means that the per-registry strong interning table should take these into account as part of the interning. This will make finer grain distinctions among templates in the same realm than can be made within the current spec.

Attn @disnet

[1] Actually, parsing expression grammars rather than bnf. The difference is irrelevant to the present point.

@syg
Copy link
Contributor Author

syg commented Mar 7, 2017

I'm not particularly please about speccing something that in practice un-GC-able. Nevertheless, thanks for verifying that the registry must be strong.

@erights
Copy link

erights commented Mar 7, 2017

A realm as a whole is gc-able. But yes, so long as anything retains the realm, this is not gc-able.

@WebReflection
Copy link

Chiming in, since I've based part of my recent work trusting that the following would be true:

(a=>a)`a${1}b${2}` === (a=>a)`a${3}b${4}`

I also think template literals are widely used as one-off so, beside those cases where these are held, it might make more sense to have them weakly referenced?

I understand that "one-off" situation could be repeated in the future indirectly through runtime loaded code, but is it really worth to keep all of them always in memory, considering Mobile devices and IoT implications?

Just speaking as a developer, also grateful Firefox eventually decided to face this issue.

Thank you all!

@littledan
Copy link
Member

To summarize, this thread raises a number of significant problems:

  • (from @syg) Template strings may live forever due to putting them in a WeakMap
  • (from @ajklein) Because of this logic, it's rather difficult to implement any GC at all of template objects
  • (from @erights) The template string facility cannot be extended to expose anything about the site, as it's site-independent

At this point, it sounds like it could be reasonable to return to the original "site"-based definition. There's a chance that this could lead to compat issues, but these are reduced a bit if Firefox hasn't shipped site-independent caching yet.

Spec-wise, would it be acceptable to just put the Parse Node in the Realm's [[TemplateMap]] in place of the [[Strings]] field? We could compare them with language like, "If they are the same Parse Node...".

@WebReflection What are you using that invariant for? From the above discussion, it doesn't sound like it was core to the design of template strings.

As far as Mobile/IoT, ECMAScript already forces implementations to keep the much of the source code in memory (due to Function.prototype.toString); do you expect cached template literals to be a big proportion of that? Of course it's possible to create many more template literals than functions using eval, but such a specialized case could probably reduce its memory usage with manual usage of a Map. This is just the same as if the user is creating many Arrays that all have the same contents.

littledan added a commit to littledan/ecma262 that referenced this issue Apr 12, 2017
The previous definition of template caching had a few issue:
 - (from @syg) Template strings may live forever due to putting them
   in a WeakMap
 - (from @ajklein) Because of this logic, it's rather difficult to
   implement any GC at all of template objects
 - (from @erights) The template string facility cannot be extended
   to expose anything about the site, as it's site-independent

This patch makes template caching key off the Parse Node where the
template occurs in source, rather than the List of Strings that the
template evaluates into.

These semantics seem to match SpiderMonkey's implementation of templates.
V8, ChakraCore and JSC, on the other hand, implement the prior semantics.

Resolves tc39#840
littledan added a commit to littledan/test262 that referenced this issue Apr 12, 2017
These tests are against a specification change based on discussion in
tc39/ecma262#840

The tests here passed on SpiderMonkey but failed on other
implementations, which implement the current specification.
@littledan
Copy link
Member

I wrote up a strawman PR to go back to a site-based definition at #890 . If anyone has advice about what's the proper way to write this sort of thing, that would be appreciated; I don't know whether it's OK to just put a Parse Node inside a List of Records (but I don't see why not) and talk about whether it's "the same one", though I don't see why not.

@syg Are these the semantics that Firefox is currently shipping? The new tests pass on my build of SpiderMonkey, but I don't know if there are some additional subtleties.

@anba
Copy link
Contributor

anba commented Apr 12, 2017

If anyone has advice about what's the proper way to write this sort of thing, that would be appreciated; I don't know whether it's OK to just put a Parse Node inside a List of Records (but I don't see why not) and talk about whether it's "the same one", though I don't see why not.

ES6 rev28 used for GetTemplateObject (called GetTemplateCallSite back then):

  1. If a call site object for the source code corresponding to templateLiteral has already been created (see step 12 below) by a previous call to this abstract operation, then
    1. Return that call site object.
  2. ...
  3. Remember an association between the source code corresponding to templateLiteral and siteObj such that siteObj can be retrieve in subsequent calls to this abstract operation.
  4. Return siteObj.

@syg
Copy link
Contributor Author

syg commented Apr 12, 2017

@littledan SM Nightly is supposed to have per-content caching. Which version of the repo are you building the shell from?

@littledan
Copy link
Member

@syg I used a checkout from two months ago.

@Jamesernator
Copy link

Jamesernator commented Jul 10, 2017

I feel like this has some bad implications if weak references happens, not only would your own weak reference be essentially useless if another bit of code happened to have a strong reference to the same template object (as you can't even control your own object), but you can even check if anything else has a reference to that template object.

@erights
Copy link

erights commented Jul 11, 2017

Attn @dtribble

Hi @Jamesernator

not only would your own weak reference be essentially useless if another bit of code happened to have a strong reference to the same template object (as you can't even control your own object),

I don't understand this objection. In general, if you weakly point at something that is still strongly reachable from roots, it sticks around. This might seem to be the only case (and therefore a new case) where this sharing can be brought about among objects that share only code, but it is not. This sharing only happens within a realm. Objects sharing code in a realm also share intrinsics, and from the intrinsics can navigate to other objects. These non-intrinsic objects may be differentially retained or gc'ed in the same manner.

but you can even check if anything else has a reference to that template object.

This is a known side channel not specific to template objects. The proposal calls attention to this side channel and puts the makeWeakRef function (or WeakRef constructor) on the System object (or the import.meta object) for exactly this reason.

@WebReflection
Copy link

@littledan

What are you using that invariant for? From the above discussion, it doesn't sound like it was core to the design of template strings.

Sorry I didn't receive any notification about your question.

Both me and recently Polymer Labs too, use template literals as unique reference to understand if same static content (within the template) needs to be analyzed.

Once you are sure one static content is identical to any version of itself, parsing it, analyzing it, procesing it as DOM tree, makes no sense because the only changes you want to consider are interpolations and nothing else.

Accordingly, having the first argument of any function tag as in this scenario:

(a=>a)`a${1}b${2}` === (a=>a)`a${3}b${4}`

is essential to figure out if the static parts of the template need to be parsed or not.

AFAIK neither my project nor Polymer Labs one use a WeakMap to hold those templates ... but that's because we both use a Map, which would retain such static content forever regardless.

I wouldn't care if WeakMap could discard them, as long as a Map works and the template literal remains unique (per realm, I wouldn't care at all if it doesn't cross realm).

Hopefully I've explained the invariant.

@littledan
Copy link
Member

@WebReflection With this proposed change, your caching system would still work. It would also still successfully get cache hits, much of the time. The cache misses would only come when a distinct piece of source code is loaded which happens to have the same template literal. Then, some duplicate work would be done, but only the first time for that second piece of code. Would this be acceptable for you?

littledan added a commit to littledan/ecma262 that referenced this issue Oct 12, 2017
The previous definition of template caching had a few issue:
 - (from @syg) Template strings may live forever due to putting them
   in a WeakMap
 - (from @ajklein) Because of this logic, it's rather difficult to
   implement any GC at all of template objects
 - (from @erights) The template string facility cannot be extended
   to expose anything about the site, as it's site-independent

This patch makes template caching key off the Parse Node where the
template occurs in source, rather than the List of Strings that the
template evaluates into.

These semantics seem to match SpiderMonkey's implementation of templates.
V8, ChakraCore and JSC, on the other hand, implement the prior semantics.

Resolves tc39#840
@evilpie
Copy link
Contributor

evilpie commented Nov 23, 2017

Is this outstanding census?

@WebReflection
Copy link

@littledan gosh I still don't receive notifications here ...

The cache misses would only come when a distinct piece of source code is loaded which happens to have the same template literal.

if this includes dynamic import it might be an issue. is that the case?

@littledan
Copy link
Member

We got consensus on the general idea of the change in the September 2017 meeting, but some more editorial changes are needed on the spec patch before it can land.

@littledan
Copy link
Member

I apologize for the delay here!

littledan added a commit to littledan/ecma262 that referenced this issue Feb 3, 2018
The previous definition of template caching had a few issue:
 - (from @syg) Template strings may live forever due to putting them
   in a WeakMap
 - (from @ajklein) Because of this logic, it's rather difficult to
   implement any GC at all of template objects
 - (from @erights) The template string facility cannot be extended
   to expose anything about the site, as it's site-independent

This patch makes template caching key off the Parse Node where the
template occurs in source, rather than the List of Strings that the
template evaluates into.

These semantics seem to match SpiderMonkey's implementation of templates.
V8, ChakraCore and JSC, on the other hand, implement the prior semantics.

Resolves tc39#840
littledan added a commit to littledan/test262 that referenced this issue Feb 5, 2018
These tests are against a specification change based on discussion in
tc39/ecma262#840

The tests here passed on SpiderMonkey but failed on other
implementations, which implement the current specification.
leobalter pushed a commit to tc39/test262 that referenced this issue Feb 5, 2018
* Test for change to cache templates by site, not contents

These tests are against a specification change based on discussion in
tc39/ecma262#840

The tests here passed on SpiderMonkey but failed on other
implementations, which implement the current specification.

* Add a test that caching is by source location, not function identity

* Update existing tests to reference the spec properly
bterlson pushed a commit that referenced this issue Feb 7, 2018
* Normative: Cache templates per site, rather than by contents

The previous definition of template caching had a few issue:
 - (from @syg) Template strings may live forever due to putting them
   in a WeakMap
 - (from @ajklein) Because of this logic, it's rather difficult to
   implement any GC at all of template objects
 - (from @erights) The template string facility cannot be extended
   to expose anything about the site, as it's site-independent

This patch makes template caching key off the Parse Node where the
template occurs in source, rather than the List of Strings that the
template evaluates into.

These semantics seem to match SpiderMonkey's implementation of templates.
V8, ChakraCore and JSC, on the other hand, implement the prior semantics.

Resolves #840
mikesamuel added a commit to mikesamuel/template-tag-common that referenced this issue Feb 16, 2018
This works around a shortcoming in template tags noted in
tc39/ecma262#840

The static tag cache now has bounded memory usage
assuming constant cache entry size.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants