Initial overview #1

rossberg · 2017-04-10T16:55:55Z

This is a brain dump capturing various previous discussions and some of my notes. Lots more to be said and discussed, obviously, but we have to start somewhere.

lukewagner

This seems like a reasonable way to seed the proposal, thanks for writing it up!

lukewagner · 2017-04-10T19:46:53Z

proposals/gc/Overview.md

+Forming unions of different types, as value types.
+Defining, allocating, and indexing structures as extensions to imported types.
+Exceptions
+Direct support for strings?


nit: missing bullets

Also, could we add: "Admits (though does not mandate) efficient Ahead-of-Time compilation"? Basically, the goal is that you shouldn't have to do all the backflips JS engines do to get decent performance, and the compiled code should be cacheable.

rossberg · 2017-04-11T10:43:34Z

Added to next section.

lukewagner · 2017-04-10T19:51:46Z

proposals/gc/Overview.md

+
+### Efficiency Considerations
+
+Managed Wasm should inherit the efficiency properties of unmanaged Wasm as much as possible, namely:


I'm guessing this is a remnant from previous notes, but just in case, can we drop mention of "managed wasm"? (I'd kind of like to avoid giving "wasm with GC features" a separate name, as if it were a separate mode, format or standard.)

lukewagner · 2017-04-10T19:53:14Z

proposals/gc/Overview.md

+* Accept minimal amount of dynamic overhead (checked casts) as price for simplicity/universality.
+* Independent from linear memory.
+* Pay as you go.
+* Avoid generics or other complex type structure if possible.


Could we add a bullet for "MVP then Iterate" (that is, we're not going to add all GC features for all GC languages at optimal perf in one shot)?

lukewagner · 2017-04-10T20:16:56Z

proposals/gc/Overview.md

+```
+(func $D.g (param $Cthis (ref $C))
+  (local $this (ref $D))
+  (set_local $clos (call $outer (f64.const 1)))


Was this $clos line intended?

rossberg · 2017-04-11T10:45:46Z

Oops, no, removed.

lukewagner · 2017-04-10T20:40:18Z

proposals/gc/Overview.md

+
+### Foreign References
+
+A new built-in value type called `foreignref` represents opaque pointers to objects on the _embedder_'s heap.


On this subject, I also think we should have the ability to have a foreignref<T>. This type would allow calls to imported Web API functions to also mention foreignref<T> in their signature, and the result would be really fast, cast-free calls from wasm into Web APIs. The "T" would, I think, just be an arbitrary byte string, and dynamic casts would be required to go from foreignref<T> to foreignref<U> where T != U; whether such a cast succeeds is up to the embedder.

rossberg · 2017-04-11T10:53:43Z

Interesting idea, added with a slight difference: introduce definitions of "foreign types" to the type section, and simply allow forming regular references to them.

bnjbvr · 2017-04-11T10:55:37Z

proposals/gc/Overview.md

+```
+Store operators are only valid when targeting a mutable field or element.
+
+Immutability is needed to enable the safe and efficient [subtyping](#subtyping), especially as needed for the [objects](#objects-and-mehtod-tables) use case.


nit: mehtod => method in the link

bnjbvr · 2017-04-11T10:57:52Z

proposals/gc/Overview.md

+
+A new built-in value type called `foreignref` represents opaque pointers to objects on the _embedder_'s heap.
+
+There are no operations to manipulate foreign references, but by passing them as parameters or results of exorted Wasm functions, embedder references (such as DOM objects) can safely be stored in or round-trip through Wasm code.


exorted => exported

bnjbvr · 2017-04-11T11:05:12Z

proposals/gc/Overview.md

+  (call_ref (i32.const 5) (get_local $x))
+)
+```
+Unlike `call_indirect`, this instruction is statically typed and does not involve any runtime check.


Could these function references also be put in tables (whose element type would then be the function's type), enabling call_indirect with no runtime checks? (The current call_indirect is a regression from asm.js, which only had strictly typed tables.)

Tables already have an element type, anticipating other types of tables, e.g., we could easily support tables whose element type is a specific function type. However, that is orthogonal to anything described here.

Perhaps we could have a new section at the end that just says "these types could be added as element types of tables, along with get_elem/set_elem ops"?

Wouldn't homogeneous tables simply have a plain function type as element type?

I can see that with function references we might want to add instructions to access tables directly, but that in turn seems independent of homogeneous tables.

Trying to unravel the dependencies. It seems that homogeneous function tables are unrelated to GC types. The only feature through which tables (both homogeneous and heterogeneous) might interact with this proposal would be the instructions you mention.

rossberg · 2017-04-11T10:57:40Z

proposals/gc/Overview.md

+```
+Being reference types, tagged integers can be casted into `anyref`, and can participate in runtime type dispatch with `cast_down`.
+
+TODO: To avoid portability hazards, the value range of `intref` has to be restricted to at most 31 bit?


Added another design alternative for tagged integers.
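The TODO above asks whether `intref` should be limited to 31 bits. A common motivation is low-bit tagging: in a 32-bit word, one bit has to distinguish a tagged integer from an aligned heap pointer, leaving 31 bits for the value. The following is only a sketch under that assumption, since the proposal text here does not prescribe a representation. The helper names `tag_int`, `is_tagged_int` and `untag_int` are hypothetical.

```python
# Hypothetical sketch (assumption: 32-bit words, low-bit tagging).
# Aligned heap pointers have a 0 low bit; tagged integers set it to 1.
INT31_MIN = -(1 << 30)
INT31_MAX = (1 << 30) - 1
WORD_MASK = 0xFFFFFFFF


def tag_int(n: int) -> int:
    """Encode n as the 32-bit word (n << 1) | 1; only 31-bit values fit."""
    if not INT31_MIN <= n <= INT31_MAX:
        raise OverflowError(f"{n} does not fit in 31 bits")
    return ((n << 1) | 1) & WORD_MASK


def is_tagged_int(word: int) -> bool:
    """Runtime type dispatch: a set low bit means 'integer, not pointer'."""
    return word & 1 == 1


def untag_int(word: int) -> int:
    """Recover the signed 31-bit payload (same result as an arithmetic shift right by 1)."""
    payload = word >> 1  # unsigned 31-bit value
    return payload - (1 << 31) if payload & (1 << 30) else payload
```

With this encoding, `tag_int(1 << 30)` is rejected, while every value in `[INT31_MIN, INT31_MAX]` round-trips through `untag_int(tag_int(n))`. That is the portability argument for capping the range at 31 bits even on 64-bit hosts.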

rossberg · 2017-04-11T11:05:08Z

proposals/gc/Overview.md

+)
+```
+Structures are garbage-collected.
+


Added that structures can be compared for identity.

bnjbvr · 2017-04-11T11:22:03Z

proposals/gc/Overview.md

+(cast_down <type1> <type2> $label (...))
+```
+also casts the operand of type `<type1>` to type `<type2>`.
+It is a validation error if the operand's type is not `<type1>`, or if `<type1>` is not a subtype of `<type2>`.


Shouldn't type1 be a supertype of type2? (looks like a copy-pasto)

rossberg · 2017-04-11T11:25:43Z

> Shouldn't type1 be a supertype of type2? (looks like a copy-pasto)

Yup, fixed.
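Here is a minimal validation sketch of the corrected `cast_down` rule (operand must have `<type1>`, and `<type1>` must be a supertype of `<type2>`). It assumes a toy hierarchy `$D <: $C <: anyref`, mirroring the `$C`/`$D` method-table example. The table `DIRECT_SUPER` and the functions `is_subtype` and `validate_cast_down` are hypothetical; real subtyping in the proposal is structural rather than a declared chain.

```python
from typing import Dict, Optional

# Hypothetical toy hierarchy (assumption): each type maps to its immediate supertype.
DIRECT_SUPER: Dict[str, Optional[str]] = {"$D": "$C", "$C": "anyref", "anyref": None}


def is_subtype(sub: str, sup: str) -> bool:
    """Reflexive-transitive walk up the toy supertype chain."""
    current: Optional[str] = sub
    while current is not None:
        if current == sup:
            return True
        current = DIRECT_SUPER.get(current)
    return False


def validate_cast_down(operand_type: str, type1: str, type2: str) -> None:
    """Check `(cast_down <type1> <type2> $label (...))` at validation time."""
    if operand_type != type1:
        raise TypeError(f"operand has type {operand_type}, expected {type1}")
    # Corrected rule: <type1> must be a supertype of <type2>.
    if not is_subtype(type2, type1):
        raise TypeError(f"{type1} is not a supertype of {type2}")
```

For example, `validate_cast_down("$C", "$C", "$D")` passes, while `validate_cast_down("$D", "$D", "$C")` raises, because that would be an upcast.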

bnjbvr · 2017-04-11T11:36:58Z

proposals/gc/Overview.md

+```
+All accesses are type-checked at validation time.
+
+Structures are [allocated](#allocation) with `new` instructions that take initialization values for each field.


(nit: #allocation doesn't seem to link to a section as of now)

Should we mention (somewhere, probably in the future #allocation paragraph) that structs/arrays might not get allocated/garbage-collected if the implementation can prove they don't need to be allocated (through escape analysis and scalar replacement)? Maybe too much of an implementation detail at the moment. The current phrasing just made me wonder whether it is a strong requirement that structures must be allocated.

Removed stray xref. Yeah, probably not worth going into details about possible optimisations at this level. As usual, implementations can do whatever is semantically equivalent. Additional language should only be necessary if we explicitly want to forbid certain optimisations, e.g., like we did with TCO.

smvv · 2017-04-11T12:59:03Z

proposals/gc/Overview.md

+
+Packed fields require special load/store instructions:
+```
+(load_field_packed_s $s $a (...))


Two questions:

Can't the signedness of the field be inferred from the type descriptor?

Can't packed be inferred from the type descriptor as well?

Wasm does not distinguish signed from unsigned integer types, so it is necessary here to specify sign extension behaviour, same as with existing instructions such as memory loads and stores. That also implies that at least the packed loads are different opcodes than non-packed loads. For symmetry and clarity it seems preferable to do the same for stores then.
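The reason for separate signed and unsigned packed loads is that the same stored bits yield different i32 values depending on how they are extended. The sketch below models only those semantics. `load_packed_s`, `load_packed_u` and `store_packed` are hypothetical helpers, not the proposal's instructions.

```python
def load_packed_u(raw: int, bits: int) -> int:
    """Zero-extend the low `bits` bits of a packed i8/i16 field."""
    return raw & ((1 << bits) - 1)


def load_packed_s(raw: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of a packed i8/i16 field."""
    value = raw & ((1 << bits) - 1)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def store_packed(value: int, bits: int) -> int:
    """Stores simply truncate; signedness only matters when loading."""
    return value & ((1 << bits) - 1)
```

Storing `-1` into an 8-bit field leaves the bits `0xFF`. `load_packed_s(0xFF, 8)` then reads `-1`, while `load_packed_u(0xFF, 8)` reads `255`. The field type alone cannot decide between them, just as with `i32.load8_s` and `i32.load8_u` on linear memory.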

Can i8 and i16 not be used on non-packed structs, and if so, why is that?

Regarding signedness, let's say we export a type to the JavaScript world. If we export i8, i16, and i32 fields without any signedness information, it is impossible to determine whether to sign-extend or zero-extend when loading a value from that field in JS. TypedObjects have u8, u16, u32 for this reason: https://github.com/nikomatsakis/typed-objects-explainer/blob/master/core.md#primitive-type-definitions. How should we expose this signedness information to the JS world?

mtrofin · 2017-04-11T13:48:21Z

proposals/gc/Overview.md

+
+* Only basic but general structure: tuples (structs) and arrays.
+* No heavyweight object model.
+* Accept minimal amount of dynamic overhead (checked casts) as price for simplicity/universality.


...but that's a cost paid only if you opt-in to GC, i.e. vanilla wasm code isn't affected, correct?

Right. Added as a comment to the pay-as-you-go bullet.

mtrofin · 2017-04-11T13:49:41Z

proposals/gc/Overview.md

+
+### Requirements
+
+* Allocation of structures on the heap which are garbage collected.


which heap? JS heap, or the wasm memory?

Conceptually, the (newly introduced) Wasm heap -- which, in a JS embedding, will happen to be shared with JS.

rossberg · 2017-04-11T13:53:02Z

> Can i8 and i16 be used on non-packed structs, and if so, why is that?

There are no packed structs, only "packed" (sub-word-sized) fields. These types can only be used for fields, they are not general value types.

> Regarding signedness, let's say we export a type to the JavaScript world. If we export i8, i16, and i32 fields without any signedness information, it is impossible to determine whether to sign-extend or zero-extend when loading a value from that field in JS. TypedObjects have u8, u16, u32 for this reason: https://github.com/nikomatsakis/typed-objects-explainer/blob/master/core.md#primitive-type-definitions. How should we expose this signedness information to the JS world?

You are right, that can be a hassle. However, this problem isn't new, it already exists with plain Wasm today: whenever you import/export a function with an integer param/result from or to JavaScript, it is the responsibility of the JS side to interpret (and potentially, convert) their signedness correctly. At some point we discussed adding some form of annotation to ease that. In general, JS interop and API is a somewhat separate problem layer that probably needs its own subdiscussion.

mtrofin · 2017-04-11T13:53:13Z

proposals/gc/Overview.md

+
+* all operations are very cheap, ideally constant time,
+* structures are contiguous, dense chunks of memory,
+* accessing fields are single-indirection loads and stores,


Won't tuple-typed fields need more than that (e.g., atomic access to avoid tearing)?

There is no access to tuple-typed fields. With nesting, you can get an interior reference to such a field, but you can only load/store its fields individually.

But I added a bullet to requirements about interaction with threads.

mtrofin · 2017-04-11T13:57:03Z

proposals/gc/Overview.md

@@ -1 +1,716 @@
+# GC Extension


Is it about "GC"-ed languages or is it "JIT" rather?

Not sure I understand the question. Why JIT?

Not sure myself anymore either. Somehow I probably misread something, and started thinking "JIT". Sorry.

bashor · 2017-04-11T18:31:03Z

proposals/gc/Overview.md

+
+Should attempt to implement 2-3 exemplary languages:
+
+* an object-oriented language (e.g., a subset of Java, with classes, inheritence, interfaces),


Could we help you by implementing it for Kotlin?

Sorry to hijack the review, but @bashor, if you do get around to toying with a Kotlin-to-WASM compiler, I have some code at https://github.com/cretz/asmble/tree/master/src/main/kotlin/asmble/io that can save you a lot of work with respect to parsing and the like (it is still in early stages, and the binary support is not well tested).

I also have plans to compile Scala.js to wasm-with-GC (background info: I'm the author of Scala.js), as soon as some prototype interpreter is available. From there, it shouldn't take more than 3-4 weeks, since we can reuse our entire pipeline except the very last piece (the so-called "emitter"), which for the compile-to-JS version is fewer than 4000 LoC.

Any implementation experiments will help evaluating the design, and any feedback from such experiments will be highly appreciated. The more the better! That said, we probably want to make sure that the set of experimental compilers includes some sufficiently "mainstream" languages for each category (for some definition of "mainstream").

evancz · 2017-04-13T22:34:16Z

proposals/gc/Overview.md

+Should attempt to implement 2-3 exemplary languages:
+
+* an object-oriented language (e.g., a subset of Java, with classes, inheritence, interfaces),
+* a typed functional language (e.g., a subset of ML, with closures, polymorphism, variant types)


I'm the designer and lead developer of Elm, a simple ML-family language with all these features. Our primary compilation target is JS right now, so DOM interaction is a very practical consideration for us. Our community is very interested in WebAssembly, and I would like to help with this.

jfbastien · 2017-04-13T23:14:14Z

proposals/gc/Overview.md

+* Defining, allocating, and indexing structures as extensions to imported types? (future extension)
+* Exceptions (separate proposal)
+* Direct support for strings? (separate proposal)
+* Safe interaction with threads (sharing, atomic access)


Right, it would also be nice to clarify no-tearing versus ordering guarantees.

Agreed, but first things first. For starters, thread interaction is just a big TODO, see e.g. the section on Sharing. I anticipate more details being filled out eventually.

I was replying to a comment which now seems deleted.

konsoletyper · 2017-04-14T19:24:42Z

proposals/gc/Overview.md

+(type $g-sig (func (param (ref $C)) (result i32)))
+(type $h-sig (func (param (ref $D)) (result i32)))
+
+(type $C (struct (ref $C-vt) (mut i32))


In my current implementation of the JVM on top of linear memory, I store a reference to the class in the first 4 bytes of each object. Besides the vtable, the class contains information about the object's size, which the GC uses to determine how large the object is. How is the GC intended to know the size of a tuple? I guess that physically every tuple will have a 4- or 8-byte header identifying its tuple type, followed by the reference to the vtable, which takes an additional 4 or 8 bytes. Would it be useful to support a special kind of first field in a tuple to implement similar behaviour?

A class is not just a vtable. In my current implementation I found the following fields: flags (initialized, primitive), identifier, name, class of element (for array classes), arrayof class, reference to a supertype function, superclass, and a reference to enum fields. Do you suggest providing this information via a special function present in every vtable?

Ah, I just realized that this "vtable" structure is not necessarily a structure of function pointers. It can contain all these fields as-is.

konsoletyper · 2017-04-14T19:33:28Z

proposals/gc/Overview.md

+(same (new $point ...) (new $point ...))  ;; false
+```
+TODO: Could even allow heterogeneous equality (equality between operands of different type), but that might lead to some discontinueties or even prevent some potential optimizations?
+


What about shallow copy of a structure (which is usually implemented by memcpy)?

Right. Not currently included, but may be a worthwhile extension if there turn out to be sufficiently many use cases of large structures that need to be copied by value.

konsoletyper · 2017-04-14T19:37:14Z

proposals/gc/Overview.md

+```
+
+Like structures, arrays can be compared for identity.
+


In the JVM, arrays are first-class objects that extend java.lang.Object and support getClass(), hashCode, etc. In my implementation, arrays are actually objects with a corresponding header that points to a class with a vtable. How can I implement this with Wasm arrays? The only approach I can see is to generate an additional wrapper class (two structures, one for data and one for vtable).

With the base proposal you indeed need to allocate two values, the "object" and the array backing store. With the extension for nesting and "flexible types" discussed below you could embed the latter into the former.
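The two-allocation layout described above can be sketched by modelling Wasm structs as Python dataclasses. The header object carries the class/"vtable" reference, which, as noted earlier, can hold class metadata as plain fields alongside method references. The element data lives in a separately allocated array. `ClassInfo`, `JavaIntArray`, `call_virtual`, `array_length` and the method names are all hypothetical illustrations, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ClassInfo:
    """Hypothetical per-class "vtable" struct: metadata plus method references."""
    name: str
    superclass: Optional["ClassInfo"]
    methods: Dict[str, Callable]


@dataclass
class JavaIntArray:
    """A Java int[] as two values: a header object and its backing array."""
    vt: ClassInfo        # header field: class/vtable reference
    elements: List[int]  # stands in for a separately allocated Wasm array


def call_virtual(obj, name: str, *args):
    """Dispatch through the object's vtable reference."""
    return obj.vt.methods[name](obj, *args)


def array_length(arr: JavaIntArray) -> int:
    """Java's arraylength reads the backing store, not the vtable."""
    return len(arr.elements)


OBJECT = ClassInfo("java.lang.Object", None, {
    "hashCode": lambda self: id(self) & 0x7FFFFFFF,
    "getClassName": lambda self: self.vt.name,
})
INT_ARRAY = ClassInfo("[I", OBJECT, dict(OBJECT.methods))  # inherits Object's methods
```

Here `call_virtual(JavaIntArray(INT_ARRAY, [0, 0, 0]), "getClassName")` dispatches through the header to an inherited `Object` method and yields `"[I"`. The cost is the extra allocation and indirection that the nesting extension would remove.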

konsoletyper · 2017-04-14T19:39:37Z

proposals/gc/Overview.md

+That would simplify creation of immutable objects, by first creating them as mutable, initialize them, and then cast away their constness.
+On the other hand, it means that immutable fields can still change, preventing various access optimizations.
+(Another alternative would be a three-state mutability algebra.)
+


So, types are organized in a tree, which is insufficient for Java, where types form a DAG. How are interfaces supposed to be implemented?

No, structural subtyping can form arbitrary DAGs. Also, note that these are low-level representation types. They don't necessarily bear a direct relation to the source language type system.

rossberg · 2017-04-19T10:52:20Z

> In my current implementation of the JVM on top of linear memory, I store a reference to the class in the first 4 bytes of each object. Besides the vtable, the class contains information about the object's size, which the GC uses to determine how large the object is. How is the GC intended to know the size of a tuple? I guess that physically every tuple will have a 4- or 8-byte header identifying its tuple type, and *then* comes the reference to the vtable, which takes an additional 4 or 8 bytes. Would it be useful to support a special kind of first field in a tuple to implement similar behaviour?

@konsoletyper, yes, this will most likely result in taking up two words for each Java object. Coming up with a feature for Wasm that would make this more space efficient while still being portable across all engines and hardware platforms is a serious challenge. If anybody knows a satisfactory solution to that problem then I'd be very interested in seeing it.

rossberg · 2017-04-26T07:49:48Z

Is anybody opposed to landing this initial PR?

rossberg added 2 commits April 10, 2017 18:54

Initial overview

f3f25d9

This is a brain dump capturing various previous discussions and some of my notes.

Add evaluation note

3866e9b

lukewagner mentioned this pull request Apr 10, 2017

Add Typed Object support to WebAssembly WebAssembly/design#1022

Closed

lukewagner reviewed Apr 10, 2017

bnjbvr reviewed Apr 11, 2017

Comments; minor additions

829ea65

rossberg commented Apr 11, 2017

bnjbvr reviewed Apr 11, 2017

Fix typo

0476b31

bnjbvr reviewed Apr 11, 2017

Remove dangling xref

d48dbfb

smvv reviewed Apr 11, 2017

mtrofin reviewed Apr 11, 2017

More comments

88a317e

bashor reviewed Apr 11, 2017

rossberg added 2 commits April 12, 2017 14:42

Add some on imports/exports

9101911

No dependencies

ea1f1a2

evancz reviewed Apr 13, 2017

jfbastien reviewed Apr 13, 2017

konsoletyper reviewed Apr 14, 2017

rossberg merged commit 8515619 into master Apr 27, 2017

rossberg deleted the gc-overview branch April 27, 2017 09:29

aardappel mentioned this pull request Jun 29, 2020

Alternatives to i31ref wrt compiling parametric polymorphism on uniformly-represented values (OCaml) #100

Closed

fgmccabe mentioned this pull request Sep 17, 2020

Requirements #121

Closed

