
First go at MVP proposal #34

Merged (14 commits, Aug 23, 2018)

proposals/gc/MVP.md: 272 additions, 0 deletions

@@ -0,0 +1,272 @@
# GC v1 Extensions


## Language

Based on reference types proposal.

### Types

#### Value Types

* `eqref` is a new reference type
- `reftype ::= ... | eqref`

* `ref <typeidx>` is a new reference type
- `reftype ::= ... | ref <typeidx>`
- `ref $t ok` iff `$t` is defined in the context

* `int31ref` is a new reference type

**Review comment:**

While I am sure this has its uses, and is nice to have from a "why not" perspective, I feel this is such an implementation-specific optimization (it will not apply to languages that guarantee 32-bit integers, and it may not be a great fit for certain engines/hosts either, e.g. if they need more than 1 bit) that I'd think we'd be better off not having it, or guaranteeing the full 32 bits.

**Member Author (@rossberg, May 22, 2018):**

What @stedolan said. Don't think of this as a source-level type; it's just a way to represent tagged pointers/integers, which is a mechanism that many languages need. 32-bit integer references can be implemented in user space using this primitive. See also my recent comment on the respective issue.

**Review comment:**

Not sure which comment of @stedolan's you are referring to; I don't see him refer to int31ref at all.

I see "many languages" needing 32 bits, and 31 bits being rather specific to some language implementations. I don't see how wasm as a whole benefits from this bias, especially since there will be some cost to supporting it (there will be places where this bit needs to be tested if integer values are a possibility). The alternative, if int32ref is not practical, is not to have integers be part of reftype at all, since at least that would give engines the benefit of being able to assume that a reftype is always a pointer.

**Member Author:**

I was referring to this comment.

To clarify, int31ref is primarily a primitive for making pointer tagging available, in the most general form that we can still guarantee to be directly implementable across all relevant platforms and engines. Such tagging is a mechanism that probably a majority of GC'ed language implementations rely on in some form (almost all dynamic languages, functional languages, logic languages). The GC proposal will effectively be useless to them without it. While certainly limited, the proposed type hopefully is general enough that a rather large subset of these use cases can be mapped to it directly.

Int32ref is in a different category. It is not applicable to the same use cases, since it can involve unpredictable and substantial hidden cost on some engines (allocation and branching in particular), for example in V8 on 32-bit platforms ;). Also, an int32ref can already be implemented with the current proposal, in at least two ways: either (1) as a ref to an immutable struct with a single int32 field (which an engine should be able to optimise just like a proper int32ref), or (2) as the union of an int31ref and the aforementioned struct, doing the necessary boxing/unboxing of large values in user space (in case you don't trust the engine to do a good enough job).
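
To make encoding (1) concrete, here is a rough sketch in a hypothetical text syntax (this proposal does not define a text format, and the type and function names below are made up for illustration):

```wat
;; Encoding (1): an "int32ref" represented as a reference to an immutable
;; single-field struct. Names and surface syntax are illustrative only.
(type $boxed-i32 (struct (field i32)))

(func $box (param $x i32) (result (ref $boxed-i32))
  (struct.new $boxed-i32 (local.get $x)))

(func $unbox (param $r (ref $boxed-i32)) (result i32)
  (struct.get $boxed-i32 0 (local.get $r)))
```

Encoding (2) would additionally test whether a value fits into 31 bits, use `int31ref` for the common small case, and fall back to the boxed struct only for large values.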

**Review comment:**

What about the cost to languages that don't use it? The mere possibility of int31ref values being present may require engines to emit code that tests the bit, right?

**Member Author:**

Only at the point where they perform a downcast from anyref, which is an explicit operation and which in practice will require that check anyway, because existing engines tend to use tagging already (like V8).

- `reftype ::= ... | int31ref`


#### Type Definitions

* `deftype` is the new category of types that can occur as type definitions
- `deftype ::= <functype> | <structtype> | <arraytype>`
- `module ::= {..., types vec(<deftype>)}`

**Member:**

There's already a types section containing functypes which I think we're extending with other constructor "forms", right? (The wording sounds like this is a new section, so just checking.)

**Member Author:**

Yes, deftype is the syntax of type definitions, which previously was just function types. (I'm avoiding saying "type section" here since in my mind, sections are rather an encoding detail of the binary format.)

**Member:**

Ok, makes sense. Perhaps tweak wording to say "generalizes the existing types component of modules"?

**Member Author:**

Done.


* `structtype` describes a structure with statically indexed fields
- `structtype ::= struct <fieldtype>*`

* `arraytype` describes an array with dynamically indexed fields
- `arraytype ::= array <fieldtype>`

**Member:**

What about the older idea of giving structtypes a trailing, dynamic-length array tail? I'd be surprised if multiple source languages didn't need to store one or two extra bookkeeping fields to implement their source-language array, so it feels like a GC MVP thing, and it seems like, with that ability, a separate arraytype isn't even needed.

**Review comment:**

I agree that this would really extend languages' abilities to find efficient representations for their data types. Engines can still treat structs without a tail specially.

**Member Author:**

This is enabled by allowing nesting of aggregates (structs/arrays), which is currently left for post-MVP. For MVP it seems okay to use an indirection.

**Member (@lukewagner, May 25, 2018):**

Fair enough


* `fieldtype` describes a struct or array field and whether it is mutable
- `fieldtype ::= <mutability> <storagetype>`

**Review comment:**

Missing definition of mutability?
A field can also be a reftype?
A field can be a reftype that is stored in-line? Or is that GC v2? :)

**Member Author:**

Mutability is in the spec already. A field can be a valtype, which includes reftypes per the reference types proposal. Nested aggregates are post-MVP, since they imply introducing inner pointers.

- `storagetype ::= <valtype> | <packedtype>`
- `packedtype ::= i8 | i16`

* Unpacking a storage type yields `i32` for packed types, otherwise the type itself
- `unpacked(t) = t`
- `unpacked(pt) = i32`
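
As an illustration (hypothetical text syntax; keywords and field syntax are not fixed by this proposal), the following definitions combine plain and packed storage types; accesses to the `i8` fields operate on `i32` values per the unpacking rule above:

```wat
(type $point (struct (field i32) (field i32)))   ;; two immutable i32 fields
(type $rgb   (struct (field (mut i8))            ;; three packed, mutable bytes
                     (field (mut i8))
                     (field (mut i8))))
(type $bytes (array (mut i8)))                   ;; array of packed, mutable i8
```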


#### Imports

* `type <reftype>` is an import description with an upper bound
- `importdesc ::= ... | type <reftype>`
- Note: `type` may get additional parameters in the future

**Member:**

Could you explain the reasoning behind having the type have a reftype as opposed to a deftype? E.g., into which index space does a type import belong? It seems like it should be in the type definition index space (so it can be used in function types, exports, etc). I was imagining that maybe, as a special case (to play well with section ordering), type imports would go into the type section (using a special import form followed by the type constructor's form). Then a type import would be prepended to both the type definition index space and the import index space. Kinda weird, but seems forced by the type/import circularity.

**Member Author:**

The reason the bound is a reftype is that you need to be able to use anyref (or eqref) as a bound.

Yes, type imports are in the type index space. Lumping them in with the type section rather than with imports would be inconsistent with how we handle other imports. It would create a mess with both the type index space (which should have imports first) and the "argument list" of a module instantiation (which currently simply reflects the import list). If we want to separate type imports from other types, then I think we should instead have a separate type import section. But I don't think it's strictly necessary.

**Member:**

Hm, still not sure reftype makes sense as the bound, but I guess I'll wait to see what you propose for nominal types + type imports. In particular, it seems like the import needs to introduce a new type definition; it can't just reference an existing one.

Ah ok, so then type imports are appended to the type section so that (1) signatures can refer to type imports by just using the appropriate index (2) the type section can only be validated together with the import section. Yes? It'd be good to mention these things.

**Member Author:**

Well, yes, except that type imports are still meant to be prepended to the type index space. (The ordering of binary sections doesn't have to imply the ordering of index spaces. As you said, they need to be validated together anyway.)

Added comments.

**Member Author:**

Regarding the bound: In most cases, it will just be anyref, i.e., the type is fully abstract. How would you express that if it wasn't a reftype?

FWIW, I'm not sure I understood what you meant by:

> In particular, it seems like the import needs to introduce a new type definition; it can't just reference an existing one.

Why not?

**Member:**

Ok, I think I understand the proposal a bit better now. I also see the benefit in that, by using a reference-type bound (instead of using a struct/array bound more-directly), one has the more-expressive capability of declaring whether the import is equality-comparable, nullable, etc.



#### Exports

* `type <typeidx>` is an export description
- `exportdesc ::= ... | type <typeidx>`
- `type $t ok` iff `$t` is defined in the context


#### Subtyping

Greatest fixpoint (co-inductive interpretation) of the given rules (implying reflexivity and transitivity).

* `eqref` is a subtype of `anyref`
- `eqref <: anyref`
- Note: `int31ref` and `anyfunc` are *not* subtypes of `eqref`, i.e., those types do not expose reference equality

**Member:**

The anyfunc in tables is already nullable...

**Member Author:**

Hm, I don't disagree, but how does that relate to this item?

**Member:**

Oops, I put this comment on the wrong line: I meant to comment on the "nullref is not a subtype of anyfunc" line below.

**Member Author:**

Ah, I see. Yes, typo.


* `nullref` is a subtype of `eqref`

**Review comment:**

Would be good to have a hint somewhere (if not here, then in the GC proposal) why we need nullref at all, since none of the current types are nullable. Is this just to be able to type locals between function start and first assignment?

**Member Author:**

It's from the reference types proposal. Anyref and anyfunc are nullable, so before you have cast down to a concrete ref type successfully it could still be null.

- `nullref <: eqref`
- Note: `nullref` is *not* a subtype of `anyfunc`, `int31ref`, or any concrete reference type, i.e., those types are not nullable

* `int31ref` is a subtype of `anyref`
- `int31ref <: anyref`

* Any concrete reference type is a subtype of `anyref`
- `ref $t <: anyref`
- Note: concrete reference types are *not* supertypes of `nullref`, i.e., not nullable

* Any function reference type is a subtype of `anyfunc`
- `ref $t <: anyfunc`
- iff `$t = <functype>`

**Member:**

Also optref $t, yes?

**Member Author:**

Yes, that is implied by the previous bullet.


* Any concrete reference type is a subtype of `eqref` if it is not a function
- `ref $t <: eqref`
- if `$t = <structtype>` or `$t = <arraytype>`
- or `$t = type rt` and `rt <: eqref`
- TODO: provide a way to make data types non-eq, especially immutable ones

* Concrete reference types are covariant
- `ref $t1 <: ref $t2`
- iff `$t1 <: $t2`

* Structure types support width and depth subtyping
- `struct <fieldtype1>* <fieldtype1'>* <: struct <fieldtype2>*`
- iff `(<fieldtype1> <: <fieldtype2>)*`

* Array types support depth subtyping
- `array <fieldtype1> <: array <fieldtype2>`
- iff `<fieldtype1> <: <fieldtype2>`

* Field types are covariant if they are immutable, invariant otherwise
- `const <valtype1> <: const <valtype2>`
- iff `<valtype1> <: <valtype2>`
- `var <valtype> <: var <valtype>`
- Note: mutable fields are *not* subtypes of immutable ones, so `const` really means constant, not read-only
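
For example (hypothetical text syntax, illustrative type names), width subtyping together with covariance of concrete reference types lets a reference to a wider struct be used wherever a reference to a narrower one is expected:

```wat
(type $A (struct (field i32)))                  ;; one immutable field
(type $B (struct (field i32) (field anyref)))   ;; $B <: $A by width subtyping

(func $first (param (ref $A)) (result i32)
  (struct.get $A 0 (local.get 0)))

(func $use (param $b (ref $B)) (result i32)
  ;; ref $B <: ref $A by covariance, so no cast is needed here
  (call $first (local.get $b)))
```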


#### Defaultability

* Any numeric value type is defaultable

* A reference value type is defaultable if it is not of the form `ref $t`

* Locals must have a type that is defaultable.

**Member:**

Without adding a nullable type constructor, it seems like any source language with ubiquitous nullable types will be forced to use tons of otherwise-unnecessary anyrefs and casts. Nullability seems like an important thing not to bolt on later too, so can we add it?

**Review comment:**

I think it could be added, but there is a risk that people will be lazy and default to using optref instead of anyref, since it gives "easier" interop between all sorts of languages, which will make languages that are null-safe pay the price for having nulls. A world where non-null is the default and languages that are liberal with null pay the price makes more sense to me, and will promote more robust software ecosystems built on wasm.

**Member Author:**

I am on the fence here. The main benefit of a nullable ref type over anyref is that a downcast to the non-nullable concrete ref type is gonna be slightly cheaper. But you'd still need it. (Or are you suggesting that all ref instructions should also work with nullable refs directly?)

**Member:**

A downcast from a nullref $t to ref $t is going to be significantly cheaper than a downcast from anyref; the former is a single null check; the latter several instructions and memory access (several, if TLS is used to avoid baking in pointers to the code).

**Member Author:**

Okay, added optref (for a short name). (I abandoned the idea for a separate nullable type constructor because it actually introduces more anomalies than benefits.)

**Member:**

Thanks, looks good!

**Member:**

Oh: perhaps also mention table element kinds (or maybe this is solved by a validation constraint on table.grow and adding a new table.grow_init that takes a value to initialize with?)

**Member Author:**

I think table.grow should take an initialisation operand right away; no point in having two separate instructions there. (I just realised that this instruction is not currently part of any proposal, so I opened an issue for the reference types proposal.)

But there is another constraint we need: a table definition of non-zero initial size must have a defaultable element type. Added that.



### Instructions

#### Equality

* `ref.eq` compares two references whose types support equality
- `ref.eq : [eqref eqref] -> [i32]`
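
A minimal sketch (hypothetical text syntax):

```wat
(func $same (param $a eqref) (param $b eqref) (result i32)
  (ref.eq (local.get $a) (local.get $b)))   ;; reference identity, as an i32
```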


#### Functions

* `ref.func` creates a function reference from a function index
- `ref.func $f : [] -> [(ref $t)]`
- iff `$f : $t`

**Member:**

I had been on the edge of whether function references were needed for the GC MVP, but the previous discussion about passing closures to and from JS without wrapping each time seemed to boost the priority. Given that, could we also have the bind operator which can be used to manufacture closures?

**Member Author:**

Oh, but closures are different from mere function references, since they have a different runtime representation. Hence they imply a new type, so I considered them post-MVP.

**Member:**

Ah, I had assumed that they would have the same type and that they would also have the same representation. That is, even without a bound dynamic environment, ref.func is still a closure because it entrains the WebAssembly.Instance. We represent these as JSFunction objects internally in SM. I had entertained the idea of representing function references as raw function pointers, but I think this gets pretty hairy when GC and multi-instance linking enter the picture.

**Member Author:**

I added it to the question section.


* `call_ref` calls a function through a reference
- `call_ref : [t1* (ref $t)] -> [t2*]`
- iff `$t = [t1*] -> [t2*]`
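
A sketch of both instructions together (hypothetical text syntax; names are illustrative): `ref.func` produces a first-class reference to a declared function, and `call_ref` invokes a function through such a typed reference:

```wat
(type $i2i (func (param i32) (result i32)))

(func $square (param $x i32) (result i32)
  (i32.mul (local.get $x) (local.get $x)))

(func $apply (param $f (ref $i2i)) (param $x i32) (result i32)
  (call_ref (local.get $x) (local.get $f)))   ;; arguments first, then the function reference

(func $main (result i32)
  (call $apply (ref.func $square) (i32.const 3)))
```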


#### Structures

* `struct.new <typeidx>` allocates a structure of type `$t` and initialises its fields with given values
- `struct.new $t : [t*] -> [(ref $t)]`
- iff `$t = struct (mut t)*`

* `struct.new_default <typeidx>` allocates a structure of type `$t` and initialises its fields with default values
- `struct.new_default $t : [] -> [(ref $t)]`
- iff `$t = struct (mut t)*`
- and all `t*` are defaultable

* `struct.get <typeidx> <fieldidx>` reads field `$x` from a structure
- `struct.get $t i : [(ref $t)] -> [t]`

**Member:**

For languages with ubiquitous optrefs, get/set taking a ref will have a major codesize impact with all the casts. It seems like get/set could take an optref instead (with trap semantics) and it would just be a trivial local analysis that removes the null check when the operand had static type ref.

**Member Author:**

Changed. But note that this relies on subsumption, i.e., subtyping being implicit. If we were to require an explicit instruction for upcasts like we have considered recently then you would have to inversely upcast every ref to optref -- or introduce all ref instructions in two variants, or introduce overloading.

- iff `$t = struct (mut1 t1)^i (mut ti) (mut2 t2)*`
- and `t = unpacked(ti)`

* `struct.set <typeidx> <fieldidx>` writes field `$x` of a structure
- `struct.set $t i : [(ref $t) t] -> []`
- iff `$t = struct (mut1 t1)^i (var ti) (mut2 t2)*`
- and `t = unpacked(ti)`
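
Putting the structure instructions together (hypothetical text syntax; the `$counter` type and its field layout are made up for illustration):

```wat
(type $counter (struct (field (mut i32)) (field i32)))   ;; mutable count, immutable step

(func $make (param $step i32) (result (ref $counter))
  (struct.new $counter (i32.const 0) (local.get $step)))

(func $bump (param $c (ref $counter)) (result i32)
  ;; count := count + step, then return the new count
  (struct.set $counter 0 (local.get $c)
    (i32.add (struct.get $counter 0 (local.get $c))
             (struct.get $counter 1 (local.get $c))))
  (struct.get $counter 0 (local.get $c)))
```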


#### Arrays

* `array.new <typeidx>` allocates an array of type `$t` and initialises its fields with a given value
- `array.new $t : [t i32] -> [(ref $t)]`
- iff `$t = array (mut t)`

* `array.new_default <typeidx>` allocates an array of type `$t` and initialises its fields with the default value
- `array.new_default $t : [i32] -> [(ref $t)]`
- iff `$t = array (mut t)`
- and `t` is defaultable

* `array.get <typeidx>` reads an element from an array
- `array.get $t : [(ref $t) i32] -> [t]`
- iff `$t = array (mut t')`
- and `t = unpacked(t')`

* `array.set <typeidx>` writes an element to an array
- `array.set $t : [(ref $t) i32 t] -> []`
- iff `$t = array (var t')`
- and `t = unpacked(t')`

* `array.len <typeidx>` inquires the length of an array
- `array.len $t : [(ref $t)] -> [i32]`
- iff `$t = array (mut t)`
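
A sketch combining the array instructions (hypothetical text syntax; `$ints`, `$zeros`, and `$sum` are illustrative names):

```wat
(type $ints (array (mut i32)))

(func $zeros (param $n i32) (result (ref $ints))
  (array.new $ints (i32.const 0) (local.get $n)))   ;; fill value, then length

(func $sum (param $a (ref $ints)) (result i32)
  (local $i i32)
  (local $acc i32)
  (block $done
    (loop $next
      (br_if $done (i32.ge_u (local.get $i) (array.len $ints (local.get $a))))
      (local.set $acc
        (i32.add (local.get $acc)
                 (array.get $ints (local.get $a) (local.get $i))))
      (local.set $i (i32.add (local.get $i) (i32.const 1)))
      (br $next)))
  (local.get $acc))
```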


#### Integer references

* `int31ref.new` creates an `int31ref` from a 32-bit value, truncating the high bit
- `int31ref.new : [i32] -> [int31ref]`

* `int31ref.get_u` extracts the value, zero-extending
- `int31ref.get_u : [int31ref] -> [i32]`

* `int31ref.get_s` extracts the value, sign-extending
- `int31ref.get_s : [int31ref] -> [i32]`
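
As a sketch (hypothetical text syntax), a producer can keep small integers unboxed behind an `int31ref` and recover them without any allocation:

```wat
(func $tag (param $x i32) (result int31ref)
  (int31ref.new (local.get $x)))      ;; the high bit of $x is dropped

(func $untag (param $r int31ref) (result i32)
  (int31ref.get_s (local.get $r)))    ;; sign-extends the 31-bit payload to i32
```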


#### Casts

* `ref.cast <reftype>` casts a value down to a given reference type
- `ref.cast t : [t'] -> [t]`
- iff `t <: t' <: anyref`
- traps if the operand is not of type `t` at runtime
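
For example (hypothetical text syntax, illustrative names), a value received as `anyref` can be cast down to a concrete structure type before its fields are accessed:

```wat
(type $pair (struct (field anyref) (field anyref)))

(func $fst (param $v anyref) (result anyref)
  (struct.get $pair 0
    (ref.cast (ref $pair) (local.get $v))))   ;; traps unless $v is a (ref $pair)
```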

Question: Have explicit runtime type representations as cast operands?


#### Local Bindings

* `let <valtype>* <blocktype> <instr>* end` locally binds operands to variables
- `let t* bt instr* end : [t* t1*] -> [t2*]`
- iff `bt = [t1*] -> [t2*]`
- and `instr* : bt` under a context with `locals` extended by `t*` and `labels` extended by `[t2*]`
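
A sketch of why `let` is useful (hypothetical text syntax): `(ref $t)` is not defaultable, so it cannot be declared as an ordinary zero-initialised local, but `let` can bind a freshly produced reference so that it can be used more than once:

```wat
(type $pair (struct (field i32) (field i32)))

(func $make-and-sum (result i32)
  (struct.new $pair (i32.const 1) (i32.const 2))
  (let (local $q (ref $pair)) (result i32)    ;; pops the operand and binds it as $q
    (i32.add (struct.get $pair 0 (local.get $q))
             (struct.get $pair 1 (local.get $q)))))
```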


## Binary Format

TODO.


## JS API

Based on the JS type reflection proposal.

### Type Representation

* A `ValueType` can be described by an object of the form `{ref: DefType}`
- `type ValueType = ... | {ref: DefType}`

* A `ValueType` can be described by the string `eqref`
- `type ValueType = ... | "eqref"`

* A `DefType` is described by a kind and a definition
- `type DefKind = "func" | "struct" | "array"`
- `type DefType = {kind: DefKind, definition: FuncType | StructType | ArrayType}`

* TODO: ...`StructType` and `ArrayType`...


### Value Conversions

#### Reference Types

* Any function that is an instance of `WebAssembly.Function` with type `<functype>` is a subtype of `ref <functype>`.

* TODO: ...rules for structure and array types.


#### Equality Types

* Any JS object (non-primitive value) or symbol or `null` can be passed as `eqref` to a Wasm function, stored in a global, or in a table.


### Constructors

#### `Global`

* `TypeError` is produced if the `Global` constructor is invoked without a value argument but with a type that is not defaultable.

#### `Table`

* The `Table` constructor gets an additional optional argument `init` that is used to initialise the table slots. It defaults to `null`. A `TypeError` is produced if the argument is omitted and the table's element type is not defaultable.

#### `Type`

TODO.


## Questions

* Have explicit runtime type representations?

* Distinguish reference types that are castable (and therefore have RTTI)?

* Provide a way to make data types non-eq, especially immutable ones?