Initial overview #1

Merged
merged 8 commits into from
Apr 27, 2017

Conversation

rossberg
Member

This is a brain dump capturing various previous discussions and some of my notes. Lots more to be said and discussed, obviously, but we have to start somewhere.

Member

lukewagner left a comment

This seems like a reasonable way to seed the proposal, thanks for writing it up!

Forming unions of different types, as value types.
Defining, allocating, and indexing structures as extensions to imported types.
Exceptions
Direct support for strings?
Member

nit: missing bullets

Member

Also, could we add: "Admits (though does not mandate) efficient Ahead-of-Time compilation". Basically the goal of: you shouldn't have to do all the backflips JS engines do to get decent performance, and the compiled code should be cachable.

Member Author

Added to next section.


### Efficiency Considerations

Managed Wasm should inherit the efficiency properties of unmanaged Wasm as much as possible, namely:
Member

I'm guessing this is a remnant from previous notes, but just in case, can we drop mention of "managed wasm"? (I'd kind of like to avoid giving "wasm with GC features" a separate name, as if it were a separate mode, format or standard.)

Member Author

Done

* Accept minimal amount of dynamic overhead (checked casts) as price for simplicity/universality.
* Independent from linear memory.
* Pay as you go.
* Avoid generics or other complex type structure if possible.
Member

Could we add bullet for "MVP then Iterate" (that is, we're not going to add all GC features for all GC languages at optimal perf in one shot)?

Member Author

Done.

```
(func $D.g (param $Cthis (ref $C))
(local $this (ref $D))
(set_local $clos (call $outer (f64.const 1)))
Member

Was this $clos line intended?

Member Author

Oops, no, removed.


### Foreign References

A new built-in value type called `foreignref` represents opaque pointers to objects on the _embedder_'s heap.
Member

On this subject, I also think we should have the ability to have a foreignref<T>. This type will allow calls to imported Web API functions to also mention foreignref<T> in their signature, and the result will be really fast, cast-free calls from wasm into Web APIs. The "T" would, I think, just be an arbitrary byte string; dynamic casts would be required to go from any foreignref<T> to a foreignref<U> where T != U, and whether the cast succeeds is up to the embedder.

Member Author

Interesting idea, added with a slight difference: introduce definitions of "foreign types" to the type section, and simply allow forming regular references to them.

```
Store operators are only valid when targeting a mutable field or element.

Immutability is needed to enable the safe and efficient [subtyping](#subtyping), especially as needed for the [objects](#objects-and-mehtod-tables) use case.
Member

nit: mehtod => method in the link

Member Author

Done.


A new built-in value type called `foreignref` represents opaque pointers to objects on the _embedder_'s heap.

There are no operations to manipulate foreign references, but by passing them as parameters or results of exorted Wasm functions, embedder references (such as DOM objects) can safely be stored in or round-trip through Wasm code.
Member

exorted => exported

Member Author

Done.

(call_ref (i32.const 5) (get_local $x))
)
```
Unlike `call_indirect`, this instruction is statically typed and does not involve any runtime check.
Member

Could these function references also be put in Tables (which type would be the function's type), for call_indirect with no runtime checks? (the current call_indirect being a regression from asm.js which only had strictly-typed tables)

Member Author

Tables already have an element type, anticipating other types of tables, e.g., we could easily support tables whose element type is a specific function type. However, that is orthogonal to anything described here.

Member

Perhaps we could have a new section at the end that just says "these types could be added as element types of tables, along with get_elem/set_elem ops"?

Member Author

Wouldn't homogeneous tables simply have a plain function type as element type?

I can see that with function references we might want to add instructions to access tables directly, but that in turn seems independent of homogeneous tables.

Trying to unravel the dependencies. It seems that homogeneous function tables are unrelated to GC types. The only feature through which tables (both homogeneous and heterogeneous) might interact with this proposal would be the instructions you mention.

```
Being reference types, tagged integers can be casted into `anyref`, and can participate in runtime type dispatch with `cast_down`.

TODO: To avoid portability hazards, the value range of `intref` has to be restricted to at most 31 bit?
Member Author

Added another design alternative for tagged integers.

)
```
Structures are garbage-collected.

Member Author

Added that structures can be compared for identity.

(cast_down <type1> <type2> $label (...))
```
also casts the operand of type `<type1>` to type `<type2>`.
It is a validation error if the operand's type is not `<type1>`, or if `<type1>` is not a subtype of `<type2>`.
Member

Shouldn't type1 be a supertype of type2? (looks like a copy-pasto)

Member Author

rossberg commented Apr 11, 2017 via email

```
All accesses are type-checked at validation time.

Structures are [allocated](#allocation) with `new` instructions that take initialization values for each field.
Member

(nit: #allocation doesn't seem to link to a section as of now)

Should we mention (somewhere, probably in the future #allocation paragraph) that structs/arrays might not get allocated/garbage-collected if the implementation can prove they don't need to be allocated (through escape analysis and scalar replacement)? Maybe too much of an implementation detail at the moment. The current phrasing just made me wonder whether it is a strong requirement that structures must get allocated.

Member Author

rossberg, Apr 11, 2017

Removed stray xref. Yeah, probably not worth going into details about possible optimisations at this level. As usual, implementations can do whatever is semantically equivalent. Additional language should only be necessary if we explicitly want to forbid certain optimisations, e.g., like we did with TCO.


Packed fields require special load/store instructions:
```
(load_field_packed_s $s $a (...))

Two questions:

  1. Can't the signedness of the field be inferred from the type descriptor?
  2. Can't packed be inferred from the type descriptor as well?

Member Author

Wasm does not distinguish signed from unsigned integer types, so it is necessary here to specify sign extension behaviour, same as with existing instructions such as memory loads and stores. That also implies that at least the packed loads are different opcodes than non-packed loads. For symmetry and clarity it seems preferable to do the same for stores then.
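
Illustrative aside: a minimal Python sketch of why the load instruction, not the field type, has to pick the extension behaviour. The same stored bits yield different values under sign extension and zero extension. The function names `load_packed_s` and `load_packed_u` are hypothetical helpers that only mirror the `_s` suffix in the quoted instruction; they are not part of any Wasm API.

```python
def load_packed_u(raw: int, bits: int) -> int:
    """Zero-extend the low `bits` bits of a stored field."""
    return raw & ((1 << bits) - 1)


def load_packed_s(raw: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of a stored field."""
    value = raw & ((1 << bits) - 1)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


stored = 0xFF  # the bit pattern held in an 8-bit packed field
print(load_packed_u(stored, 8))
print(load_packed_s(stored, 8))
```

The stored byte carries no signedness of its own, which matches how existing memory loads in Wasm come in `_s` and `_u` variants.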

smvv, Apr 11, 2017

Can i8 and i16 not be used on non-packed structs, and if so, why is that?

Regarding signedness, let's say we export a type to the JavaScript world. If we export i8, i16, and i32 fields without any signedness information, it is impossible to determine whether to sign-extend or zero-extend when loading a value from that field in JS. Typed Objects have u8, u16, u32 for this reason: https://github.com/nikomatsakis/typed-objects-explainer/blob/master/core.md#primitive-type-definitions How should we expose this signedness information to the JS world?


* Only basic but general structure: tuples (structs) and arrays.
* No heavyweight object model.
* Accept minimal amount of dynamic overhead (checked casts) as price for simplicity/universality.
Contributor

...but that's a cost paid only if you opt-in to GC, i.e. vanilla wasm code isn't affected, correct?

Member Author

Right. Added as comment to the pay-as-you-go bullet


### Requirements

* Allocation of structures on the heap which are garbage collected.
Contributor

which heap? JS heap, or the wasm memory?

Member Author

Conceptually, the (newly introduced) Wasm heap -- which, in a JS embedding, will happen to be shared with JS.

Member Author

rossberg commented Apr 11, 2017 via email


* all operations are very cheap, ideally constant time,
* structures are contiguous, dense chunks of memory,
* accessing fields are single-indirection loads and stores,
Contributor

Won't tuple-typed fields need more than that (atomic access to avoid tearing)?

Member Author

There is no access to tuple-typed fields. With nesting, you can get an interior reference to such a field, but you can only load/store its fields individually.

But I added a bullet to requirements about interaction with threads.

@@ -1 +1,716 @@
# GC Extension
Contributor

Is it about "GC"-ed languages or is it "JIT" rather?

Member Author

Not sure I understand the question. Why JIT?

Contributor

Not sure myself anymore either. Somehow I probably misread something, and started thinking "JIT". Sorry.


Should attempt to implement 2-3 exemplary languages:

* an object-oriented language (e.g., a subset of Java, with classes, inheritence, interfaces),

Could we help you by implementing it for Kotlin?


Sorry to hijack the review, but @bashor if you do get around to toying with a Kotlin-to-WASM compiler, I have some code at https://github.com/cretz/asmble/tree/master/src/main/kotlin/asmble/io that can save you a lot of work wrt parsing and what not (still in early stages and binary not tested well).


I also have plans to compile Scala.js to wasm-with-GC (background info: I'm the author of Scala.js), as soon as some prototype interpreter is available. From there, it shouldn't take more than a 3-4 weeks, since we can reuse our entire pipeline except the very last piece (the so-called "emitter"), which for the compile-to-JS version is less than 4000 LoCs.

Member Author

rossberg, Apr 12, 2017

Any implementation experiments will help evaluate the design, and any feedback from such experiments will be highly appreciated. The more the better! That said, we probably want to make sure that the set of experimental compilers includes some sufficiently "mainstream" languages for each category (for some definition of "mainstream").

Should attempt to implement 2-3 exemplary languages:

* an object-oriented language (e.g., a subset of Java, with classes, inheritence, interfaces),
* a typed functional language (e.g., a subset of ML, with closures, polymorphism, variant types)

I'm the designer and lead developer of Elm, a simple ML-family language with all these features. Our primary compilation target is JS right now, so DOM interaction is a very practical consideration for us. Our community is very interested in WebAssembly, and I would like to help with this.

Member Author

Excellent!

* Defining, allocating, and indexing structures as extensions to imported types? (future extension)
* Exceptions (separate proposal)
* Direct support for strings? (separate proposal)
* Safe interaction with threads (sharing, atomic access)
Member

Right, it would also be nice to clarify no-tearing versus ordering guarantees.

Member Author

Agreed, but first things first. For starters, thread interaction is just a big TODO, see e.g. the section on Sharing. I anticipate more details being filled out eventually.

Member

I was replying to a comment which now seems deleted.

(type $g-sig (func (param (ref $C)) (result i32)))
(type $h-sig (func (param (ref $D)) (result i32)))

(type $C (struct (ref $C-vt) (mut i32))

In my current implementation of the JVM on linear memory, I store a reference to the class in the first 4 bytes. Besides the vtable, the class contains information about the object's size, which the GC uses. How is the GC intended to know the size of a tuple? I guess that physically every tuple will have a 4- or 8-byte header with the tuple type, followed by the reference to the vtable, which is an additional 4 or 8 bytes. Would it be useful to support a special kind of first field in a tuple to implement similar behaviour?


A class is not just a vtable. In my current implementation I found the following fields: flags (initialized, primitive), identifier, name, class of element (for array classes), arrayof class, reference to the supertype function, superclass, reference to enum fields. Do you suggest providing this information via a special function that is present in every vtable?


Ah, I just realized that this "vtable" structure is not necessarily a structure of function pointers. It can contain all these fields as-is.

(same (new $point ...) (new $point ...)) ;; false
```
TODO: Could even allow heterogeneous equality (equality between operands of different type), but that might lead to some discontinueties or even prevent some potential optimizations?

What about shallow copy of a structure (which is usually implemented by memcpy)?

Member Author

Right. Not currently included, but may be a worthwhile extension if there turn out to be sufficiently many use cases of large structures that need to be copied by value.

```

Like structures, arrays can be compared for identity.

In the JVM, arrays are first-class objects that extend java.lang.Object and support getClass(), hashCode, etc. In my implementation, arrays are actually objects with a corresponding header that points to a class with a vtable. How can I implement this with Wasm arrays? The only approach I can see is to generate an additional wrapper class (two structures, one for the data and one for the vtable).

Member Author

With the base proposal you indeed need to allocate two values, the "object" and the array backing store. With the extension for nesting and "flexible types" discussed below you could embed the latter into the former.

That would simplify creation of immutable objects, by first creating them as mutable, initialize them, and then cast away their constness.
On the other hand, it means that immutable fields can still change, preventing various access optimizations.
(Another alternative would be a three-state mutability algebra.)

So, types are organized in a tree, which is insufficient for Java, where types form a DAG. How are interfaces supposed to be implemented?

Member Author

No, structural subtyping can form arbitrary DAGs. Also, note that these are low-level representation types. They don't necessarily bear a direct relation to the source language type system.
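
Illustrative aside: a small Python sketch of how structural subtyping can produce a diamond rather than a tree. It assumes width subtyping (a subtype may append fields) and covariant depth subtyping on immutable fields; these rules and the name `is_subtype` are assumptions for illustration, not quoted from the proposal.

```python
def is_subtype(sub, sup) -> bool:
    """Structural check: struct types are tuples of field types, primitives are strings."""
    if isinstance(sub, str) or isinstance(sup, str):
        return sub == sup  # primitives only match themselves
    if len(sub) < len(sup):
        return False  # a subtype must have at least the supertype's fields
    # Each shared field must itself be a subtype (fields assumed immutable).
    return all(is_subtype(a, b) for a, b in zip(sub, sup))


# Two small hierarchies: D1 extends C1, D2 extends C2.
C1, D1 = ("i32",), ("i32", "f64")
C2, D2 = ("f64",), ("f64", "i32")

top = (C1, C2)
left = (D1, C2)
right = (C1, D2)
bottom = (D1, D2)

print(is_subtype(bottom, left), is_subtype(bottom, right))
print(is_subtype(left, right), is_subtype(right, left))
```

Here `bottom` has two incomparable supertypes, `left` and `right`, which in turn share the supertype `top`, so the relation is a DAG without any declared inheritance.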

Member Author

rossberg commented Apr 19, 2017 via email

Member Author

Is anybody opposed to landing this initial PR?
