Initial overview #1
This is a brain dump capturing various previous discussions and some of my notes.
This seems like a reasonable way to seed the proposal, thanks for writing it up!
proposals/gc/Overview.md
Outdated
Forming unions of different types, as value types.
Defining, allocating, and indexing structures as extensions to imported types.
Exceptions
Direct support for strings?
nit: missing bullets
Also, could we add: "Admits (though does not mandate) efficient Ahead-of-Time compilation". Basically the goal of: you shouldn't have to do all the backflips JS engines do to get decent performance, and the compiled code should be cachable.
Added to next section.
proposals/gc/Overview.md
Outdated
### Efficiency Considerations

Managed Wasm should inherit the efficiency properties of unmanaged Wasm as much as possible, namely:
I'm guessing this is a remnant from previous notes, but just in case, can we drop mention of "managed wasm"? (I'd kind of like to avoid giving "wasm with GC features" a separate name, as if it were a separate mode, format or standard.)
Done
proposals/gc/Overview.md
Outdated
* Accept minimal amount of dynamic overhead (checked casts) as price for simplicity/universality.
* Independent from linear memory.
* Pay as you go.
* Avoid generics or other complex type structure if possible.
Could we add bullet for "MVP then Iterate" (that is, we're not going to add all GC features for all GC languages at optimal perf in one shot)?
Done.
proposals/gc/Overview.md
Outdated
```
(func $D.g (param $Cthis (ref $C))
(local $this (ref $D))
(set_local $clos (call $outer (f64.const 1)))
```
Was this $clos line intended?
Oops, no, removed.
### Foreign References

A new built-in value type called `foreignref` represents opaque pointers to objects on the _embedder_'s heap.
On this subject, I also think we should have the ability to have a `foreignref<T>`. This type will allow calls to imported Web API functions to also mention `foreignref<T>` in their signature, and the result will be really fast, cast-free calls from wasm into Web APIs. The "T" would, I think, just be an arbitrary byte string, and dynamic casts would be required to go between any two foreign reference types where T != U; whether the cast succeeds is up to the embedder.
Interesting idea, added with a slight difference: introduce definitions of "foreign types" to the type section, and simply allow forming regular references to them.
proposals/gc/Overview.md
Outdated
Store operators are only valid when targeting a mutable field or element.

Immutability is needed to enable the safe and efficient [subtyping](#subtyping), especially as needed for the [objects](#objects-and-mehtod-tables) use case.
nit: `mehtod` => `method` in the link
Done.
proposals/gc/Overview.md
Outdated
A new built-in value type called `foreignref` represents opaque pointers to objects on the _embedder_'s heap.

There are no operations to manipulate foreign references, but by passing them as parameters or results of exorted Wasm functions, embedder references (such as DOM objects) can safely be stored in or round-trip through Wasm code.
`exorted` => `exported`
Done.
```
(call_ref (i32.const 5) (get_local $x))
)
```
Unlike `call_indirect`, this instruction is statically typed and does not involve any runtime check.
Could these function references also be put in Tables (whose element type would be the function's type), for `call_indirect` with no runtime checks? (The current `call_indirect` is a regression from asm.js, which only had strictly-typed tables.)
Tables already have an element type, anticipating other types of tables, e.g., we could easily support tables whose element type is a specific function type. However, that is orthogonal to anything described here.
Perhaps we could have a new section at the end that just says "these types could be added as element types of tables, along with `get_elem`/`set_elem` ops"?
Wouldn't homogeneous tables simply have a plain function type as element type?
I can see that with function references we might want to add instructions to access tables directly, but that in turn seems independent of homogeneous tables.
Trying to unravel the dependencies. It seems that homogeneous function tables are unrelated to GC types. The only feature through which tables (both homogeneous and heterogeneous) might interact with this proposal would be the instructions you mention.
proposals/gc/Overview.md
Outdated
Being reference types, tagged integers can be casted into `anyref`, and can participate in runtime type dispatch with `cast_down`.

TODO: To avoid portability hazards, the value range of `intref` has to be restricted to at most 31 bit?
Added another design alternative for tagged integers.
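To make the 31-bit question in the TODO concrete, here is a minimal sketch of one common pointer-tagging scheme. The scheme itself is an assumption for illustration (the proposal does not prescribe a representation): on a 32-bit engine the low bit of a word marks it as an integer, which leaves 31 bits of payload. All names (`tag_int`, `untag_int`, `WORD_BITS`) are hypothetical.

```python
WORD_BITS = 32                         # assumed machine word size
PAYLOAD_BITS = WORD_BITS - 1           # one bit is spent on the tag
INT_MIN = -(1 << (PAYLOAD_BITS - 1))   # smallest representable tagged integer
INT_MAX = (1 << (PAYLOAD_BITS - 1)) - 1
WORD_MASK = (1 << WORD_BITS) - 1


def tag_int(value: int) -> int:
    """Pack a signed 31-bit integer into a 32-bit word whose low bit is 1."""
    if not INT_MIN <= value <= INT_MAX:
        raise OverflowError(f"{value} does not fit in {PAYLOAD_BITS} bits")
    return ((value << 1) | 1) & WORD_MASK


def is_tagged_int(word: int) -> bool:
    """Words with the low bit set are integers; others would be pointers."""
    return word & 1 == 1


def untag_int(word: int) -> int:
    """Recover the signed integer from a tagged word."""
    if not is_tagged_int(word):
        raise TypeError("word is a pointer, not a tagged integer")
    payload = word >> 1                      # unsigned 31-bit payload
    if payload & (1 << (PAYLOAD_BITS - 1)):  # sign bit of the payload
        payload -= 1 << PAYLOAD_BITS         # sign-extend
    return payload
```

Under these assumptions, a producer that relied on a 32-bit range would hit the `OverflowError` path on such an engine, which is the portability hazard the TODO refers to.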
```
)
```
Structures are garbage-collected.
Added that structures can be compared for identity.
proposals/gc/Overview.md
Outdated
```
(cast_down <type1> <type2> $label (...))
```
also casts the operand of type `<type1>` to type `<type2>`.
It is a validation error if the operand's type is not `<type1>`, or if `<type1>` is not a subtype of `<type2>`.
Shouldn't `type1` be a supertype of `type2`? (looks like a copy-pasto)
Yup, fixed.
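For reference, the corrected rule can be sketched as a validation check. This is only an illustration under stated assumptions: the subtype relation is modelled as a small hypothetical table of declared supertypes (the proposal's subtyping is structural), and `validate_cast_down` is an invented helper name, not part of any tool.

```python
# Hypothetical subtype table: each type lists its direct supertypes.
SUPERTYPES = {
    "anyref": [],
    "$C": ["anyref"],
    "$D": ["$C"],
}


def is_subtype(sub: str, sup: str) -> bool:
    """Reflexive, transitive closure over the table above."""
    if sub == sup:
        return True
    return any(is_subtype(parent, sup) for parent in SUPERTYPES[sub])


def validate_cast_down(operand_type: str, type1: str, type2: str) -> None:
    """Reject a cast_down whose annotations violate the rule as corrected."""
    if operand_type != type1:
        raise ValueError(f"operand has type {operand_type}, expected {type1}")
    if not is_subtype(type2, type1):
        # A downcast only makes sense from a supertype to one of its subtypes.
        raise ValueError(f"{type1} is not a supertype of {type2}")
```

With this table, `validate_cast_down("$C", "$C", "$D")` is accepted, while swapping the two type annotations is rejected at validation time.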
proposals/gc/Overview.md
Outdated
All accesses are type-checked at validation time.

Structures are [allocated](#allocation) with `new` instructions that take initialization values for each field.
(nit: `#allocation` doesn't seem to link to a section as of now)
Should we mention (somewhere, probably in the future `#allocation` paragraph) that structs/arrays might not get allocated/garbage-collected, if the implementation can prove they don't need to be allocated (through escape analysis and scalar replacement)? Maybe too much of an implementation detail at the moment. The current phrasing just made me wonder if it is a strong requirement that structures must get allocated.
Removed stray xref. Yeah, probably not worth going into details about possible optimisations at this level. As usual, implementations can do whatever is semantically equivalent. Additional language should only be necessary if we explicitly want to forbid certain optimisations, e.g., like we did with TCO.
Packed fields require special load/store instructions:
```
(load_field_packed_s $s $a (...))
```
Two questions:
- Can't the signedness of the field be inferred from the type descriptor?
- Can't packed be inferred from the type descriptor as well?
Wasm does not distinguish signed from unsigned integer types, so it is necessary here to specify sign extension behaviour, same as with existing instructions such as memory loads and stores. That also implies that at least the packed loads are different opcodes than non-packed loads. For symmetry and clarity it seems preferable to do the same for stores then.
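A small sketch of why the extension mode has to live in the instruction rather than the type: the same stored bits widen to different i32 values depending on the load used. The function names mirror `load_field_packed_s` from the excerpt; the `_u` variant is an assumption by analogy with the existing memory loads, and the `raw`/`bits` parameters are hypothetical stand-ins for the field contents and width.

```python
I32_MASK = 0xFFFFFFFF


def load_field_packed_u(raw: int, bits: int) -> int:
    """Zero-extend a `bits`-wide field to an i32 bit pattern."""
    return raw & ((1 << bits) - 1)


def load_field_packed_s(raw: int, bits: int) -> int:
    """Sign-extend a `bits`-wide field to an i32 bit pattern."""
    value = raw & ((1 << bits) - 1)
    if value & (1 << (bits - 1)):   # top bit of the field is set
        value -= 1 << bits          # interpret the field as negative
    return value & I32_MASK         # two's-complement i32 bit pattern


stored = 0xFF  # one i8 field; the type alone does not say -1 or 255
zero_extended = load_field_packed_u(stored, 8)
sign_extended = load_field_packed_s(stored, 8)
```

Here `zero_extended` is `0x000000FF` and `sign_extended` is `0xFFFFFFFF`, so a field typed only as `i8` cannot determine the result by itself.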
Can i8 and i16 not be used on non-packed structs, and if so, why is that?
Regarding signedness, let's say we export a type to the JavaScript world. If we export i8, i16, and i32 fields without any signedness information, it is impossible to determine whether to sign-extend or zero-extend when loading a value from that field in JS. TypedObjects have u8, u16, u32 for this reason: https://github.com/nikomatsakis/typed-objects-explainer/blob/master/core.md#primitive-type-definitions How should we expose this signedness information to the JS world?
* Only basic but general structure: tuples (structs) and arrays.
* No heavyweight object model.
* Accept minimal amount of dynamic overhead (checked casts) as price for simplicity/universality.
...but that's a cost paid only if you opt-in to GC, i.e. vanilla wasm code isn't affected, correct?
Right. Added as comment to the pay-as-you-go bullet
proposals/gc/Overview.md
Outdated
### Requirements

* Allocation of structures on the heap which are garbage collected.
which heap? JS heap, or the wasm memory?
Conceptually, the (newly introduced) Wasm heap -- which, in a JS embedding, will happen to be shared with JS.
> Can i8 and i16 be used on non-packed structs, and if so, why is that?

There are no packed structs, only "packed" (sub-word-sized) fields. These types can only be used for fields, they are not general value types.

> Regarding signedness, let's say we export a type to the JavaScript world. If we export i8, i16, and i32 fields without any signedness information, it is impossible to determine to sign-ext or zero-ext when loading a value from that field in JS. TypedObject have u8, u16, u32 for this reason: https://github.com/nikomatsakis/typed-objects-explainer/blob/master/core.md#primitive-type-definitions How should we expose this signedness information to the JS world?

You are right, that can be a hassle. However, this problem isn't new, it already exists with plain Wasm today: whenever you import/export a function with an integer param/result from or to JavaScript, it is the responsibility of the JS side to interpret (and potentially, convert) their signedness correctly. At some point we discussed adding some form of annotation to ease that.

In general, JS interop and API is a somewhat separate problem layer that probably needs its own subdiscussion.
* all operations are very cheap, ideally constant time,
* structures are contiguous, dense chunks of memory,
* accessing fields are single-indirection loads and stores,
Won't tuple-typed fields need more than that (atomic access to avoid tearing)
There is no access to tuple-typed fields. With nesting, you can get an interior reference to such a field, but you can only load/store its fields individually.
But I added a bullet to requirements about interaction with threads.
@@ -1 +1,716 @@
# GC Extension
Is it about "GC"-ed languages or is it "JIT" rather?
Not sure I understand the question. Why JIT?
Not sure myself anymore either. Somehow I probably misread something, and started thinking "JIT". Sorry.
Should attempt to implement 2-3 exemplary languages:

* an object-oriented language (e.g., a subset of Java, with classes, inheritence, interfaces),
Could we help you by implementing it for Kotlin?
Sorry to hijack the review, but @bashor if you do get around to toying with a Kotlin-to-WASM compiler, I have some code at https://github.com/cretz/asmble/tree/master/src/main/kotlin/asmble/io that can save you a lot of work wrt parsing and what not (still in early stages and binary not tested well).
I also have plans to compile Scala.js to wasm-with-GC (background info: I'm the author of Scala.js), as soon as some prototype interpreter is available. From there, it shouldn't take more than 3-4 weeks, since we can reuse our entire pipeline except the very last piece (the so-called "emitter"), which for the compile-to-JS version is less than 4000 LoC.
Any implementation experiments will help evaluate the design, and any feedback from such experiments will be highly appreciated. The more the better! That said, we probably want to make sure that the set of experimental compilers includes some sufficiently "mainstream" languages for each category (for some definition of "mainstream").
Should attempt to implement 2-3 exemplary languages:

* an object-oriented language (e.g., a subset of Java, with classes, inheritence, interfaces),
* a typed functional language (e.g., a subset of ML, with closures, polymorphism, variant types)
I'm the designer and lead developer of Elm, a simple ML-family language with all these features. Our primary compilation target is JS right now, so DOM interaction is a very practical consideration for us. Our community is very interested in WebAssembly, and I would like to help with this.
Excellent!
* Defining, allocating, and indexing structures as extensions to imported types? (future extension)
* Exceptions (separate proposal)
* Direct support for strings? (separate proposal)
* Safe interaction with threads (sharing, atomic access)
Right, it would also be nice to clarify no-tearing versus ordering guarantees.
Agreed, but first things first. For starters, thread interaction is just a big TODO, see e.g. the section on Sharing. I anticipate more details being filled out eventually.
I was replying to a comment which now seems deleted.
(type $g-sig (func (param (ref $C)) (result i32)))
(type $h-sig (func (param (ref $D)) (result i32)))

(type $C (struct (ref $C-vt) (mut i32))
In my current implementation of JVM upon linear memory I store reference to class in first 4 bytes. Besides vtable, class contains information about object's size. GC uses this information to get object's size. How GC is intended to know size of a tuple? I guess, physically every tuple will have 4 or 8 bytes header with tuple type. And then goes reference to vtable, which is additionally 4 or 8 bytes. Would it be useful to support special kind of first field of a tuple to implement similar behaviour?
Class is not just a vtable. In my current implementation I found the following fields: flags (initialized, primitive), identifier, name, class of element (for array classes), arrayof class, reference to supertype function, superclass, reference to enum fields. Do you suggest providing this information via a special function which is present in every vtable?
Ah, I just realized that this "vtable" structure is not necessarily a structure of function pointers. It can contain all these fields as-is.
```
(same (new $point ...) (new $point ...)) ;; false
```
TODO: Could even allow heterogeneous equality (equality between operands of different type), but that might lead to some discontinueties or even prevent some potential optimizations?
What about shallow copy of a structure (which is usually implemented by memcpy)?
Right. Not currently included, but may be a worthwhile extension if there turn out to be sufficiently many use cases of large structures that need to be copied by value.
Like structures, arrays can be compared for identity.
In JVM, arrays are first-class objects that extend `java.lang.Object`, support `getClass()`, `hashCode`, etc. In my implementation, arrays are actually objects that have a corresponding header, which points to a class with a vtable. How can I implement this with Wasm arrays? The only approach I can see is to generate an additional wrapper class (two structures, one for data and one for vtable).
With the base proposal you indeed need to allocate two values, the "object" and the array backing store. With the extension for nesting and "flexible types" discussed below you could embed the latter into the former.
That would simplify creation of immutable objects, by first creating them as mutable, initialize them, and then cast away their constness.
On the other hand, it means that immutable fields can still change, preventing various access optimizations.
(Another alternative would be a three-state mutability algebra.)
So, types are organized in a tree, which is insufficient for Java, where types form a DAG. How are interfaces supposed to be implemented?
No, structural subtyping can form arbitrary DAGs. Also, note that these are low-level representation types. They don't necessarily bear a direct relation to the source language type system.
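To illustrate how a structural relation yields a DAG rather than a tree, here is a minimal sketch. The rules it encodes are assumptions for illustration only: a struct may add trailing fields (width subtyping), immutable fields may be refined to subtypes (depth subtyping), and mutable fields must match exactly. The encoding of types as Python tuples and the names `imm`, `TOP`, `LEFT`, `RIGHT`, `BOTTOM` are hypothetical.

```python
def imm(ty):
    """An immutable field of the given type, encoded as (mutable, type)."""
    return (False, ty)


def is_subtype(sub, sup) -> bool:
    """Structural check; scalars are strings, structs are tuples of fields."""
    if isinstance(sub, str) or isinstance(sup, str):
        return sub == sup                     # scalars only match themselves
    if len(sub) < len(sup):
        return False                          # subtype must keep every field
    for (sub_mut, sub_ty), (sup_mut, sup_ty) in zip(sub, sup):
        if sub_mut != sup_mut:
            return False
        if sub_mut:
            if sub_ty != sup_ty:              # mutable fields are invariant
                return False
        elif not is_subtype(sub_ty, sup_ty):  # immutable fields are covariant
            return False
    return True


A = (imm("i32"),)
B = (imm("i32"), imm("i32"))   # A plus one more field, so B <: A

TOP = (imm(A), imm(A))
LEFT = (imm(B), imm(A))
RIGHT = (imm(A), imm(B))
BOTTOM = (imm(B), imm(B))      # subtype of both LEFT and RIGHT
```

`BOTTOM` has two unrelated supertypes, `LEFT` and `RIGHT`, which share the supertype `TOP`: a diamond, which a tree-shaped hierarchy could not express.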
> In my current implementation of JVM upon linear memory I store reference to class in first 4 bytes. Besides vtable, class contains information about object's size. GC uses this information to get object's size. How GC is intended to know size of a tuple? I guess, physically every tuple will have 4 or 8 bytes header with tuple type. And *then* goes reference to vtable, which is additionally 4 or 8 bytes. Would it be useful to support special kind of first field of a tuple to implement similar behaviour?

@konsoletyper, yes, this will most likely result in taking up two words for each Java object. Coming up with a feature for Wasm that would make this more space efficient while still being portable across all engines and hardware platforms is a serious challenge. If anybody knows a satisfactory solution to that problem then I'd be very interested in seeing it.
Is anybody opposed to landing this initial PR?
This is a brain dump capturing various previous discussions and some of my notes. Lots more to be said and discussed, obviously, but we have to start somewhere.