First go at MVP proposal #34
(Not @lukewagner, but hope you don't mind an unsolicited review!) The MVP looks nice, and seems very reasonable both to target and to implement. It seems complete enough to express most things with reasonable efficiency, although it's missing a nice means of packaging up data with code pointers that know the data's type (for both ML-style closures and Java-style single dispatch OO). That might be a post-MVP feature, though! The only bit that looks worrying in the current proposal is
I would much prefer the runtime casts to be based on explicit tags. Here's an attempt at a proposal: Add to modules a section containing a list of "tag definitions", which are given a type:
Change
The tag's type must be a supertype of the structure type, but need not be exactly the same. Programs that don't care about downcasts can just pass some tag of type Then,
and traps if the operand does not have tag The implementation can represent tags as distinct integers, and then the implementation of Also, private types can be implemented by not exporting a tag definition. (Roughly, this is like having structural subtyping with nominal downcasts). Finally, a couple of minor notes on the subtyping relation:
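A minimal sketch of the explicit-tag idea, assuming each tag definition is represented as a distinct integer and a failed cast traps (modelled here as an exception). `Tag`, `TaggedStruct` and `TagCasts` are hypothetical names, and Java only stands in for what an engine would do internally. Two tags attached to structurally identical structs still give different cast results, which is the "structural static, nominal dynamic" split.

```java
// Hypothetical model: every tag definition gets a distinct integer id.
final class Tag {
    private static int nextId = 0;
    final int id;
    final String debugName;

    Tag(String debugName) {
        this.id = nextId++;
        this.debugName = debugName;
    }
}

// A heap struct records the tag it was allocated with.
final class TaggedStruct {
    final Tag tag;
    final Object[] fields;

    TaggedStruct(Tag tag, Object... fields) {
        this.tag = tag;
        this.fields = fields;
    }
}

final class TagCasts {
    // The runtime check is a single integer comparison.
    static boolean test(Object ref, Tag expected) {
        return ref instanceof TaggedStruct && ((TaggedStruct) ref).tag.id == expected.id;
    }

    // Traps (modelled as an exception) unless `ref` was allocated with `expected`.
    static TaggedStruct cast(Object ref, Tag expected) {
        if (!test(ref, expected)) {
            throw new IllegalStateException("trap: expected tag " + expected.debugName);
        }
        return (TaggedStruct) ref;
    }
}
```

Not exporting a `Tag` would keep other modules from ever downcasting to it, which is how the proposal gets private types.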
I think you are right, and in fact I have been thinking along a similar direction -- thus the cryptic question about distinguishing "castable types" and RTTI at the very end of the doc. However, I think that static tags like you suggest would be too inflexible, at least if there also was interesting polymorphism. What I thought was dynamic "type representation" values like in some literature on type erasure -- essentially your tags but first-class. Also, I was thinking that it might make sense to make the runtime type information on objects themselves completely optional. So my idea was:
Right. In my defense, I added this sentence in a hurry. :)
Ah, good point, the doc does not currently say anything about that. The idea was that the users can somehow explicitly choose to forbid eq for types they define, orthogonal to mutability. But I haven't thought this through yet, i.e., I don't have an idea yet what the right way to declare this would be.
Yes, I should make that more explicit. Thanks for the comments!
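For contrast, a minimal sketch of the first-class variant, assuming a type representation is just a runtime value (hypothetical `Rtt`) that code can create and pass as an ordinary argument, and that objects allocated without one can never be downcast. All names are illustrative, and subtyping between representations is deliberately left out.

```java
// Hypothetical first-class type representation: an ordinary runtime value.
final class Rtt {
    final String name;

    Rtt(String name) {
        this.name = name;
    }
}

final class GcObject {
    final Rtt rtt;          // null: allocated without RTTI, never castable
    final Object[] fields;

    GcObject(Rtt rtt, Object[] fields) {
        this.rtt = rtt;
        this.fields = fields;
    }
}

final class Rtts {
    // A polymorphic helper receives the representation as a parameter
    // instead of naming a statically declared tag.
    static GcObject allocPair(Rtt rtt, Object first, Object second) {
        return new GcObject(rtt, new Object[] {first, second});
    }

    // Allocation without any runtime type information.
    static GcObject allocPlain(Object first, Object second) {
        return new GcObject(null, new Object[] {first, second});
    }

    static boolean canCast(GcObject obj, Rtt target) {
        return obj.rtt != null && obj.rtt == target;
    }
}
```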
I agree completely. I think static tags are enough to replace the downcasts supported by
Right, OK. Is this for efficiency? I was assuming that implementations would have a type tag on every object anyway (in order to scan it during GC), but maybe there's another approach where values without RTTI can be more efficiently implemented?
I don't think that mutability and
Hm, I think it will be difficult (or at least ugly) to change later, so it'd be better to start right.
The information needed for GC might be more low-level, so simpler. I could imagine some implementation schemes where you need almost none of it, though that may not be the most efficient. But it's a fair question, I'm not sure whether it is worth supporting.
I fully agree that they aren't orthogonal semantically, but in terms of features it makes sense to allow combining them in all ways. For example, you sometimes want fast equality checks for immutable data structures. Likewise, you might want to hide equality for mutable data, e.g., when you already hide the type's representation (which may be a future extension).
Great start! A couple of questions: Is there any way to apply resource limits to GC'd objects created by a WebAssembly module in this design? I'm researching using WebAssembly as a plugin sandbox for web apps, and the various ways user-supplied code could be adversarial are of interest. :) Linear memory can be fixed to a maximum size at compile time, but I don't see any way here to control the amount of memory used by reference types: as with JavaScript, you could allocate a large number of arrays and use multiple gigabytes until the system goes into swap hell or crashes. Additionally, I can see some use for finalizers, which I don't see here; being able to reference heap data in linear memory from a GC'd struct, and then being able to free that data when the GC handle dies, would be very useful. Certainly for things like emscripten's 'embind' C++ bindings, I would love to not have to manually call a delete method on the exposed JS bindings. Is that something that might come later?
@Brion, good questions. Unfortunately, I don't have good answers. We haven't really talked about ways of limiting resource usage of a Wasm engine. I can't claim to have a good answer, but I assume that would most likely happen at the embedding level? As for finalisers: yes, fairly far down on the future feature list. It's probably the toughest problem of everything related to GC. There simply are too many competing semantics for such mechanisms out there, and we don't know yet how to reduce that zoo to some reasonably generic (if slow) set of primitives that can be implemented in all engines and is useful for all languages.
@rossberg yeah the tricky part of the resource limit use case is that I'm targeting plugins running in a web app, so the app doesn't control the browser's Wasm engine... the GC would likely treat the plugin module's objects as belonging to the web app if it applies any limits at all. It may be that I just have to work within linear memory and roll my own internal GC for this use case, which should be fine for the narrow API I'm envisioning between the module and the host web app. Or assume a little more trust with a curated plugin list and some way to manually remove a troublesome plugin. I'll keep an eye out for future discussions on finalizers. Thanks!
@Brion, what I meant is that this could be some functionality in the embedding API, such as the JS one. As long as you can call JavaScript you could access it. But we haven't discussed anything like that yet, and I don't have a clear idea what the right design would be (and what could be implemented in current engines). It is worth noting that on browsers memory usage is capped anyway. This is a limitation that most browsers already put in place for JavaScript, and it will simply be inherited by Wasm. It is not a threshold you can control, but it certainly will stop any app long before it can decline into swapping hell.
@rossberg hmm, there may be some differences between engines. When I allocate typed arrays and write some data into them to make sure they're really allocated, Firefox lets me allocate at least 64 gigabytes of data on a 16 GB MacBook Pro, using tons of swap and compressed memory. On my Linux PC with a spinning disk, the same test runs for a few gigabytes and then the system becomes unresponsive, requiring a reboot. Safari cuts it off around 16 GB of allocations, with a warning that the page is using a lot of memory, so that's not too bad. Haven't tested Chrome or Edge. I'll add an issue to propose an addition for per-module memory limits and continue over there. :)
I'm a bit unclear on the semantics of the intref type:
@Brion, interesting, I wasn't aware that FF would allow that. Pretty sure Chrome caps at a gig or so, though there might be special rules for array buffers. The exact semantics of intref is still TBD, thus the question mark. The choices boil down to:
1 is generally more efficient (no implicit branches). And a producer can build 2 out of 1, but not the other way round. However, 2 enables engines to unbox even high 32 bit values on 64 bit systems, which a producer cannot do themselves given only 1. @lukewagner and I have been discussing it recently, and I think we ended up tending towards starting with int31ref for the MVP. But this will take implementation and producer feedback to finalise. An int31ref could potentially be made an eqref; the other sizes couldn't. However, it may be more portable (and more uniform) to not allow eq for any of them. It should be easy for engines to optimise the untag-compare combination anyway.
Pushed some refinements on intrefs, settling on int31 for now.
@rossberg thanks, that's clearer! Should be possible to build on int31ref to avoid boxing the 31-bit subset of integers in a universal type representation per the note in Overview.md; JavaScript style types might look like:
Makes reasonable sense to me. :) I'd prefer full int32ref, but understand that could be more painful on 32-bit arch. (I'm assuming int31ref on 32-bit arch is envisioned as a 32-bit word with a tag in the lowest bit and a bit-shift to get the stored value?)
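A minimal sketch of the low-bit tagging guessed at here, assuming bit 0 set marks a 31-bit integer and heap pointers are at least 2-byte aligned so their bit 0 is clear. The class and method names are hypothetical; `getS`/`getU` mirror signed and unsigned extraction, and boxing discards the high bit of the 32-bit input.

```java
// Hypothetical low-bit tagging on a 32-bit word:
// bit 0 == 1 marks a 31-bit integer; aligned pointers have bit 0 == 0.
final class Int31Word {
    static boolean isInt31(int word) {
        return (word & 1) != 0;
    }

    // Boxing shifts the value up and sets the tag bit;
    // the high bit of the i32 input is discarded (wraps modulo 2^31).
    static int tag(int value) {
        return (value << 1) | 1;
    }

    // Signed extraction: the arithmetic shift sign-extends bit 30.
    static int getS(int word) {
        return word >> 1;
    }

    // Unsigned extraction: the logical shift yields a value in [0, 2^31).
    static int getU(int word) {
        return word >>> 1;
    }
}
```

For example, `getS(tag(0x40000000))` yields `-0x40000000`, since bit 30 becomes the sign bit of the 31-bit payload.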
Looking in more detail at how to implement, say, equality checking with a JS-style universal type representation, I'm a bit lost. If there was a `ref.is_castable` operator, it might look something like this?

```
(func $equal_check (param $a anyref) (param $b anyref) (result i32)
  ;; `ref.is_castable` is hypothetical.
  (if
    (i32.and
      (ref.is_castable eqref (get_local $a))
      (ref.is_castable eqref (get_local $b))
    )
    (then
      ;; Check object identity first.
      (if
        (ref.eq
          (ref.cast eqref (get_local $a))
          (ref.cast eqref (get_local $b))
        )
        (then (return (i32.const 1)))
      )
      ;; If not the same object, it may still be a boxed float or a string,
      ;; so cast it down and make a method call for the comparison.
      ;; ... todo ...
    )
  )
  ;; Not an eqref? Probably an int31ref.
  (if
    (i32.and
      (ref.is_castable int31ref (get_local $a))
      (ref.is_castable int31ref (get_local $b))
    )
    (then
      (return
        (i32.eq
          (int31ref.get_s (ref.cast int31ref (get_local $a)))
          (int31ref.get_s (ref.cast int31ref (get_local $b)))
        )
      )
    )
  )
  ;; Not equal.
  (return (i32.const 0))
)
```
Great start! Sorry for taking so long to review, but I had been thinking and reading about the dynamic cast/nominal/structural interaction which, quite coincidentally, you and @stedolan have already been discussing. This ultimately led me to the same conclusion of "structural static"/"nominal dynamic", so I'm a big fan of explicit tags and want to pursue that direction more.
proposals/gc/MVP.md
* `deftype` is the new category of types that can occur as type definitions
  - `deftype ::= <functype> | <structtype> | <arraytype>`
  - `module ::= {..., types vec(<deftype>)}`
There's already a types section containing `functype`s, which I think we're extending with other constructor "forms", right? (The wording sounds like this is a new section, so just checking.)
Yes, `deftype` is the syntax of type definitions, which previously was just function types. (I'm avoiding saying "type section" here since in my mind, sections are rather an encoding detail of the binary format.)
Ok, makes sense. Perhaps tweak wording to say "generalizes the existing types component of modules"?
Done.
* `type <typetype>` is an import description with an upper bound
  - `importdesc ::= ... | type <reftype>`
  - Note: `type` may get additional parameters in the future
Could you explain the reasoning behind having the type import's bound be a `reftype` as opposed to a `deftype`? E.g., into which index space does a type import belong? It seems like it should be in the type definition index space (so it can be used in function types, exports, etc.). I was imagining that maybe, as a special case (to play well with section ordering), type imports would go into the type section (using a special import form followed by the type constructor's form). Then a type import would be prepended to both the type definition index space and the import index space. Kinda weird, but seems forced by the type/import circularity.
The bound is a reftype because you need to be able to use anyref (or eqref) as a bound.
Yes, type imports are in the type index space. Lumping them in with the type section rather than with imports would be inconsistent with how we handle other imports. It would create a mess with both the type index space (which should have imports first) and the "argument list" of a module instantiation (which currently simply reflects the import list). If we want to separate type imports from other types, then I think we should instead have a separate type import section. But I don't think it's strictly necessary.
Hm, still not sure reftype makes sense as the bound, but I guess I'll wait to see what you propose for nominal types + type imports. In particular, it seems like the import needs to introduce a new type definition; it can't just reference an existing one.
Ah ok, so then type imports are appended to the type section so that (1) signatures can refer to type imports by just using the appropriate index (2) the type section can only be validated together with the import section. Yes? It'd be good to mention these things.
Well, yes, except that type imports are still meant to be prepended to the type index space. (The ordering of binary sections doesn't have to imply the ordering of index spaces. As you said, they need to be validated together anyway.)
Added comments.
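A minimal sketch of that ordering, assuming type imports occupy the first indices of the type index space and definitions follow, independent of where either appears in the binary format. `TypeIndexSpace` and the string descriptions are hypothetical stand-ins for real validator data structures.

```java
// Hypothetical sketch: type imports are prepended to the type index space.
final class TypeIndexSpace {
    private final String[] importedTypes; // e.g. the bounds of type imports
    private final String[] definedTypes;  // deftypes from the types component

    TypeIndexSpace(String[] importedTypes, String[] definedTypes) {
        this.importedTypes = importedTypes;
        this.definedTypes = definedTypes;
    }

    int size() {
        return importedTypes.length + definedTypes.length;
    }

    // Indices [0, #imports) denote type imports; the rest denote definitions.
    String resolve(int typeIdx) {
        if (typeIdx < 0 || typeIdx >= size()) {
            throw new IndexOutOfBoundsException("unknown type " + typeIdx);
        }
        return typeIdx < importedTypes.length
            ? "import " + importedTypes[typeIdx]
            : "def " + definedTypes[typeIdx - importedTypes.length];
    }
}
```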
Regarding the bound: In most cases, it will just be `anyref`, i.e., the type is fully abstract. How would you express that if it wasn't a `reftype`?
FWIW, I'm not sure I understood what you meant by:
In particular, it seems like the import needs to introduce a new type definition; it can't just reference an existing one.
Why not?
Ok, I think I understand the proposal a bit better now. I also see the benefit in that, by using a reference-type bound (instead of using a struct/array bound more-directly), one has the more-expressive capability of declaring whether the import is equality-comparable, nullable, etc.
proposals/gc/MVP.md
* `eqref` is a subtype of `anyref`
  - `eqref <: anyref`
  - Note: `int31ref` and `anyfunc` are *not* subtypes of `eqref`, i.e., those types do not expose reference equality
The `anyfunc` in tables is already nullable...
Hm, I don't disagree, but how does that relate to this item?
Oops, I put this comment on the wrong line: I meant to comment on the below "`nullref` is not a subtype of `anyfunc`" line.
Ah, I see. Yes, typo.
* `ref.func` creates a function reference from a function index
  - `ref.func $f : [] -> (ref $t)`
  - iff `$f : $t`
I had been on the edge of whether function references were needed for the GC MVP, but the previous discussion about passing closures to and from JS without wrapping each time seemed to boost the priority. Given that, could we also have the `bind` operator, which can be used to manufacture closures?
Oh, but closures are different from mere function references, since they have a different runtime representation. Hence they imply a new type, so I considered them post-MVP.
Ah, I had assumed that they would have the same type and that they would also have the same representation. That is, even without a bound dynamic environment, `ref.func` is still a closure because it entrains the `WebAssembly.Instance`. We represent these as `JSFunction` objects internally in SM. I had entertained the idea of representing function references as raw function pointers, but I think this gets pretty hairy when GC and multi-instance linking enter the picture.
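A minimal sketch of the single-representation view, assuming a function reference is a pair of code and the instance it entrains, and that a hypothetical `bind` partially applies leading arguments to produce a value of the same shape. All names are illustrative (not SpiderMonkey internals); whether closures should share a type with plain function references is exactly the open question here.

```java
// Stand-in for per-instance state (globals, memories, tables).
final class Instance {
    final String name;
    int global;

    Instance(String name, int global) {
        this.name = name;
        this.global = global;
    }
}

// Compiled code always runs against some instance.
interface Code {
    int run(Instance inst, int[] args);
}

final class FuncRef {
    final Code code;
    final Instance instance;  // entrained even by a plain ref.func
    final int[] boundArgs;    // empty for ref.func, non-empty after bind

    FuncRef(Code code, Instance instance, int[] boundArgs) {
        this.code = code;
        this.instance = instance;
        this.boundArgs = boundArgs;
    }

    static FuncRef refFunc(Code code, Instance instance) {
        return new FuncRef(code, instance, new int[0]);
    }

    private static int[] concat(int[] first, int[] second) {
        int[] all = new int[first.length + second.length];
        System.arraycopy(first, 0, all, 0, first.length);
        System.arraycopy(second, 0, all, first.length, second.length);
        return all;
    }

    // Hypothetical bind: partial application of leading arguments.
    FuncRef bind(int... leading) {
        return new FuncRef(code, instance, concat(boundArgs, leading));
    }

    int call(int... rest) {
        return code.run(instance, concat(boundArgs, rest));
    }
}
```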
I added it to the question section.
  - `structtype ::= struct <fieldtype>*`
* `arraytype` describes an array with dynamically indexed fields
  - `arraytype ::= array <fieldtype>`
What about the older idea of giving `structtype`s a trailing, dynamic-length array tail? I'd be surprised if multiple source languages didn't need to store one or two extra bookkeeping fields to implement their source-language array, so it feels like a GC MVP thing, and it seems like, with that ability, a separate `arraytype` isn't even needed.
I agree that this would really extend languages' abilities to find efficient representations for their data types. Engines can still treat structs without a tail specially.
This is enabled by allowing nesting of aggregates (structs/arrays), which is currently left for post-MVP. For MVP it seems okay to use an indirection.
Fair enough
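A minimal sketch of the MVP workaround, assuming a source-level growable array is represented as a struct holding its bookkeeping field plus a reference to a separate array object. `GrowableArray` is hypothetical; each element access pays one extra load through `storage` that an inline array tail would avoid.

```java
// Hypothetical MVP-style layout: struct { length, ref to array } instead of
// a struct with a trailing array tail.
final class GrowableArray {
    int length;        // bookkeeping field kept in the struct
    Object[] storage;  // indirection: a separate GC'd array object

    GrowableArray(int capacity) {
        storage = new Object[capacity];
    }

    void push(Object v) {
        if (length == storage.length) {
            // Reallocate the backing array; the struct identity stays the same.
            Object[] bigger = new Object[Math.max(1, storage.length * 2)];
            System.arraycopy(storage, 0, bigger, 0, length);
            storage = bigger;
        }
        storage[length++] = v;
    }

    Object get(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException(String.valueOf(i));
        }
        return storage[i]; // one extra load through `storage`
    }
}
```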
* A reference value type is defaultable if it is not of the form `ref $t`
* Locals must have a type that is defaultable.
Without adding a `nullable` type constructor, it seems like any source language with ubiquitous nullable types will be forced to use tons of otherwise-unnecessary `anyref`s and `cast`s. Nullability seems like an important thing not to bolt on later too, so can we add it?
I think it could be added, but there is a risk that people will be lazy and default to using `optref` instead of `anyref` since it gives "easier" interop between all sorts of languages, which will make languages that are null-safe pay the price for having nulls. A world where non-null is the default and languages that are liberal with null pay the price makes more sense to me, and will promote more robust software ecosystems built on wasm.
I am on the fence here. The main benefit of a nullable ref type over anyref is that a downcast to the non-nullable concrete ref type is gonna be slightly cheaper. But you'd still need it. (Or are you suggesting that all ref instructions should also work with nullable refs directly?)
A downcast from a `nullref $t` to `ref $t` is going to be significantly cheaper than a downcast from `anyref`: the former is a single null check; the latter takes several instructions and a memory access (several, if TLS is used to avoid baking in pointers to the code).
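A minimal sketch of the two downcasts being compared, assuming heap objects carry a header word with a type id and that non-pointer values (such as tagged integers) can appear in an `anyref`. The names and the trap-as-exception convention are hypothetical; the point is only that the nullable-to-non-null cast is a single null check, while the `anyref` cast adds a representation check, a header load and a comparison.

```java
// Hypothetical heap object with a header identifying its concrete type.
final class HeapObject {
    final int typeId;

    HeapObject(int typeId) {
        this.typeId = typeId;
    }
}

final class Downcasts {
    // optref $t -> ref $t: the static type already fixes $t, so only null is checked.
    static HeapObject fromOptRef(HeapObject ref) {
        if (ref == null) throw new IllegalStateException("trap: null");
        return ref;
    }

    // anyref -> ref $t: null check, rule out non-pointer values,
    // then load the header and compare type ids.
    static HeapObject fromAnyRef(Object ref, int expectedTypeId) {
        if (ref == null) throw new IllegalStateException("trap: null");
        if (!(ref instanceof HeapObject)) throw new IllegalStateException("trap: not a heap object");
        HeapObject obj = (HeapObject) ref;
        if (obj.typeId != expectedTypeId) throw new IllegalStateException("trap: wrong type");
        return obj;
    }
}
```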
Okay, added `optref` (for a short name). (I abandoned the idea for a separate `nullable` type constructor because it actually introduces more anomalies than benefits.)
Thanks, looks good!
Oh: perhaps also mention table element kinds (or maybe this is solved by a validation constraint on `table.grow` and adding a new `table.grow_init` that takes a value to initialize with)?
I think `table.grow` should take an initialisation operand right away; there's no point in having two separate instructions there. (I just realised that this instruction is not currently part of any proposal, so I opened an issue for the reference types proposal.)
But there is another constraint we need: a table definition of non-zero initial size must have a defaultable element type. Added that.
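A minimal sketch of these rules, assuming null is the only default for reference types and that `ref $t` has none, so a non-empty table of such elements could not be initialised. `ValType` and `Table` are hypothetical, and returning the old size from `grow` is an assumption of this sketch rather than something settled in the thread.

```java
// Hypothetical value type: only `ref $t` is non-defaultable.
final class ValType {
    final String name;
    final boolean isNonNullRef;

    ValType(String name, boolean isNonNullRef) {
        this.name = name;
        this.isNonNullRef = isNonNullRef;
    }

    // Defaultable (to null) unless of the form `ref $t`; locals need this too.
    boolean isDefaultable() {
        return !isNonNullRef;
    }
}

final class Table {
    final ValType elemType;
    private Object[] elems;

    // Validation: a non-zero initial size needs a defaultable element type.
    Table(ValType elemType, int initialSize) {
        if (initialSize != 0 && !elemType.isDefaultable()) {
            throw new IllegalArgumentException("invalid: non-defaultable element type");
        }
        this.elemType = elemType;
        this.elems = new Object[initialSize]; // slots start as null
    }

    int size() { return elems.length; }

    Object get(int i) { return elems[i]; }

    // table.grow with an initialisation operand; returns the old size.
    int grow(int delta, Object init) {
        if (init == null && !elemType.isDefaultable()) {
            throw new IllegalArgumentException("invalid: null init for non-nullable elements");
        }
        int old = elems.length;
        Object[] bigger = new Object[old + delta];
        System.arraycopy(elems, 0, bigger, 0, old);
        for (int i = old; i < bigger.length; i++) bigger[i] = init;
        elems = bigger;
        return old;
    }
}
```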
@rossberg @stedolan How about we start with having the type tags be a new definition kind (like @stedolan first suggested) and then support the use cases wanting first-class tags with the definition reference types extension? This is good because it allows the type tags to be imported (like any definition kind), which has multiple benefits. Also: should the type tags we're talking about here be unified with those being discussed for exceptions?
I'm slowly catching up with this discussion. Here are some comments, from the point of view of compiling Scala.js to Wasm.
I would advocate for nullable ref types As already mentioned by @lukewagner, completely removing the ability to have nullable references other than the completely untyped I suggest the addition of another type constructor
I would really like to see an Yet another way to look at this would be to add casts In the current text, the constraint

```java
interface A {}
interface B {}
class X extends Object implements A, B {}
class Y extends Object implements A, B {}

A a = ...;
B b = (B) a;
```

the last instruction may succeed, but there is no way to encode it as the proposed

```java
A a = ...;
Object a1 = a;
B b = (B) a1;
```

but that's just silly. I suggest dropping the constraint altogether. I haven't examined the tag-based approach enough to determine whether it allows for such things.
proposals/gc/MVP.md
  - `reftype ::= ... | ref <typeidx>`
  - `ref $t ok` iff `$t` is defined in the context
* `int31ref` is a new reference type
While I am sure this has its uses, and is nice to have from a "why not" perspective, I feel this is such an implementation-specific optimization (it will not apply to languages that guarantee 32-bit integers, and it may not be a great fit for certain engines/hosts either, e.g., if they need more than 1 bit) that I think we'd be better off not having it, or guaranteeing the full 32 bits.
What @stedolan said. Don't think of this as a source-level type, it's just a way to represent tagged pointers/integers, which is a mechanism that many languages need. 32 bit integer references can be implemented in user space using this primitive. See also my recent comment on the respective issue.
Not sure what comment of @stedolan you are referring to; I don't see him refer to `int31ref` at all?
I see "many languages" needing 32-bit, and 31-bit being rather specific to some language implementations. I don't see how wasm as a whole benefits from this bias, especially since there will be some cost to supporting it (there will be places where this bit needs to be tested if integer values are a possibility). The alternative, if `int32ref` is not practical, is to not have integers be part of `reftype` at all, since at least that would give engines the benefit of being able to assume a `reftype` is always a pointer.
I was referring to this comment.
To clarify, int31ref is primarily a primitive for making pointer tagging available, in the most general form that we can still guarantee to be directly implementable across all relevant platforms and engines. Such tagging is a mechanism that probably a majority of GC'ed language implementations rely on in some form (almost all dynamic languages, functional languages, logic languages). The GC proposal will effectively be useless to them without it. While certainly limited, the proposed type hopefully is general enough that a rather large subset of these use cases can be mapped to it directly.
Int32ref is in a different category. It is not applicable to the same use cases, since it can involve unpredictable and substantial hidden cost on some engines (allocation and branching in particular). For example, in V8 on 32 bit platforms ;). Also, an int32ref can already be implemented with the current proposal, in at least two ways: either (1) as a ref to an immutable struct with a single int32 field (which an engine should be able to optimise just like a proper int32ref), or (2) as the union of an int31ref and the aforementioned struct, doing the necessary boxing/unboxing of large values in user space (in case you don't trust the engine to do a good enough job).
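A minimal sketch of option (2), assuming values that fit in 31 signed bits use the unboxed representation and larger ones are boxed in an immutable single-field struct. Java has no unboxed tagged references, so the hypothetical `SmallInt` only stands in for an `int31ref` and `BoxedInt32` for the struct; `Int32Ref` is likewise illustrative.

```java
// Stands in for an int31ref (invariant: value fits in 31 signed bits).
final class SmallInt {
    final int value;
    SmallInt(int value) { this.value = value; }
}

// Stands in for a ref to an immutable struct with a single i32 field.
final class BoxedInt32 {
    final int value;
    BoxedInt32(int value) { this.value = value; }
}

final class Int32Ref {
    static final int MIN31 = -(1 << 30);
    static final int MAX31 = (1 << 30) - 1;

    // User-space boxing: only values outside the 31-bit range allocate a box.
    static Object box(int v) {
        return (v >= MIN31 && v <= MAX31) ? new SmallInt(v) : new BoxedInt32(v);
    }

    static int unbox(Object ref) {
        if (ref instanceof SmallInt) return ((SmallInt) ref).value;
        if (ref instanceof BoxedInt32) return ((BoxedInt32) ref).value;
        throw new IllegalArgumentException("not an int32 reference");
    }
}
```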
What about the cost to languages that don't use it? The mere possibility of int31ref values being present may require engines to emit code that tests the bit, right?
Only at the point where they perform a downcast from `anyref`, which is an explicit operation and which in practice will require that check anyway, because existing engines tend to use tagging already (like V8).
  - `arraytype ::= array <fieldtype>`
* `fieldtype` describes a struct or array field and whether it is mutable
  - `fieldtype ::= <mutability> <storagetype>`
Missing definition of `mutability`?
A field can also be a `reftype`?
A field can be a `reftype` that is stored in-line? Or is that GC v2? :)
Mutability is in the spec already. A field can be a valtype, which includes reftypes per the reference types proposal. Nested aggregates are post-MVP, since they imply introducing inner pointers.
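A minimal sketch of what the mutability flag buys, assuming a `struct.set` on a field validates only when that field is declared mutable. `FieldType` and `StructType` are hypothetical, and storage types are plain strings here rather than real value types.

```java
// Hypothetical model of `fieldtype ::= <mutability> <storagetype>`.
final class FieldType {
    final boolean mutable;
    final String storageType; // any valtype, e.g. "i32" or "anyref"

    FieldType(boolean mutable, String storageType) {
        this.mutable = mutable;
        this.storageType = storageType;
    }
}

final class StructType {
    final FieldType[] fields;

    StructType(FieldType... fields) {
        this.fields = fields;
    }

    // A `struct.set` on field i validates only if the field exists and is mutable.
    boolean canSet(int i) {
        return i >= 0 && i < fields.length && fields[i].mutable;
    }
}
```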
proposals/gc/MVP.md
  - `eqref <: anyref`
  - Note: `int31ref` and `anyfunc` are *not* subtypes of `eqref`, i.e., those types do not expose reference equality
* `nullref` is a subtype of `eqref`
Would be good to have a hint somewhere (if not here, then in the GC proposal) why we need `nullref` at all, since none of the current types are nullable. Is this just to be able to type locals between function start and first assignment?
It's from the reference types proposal. Anyref and anyfunc are nullable, so before you have cast down to a concrete ref type successfully it could still be null.
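A minimal sketch of the subtyping relation as described in this thread, assuming `nullref <: eqref <: anyref`, `nullref <: anyfunc <: anyref` and `int31ref <: anyref`, with `ref $t` left out. `RefType` is a hypothetical enum; a type counts as nullable here exactly when `nullref` is one of its subtypes.

```java
// Hypothetical reference type lattice; `ref $t` is omitted.
enum RefType {
    ANYREF, EQREF, ANYFUNC, INT31REF, NULLREF;

    // Direct supertypes; subtyping is the reflexive-transitive closure.
    RefType[] directSupers() {
        switch (this) {
            case EQREF: return new RefType[] {ANYREF};
            case ANYFUNC: return new RefType[] {ANYREF};
            case INT31REF: return new RefType[] {ANYREF};
            case NULLREF: return new RefType[] {EQREF, ANYFUNC};
            default: return new RefType[0];
        }
    }

    boolean isSubtypeOf(RefType other) {
        if (this == other) return true;
        for (RefType s : directSupers()) {
            if (s.isSubtypeOf(other)) return true;
        }
        return false;
    }

    // Null inhabits a type iff nullref is a subtype of it.
    boolean isNullable() {
        return NULLREF.isSubtypeOf(this);
    }
}
```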
@sjrd @aardappel However, it's fairly common for dynamic language runtimes to use 31-bit ints to represent small integers, to avoid having to allocate on every arithmetic operation. Ruby does this, as does PyPy (optionally, although standard CPython does not do this and uses a separate (cached) allocation for every integer value) and Racket. Several Prolog implementations do this as well (e.g., SICStus uses 29-bit/61-bit ints, and SWI-Prolog uses 25/57-bit ints). None of these languages expose the word size at source level: they support arbitrary-precision arithmetic, but use tagged integers as an optimisation for small ints.
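A minimal sketch of that scheme, assuming a source language with arbitrary-precision integers where results that fit in 31 signed bits stay unboxed and only overflow allocates a bignum. `DynInt` is hypothetical, and Java's `Integer` merely stands in for an unboxed small int.

```java
import java.math.BigInteger;

// Hypothetical small-int/bignum split behind a uniform representation.
final class DynInt {
    static final long MIN31 = -(1L << 30);
    static final long MAX31 = (1L << 30) - 1;

    // Small results become Integer (standing in for a tagged int31),
    // everything else stays a BigInteger.
    static Object normalize(BigInteger v) {
        if (v.compareTo(BigInteger.valueOf(MIN31)) >= 0 && v.compareTo(BigInteger.valueOf(MAX31)) <= 0) {
            return v.intValue();
        }
        return v;
    }

    static BigInteger toBig(Object x) {
        return (x instanceof Integer) ? BigInteger.valueOf((Integer) x) : (BigInteger) x;
    }

    static Object add(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer) {
            // Fast path: the sum of two 31-bit values always fits in a long.
            long sum = (long) (Integer) a + (Integer) b;
            if (sum >= MIN31 && sum <= MAX31) return (int) sum;
            return BigInteger.valueOf(sum); // overflow: allocate a bignum
        }
        return normalize(toBig(a).add(toBig(b)));
    }
}
```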
That is not quite correct. See my other comments for why `int31ref` is not just an integer type but a more low-level primitive for pointer tagging. There are plenty of language implementations that use tagged pointers/integers of some form, not just OCaml. Stephen enumerated a few; practically all implementations using a uniform representation do. In all these cases, 32-bit ints, with their potential for hidden allocation, indirection, and branching, could be significantly more costly. Ultimately, I think we want both, but for the MVP the lower-level primitive seems more relevant than `int32ref` (whose only benefit is better performance on 64-bit platforms).
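To make the pointer-tagging point concrete, here is a minimal Python sketch of an `int31ref`-style encoding in a 32-bit word. The polarity (low bit 1 for integers, 0 for aligned heap pointers) is an assumption, since engines differ here, and this sketch raises on out-of-range values rather than discarding the high bit. The helper names are hypothetical.

```python
WORD_MASK = 0xFFFF_FFFF                          # model a 32-bit machine word
INT31_MIN, INT31_MAX = -(1 << 30), (1 << 30) - 1


def box_i31(n: int) -> int:
    """Shift left one bit and set the low tag bit (assumed polarity: 1 = integer)."""
    if not INT31_MIN <= n <= INT31_MAX:
        # Sketch choice: reject instead of silently discarding the high bit.
        raise OverflowError(f"{n} does not fit in 31 bits")
    return ((n << 1) | 1) & WORD_MASK


def is_i31(word: int) -> bool:
    """Low bit 1: unboxed integer; low bit 0: aligned heap pointer."""
    return (word & 1) == 1


def unbox_i31(word: int) -> int:
    """Reinterpret the word as signed, then shift right arithmetically."""
    if not is_i31(word):
        raise TypeError("not an i31 value")
    signed = word - (1 << 32) if word & 0x8000_0000 else word
    return signed >> 1
```

Because a single bit distinguishes the two cases, a field of type `anyref` can hold either an ordinary pointer or a small integer in one word, with no allocation or indirection for the integer case.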
All reasonable, and I agree that we want such a type. I was just leaving it out to avoid MVP feature creep, see my reply to @lukewagner. But happy to discuss this further if folks think it is essential to have it now.
This topic and these very options are being discussed in the context of the reference types proposal (and quite extensively offline between Luke and me). I think we have almost concluded to go with the latter.
Hm, be careful not to confuse levels. The Wasm casts are very much a low-level mechanism concerned with concrete low-level (structural a.t.m.) representations. On that level, a sideways cast never makes sense, AFAICS, as it cannot possibly succeed. I don't think a source-level cast between interfaces like in your example will map directly to a Wasm-level cast like you seem to suggest. What Wasm representation for Java interfaces do you have in mind?
With a little hesitation: I think if we want to do tagged pointers we should have a broader discussion about concrete needs, and not just assume that 31-bit ints with the high bit discarded on boxing is the sweet spot. Though Racket evidently uses 31-bit fixnums, neither Chez Scheme nor Larceny (both native Scheme implementations) do, preferring instead to use three-bit tags with low tagging, where tags 000 and 100 are fixnum values (i.e., they have 30-bit signed fixnums) and arithmetic can be performed directly on tagged values. For some dynamic languages, not needing an indirection to access an object's major type class is important. Also see the comments about Prologs above. The int31ref design opens up a path for tagged pointers but is IMO strongly biased in favor of statically typed ones. (There was a longish discussion about tagging schemes here: WebAssembly/design#919, where I proposed a boxing scheme that allowed for more tag bits and more efficient boxing and unboxing. I'm not saying that it is fully baked, but it represents a different view.)
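The low-tagging scheme described above can be sketched in Python for a 32-bit word. Because the fixnum tags `000` and `100` both have the two low bits clear, a fixnum is effectively stored as `n << 2`, and adding two tagged fixnums yields the tagged sum directly. The function names and the overflow policy are illustrative assumptions, not taken from Chez Scheme or Larceny.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1
FIXNUM_MIN, FIXNUM_MAX = -(1 << 29), (1 << 29) - 1   # 30-bit signed fixnums


def to_signed(word: int) -> int:
    """Reinterpret an unsigned 32-bit word as a signed integer."""
    return word - (1 << WORD_BITS) if word >> (WORD_BITS - 1) else word


def make_fixnum(n: int) -> int:
    """Tag n as a fixnum: low three bits end up as 000 or 100."""
    if not FIXNUM_MIN <= n <= FIXNUM_MAX:
        raise OverflowError(n)
    return (n << 2) & WORD_MASK


def is_fixnum(word: int) -> bool:
    """Both fixnum tags (000 and 100) have the two low bits clear."""
    return (word & 0b11) == 0


def fixnum_value(word: int) -> int:
    return to_signed(word) >> 2


def fixnum_add(a: int, b: int) -> int:
    """(x << 2) + (y << 2) == (x + y) << 2, so no untagging is needed."""
    if not (is_fixnum(a) and is_fixnum(b)):
        raise TypeError("operands must be fixnums")
    total = to_signed(a) + to_signed(b)
    if not (FIXNUM_MIN << 2) <= total <= (FIXNUM_MAX << 2):
        # A real runtime would promote to a bignum here instead.
        raise OverflowError("fixnum overflow")
    return total & WORD_MASK
```

For example, `fixnum_value(fixnum_add(make_fixnum(7), make_fixnum(-3)))` gives `4` without ever shifting the operands back to plain integers.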
I agree that import/export is essential. I was assuming that the type rep instructions can be regarded as constant instructions, so they could be used in global definitions, which would also allow them to be imported/exported. I strongly believe that declaration-based type tags are gonna be too restrictive long-term.
I believe they are different mechanisms, which is why I avoided the term "tag". For example, exceptions don't have subtyping (although they could). OTOH, they could have return types and other attributes having to do with control flow. Also, wasn't it a stated goal to keep the exception proposal independent of the GC extensions?
@lars-t-hansen, yes, I think a more general tagging mechanism would be very useful. I still think it would need to take the form of proper variant types to be sufficiently hardware-independent. However, I don't think any such mechanism will be able to provide the same performance and portability guarantee, i.e., anything but a single tag bit might require an indirection and will force specific implementation schemes on engines. In that sense, while somewhat odd-looking, int31ref is much simpler and has reliable performance. It hence isn't subsumed by a general tagging mechanism, AFAICS. It is making a different trade-off between generality and reliable performance. Also note that producers are free to use more integer bits for their own tagging purposes. They just cannot have more pointer bits, because that's not portably achievable. (For one, because existing engines already differ in the polarity of their tagging scheme.)
I did a little more prototyping using an int31ref-like scheme for a JS-like language. While it works nicely for in-range integers, the overhead of boxing onto heap objects is really high, so if more than 31 bits are needed things slow down fast. (For instance, a Mandelbrot fractal calculation is 10x slower with f64s boxed compared to using 64-bit NaN-boxing.) If a float64ref isn't available, I'd have to consider a fat-pointer scheme pairing a NaN-boxed value in an i64 (for i32 and f64 values) with a separate reference (for objects and specials). That's 128 bits per value, and I have to manage the pairing: as two arguments, as struct tuples in arrays, and either require the multi-value proposal for return values or use some kind of out-value struct. I do understand, though, that anything over 31 bits is tough to guarantee across different architectures and embeddings. (NaN-boxing as we know it might not survive future increases of native address space beyond 48 bits anyway!)
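For comparison, here is a minimal Python sketch of the kind of 64-bit NaN-boxing mentioned above, packing either an f64 or an i32 into one 64-bit pattern. The tag value `0xFFF9` and the helper names are hypothetical choices; the one essential rule is that genuine NaNs are canonicalized so that their payloads cannot be mistaken for tagged integers.

```python
import math
import struct

TAG_MASK = 0xFFFF_0000_0000_0000
INT_TAG = 0xFFF9_0000_0000_0000        # hypothetical tag: a negative quiet-NaN prefix
CANONICAL_NAN = 0x7FF8_0000_0000_0000


def box_f64(x: float) -> int:
    """Store a double as its raw IEEE-754 bits, canonicalizing NaNs."""
    if math.isnan(x):
        return CANONICAL_NAN           # keeps NaN payloads from colliding with INT_TAG
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def unbox_f64(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def box_i32(n: int) -> int:
    """Put the 32-bit two's-complement value into the NaN payload."""
    return INT_TAG | (n & 0xFFFF_FFFF)


def is_i32(bits: int) -> bool:
    return (bits & TAG_MASK) == INT_TAG


def unbox_i32(bits: int) -> int:
    low = bits & 0xFFFF_FFFF
    return low - (1 << 32) if low & 0x8000_0000 else low
```

Every bit pattern starting with `0xFFF9` is a NaN, so no ordinary double (including the infinities) can carry that prefix; this is why all 64 bits are needed and why the scheme does not fit into a 31-bit reference.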
(Probably it's a good idea to fork off a separate issue for |
* Any function reference type is a subtype of `anyfunc`
  - `ref $t <: anyfunc`
    - iff `$t = <functype>`
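A minimal Python sketch of the subtyping checks touched on here, using a hypothetical tuple encoding for reference types. It covers only reflexivity, `ref $t <: optref $t` (assumed to be an existing rule, as the thread treats it), and the `anyfunc` rule, so that by these rules `optref $t <: anyfunc` holds for function types as well.

```python
# Hypothetical encoding: ("ref", "$t"), ("optref", "$t"), or the string "anyfunc".
# `defs` maps a type index to its kind, e.g. {"$f": "func", "$s": "struct"}.


def is_subtype(t1, t2, defs: dict) -> bool:
    if t1 == t2:
        return True                                 # reflexivity
    if isinstance(t1, tuple):
        kind, idx = t1
        if kind == "ref" and t2 == ("optref", idx):
            return True                             # ref $t <: optref $t (assumed)
        if t2 == "anyfunc":
            return defs[idx] == "func"              # (opt)ref $t <: anyfunc iff $t is a functype
    return False
```

Only function types reach `anyfunc`; a reference to a struct type does not.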
Also `optref $t`, yes?
Yes, that is implied by the previous bullet.
proposals/gc/MVP.md
    - and all `t*` are defaultable
* `struct.get <typeidx> <fieldidx>` reads field `i` from a structure
  - `struct.get $t i : [(ref $t)] -> [t]`
For languages with ubiquitous `optref`s, having `get`/`set` take a `ref` will have a major code-size impact because of all the `cast`s. It seems like `get`/`set` could take an `optref` instead (with trap semantics); a trivial local analysis could then remove the null check whenever the operand has static type `ref`.
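The suggested local analysis can be sketched as a toy lowering step in Python: `struct.get` accepts an `optref`, and a null check is emitted only when the operand's static type is `optref`. The pseudo-instructions `trap_if_null` and `load [ptr + offset]`, and the tuple type encoding, are hypothetical and not taken from any real engine.

```python
def lower_struct_get(static_operand_type: tuple, field_offset: int) -> list:
    """Lower `struct.get` to pseudo machine ops.

    static_operand_type is ("ref", "$t") or ("optref", "$t").
    """
    ops = []
    if static_operand_type[0] == "optref":
        ops.append("trap_if_null")              # operand may be null: guard it
    # A statically non-null `ref` needs no guard, so only the load remains.
    ops.append(f"load [ptr + {field_offset}]")
    return ops
```

The decision uses only the operand's static type, so no dataflow analysis beyond local type information is needed.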
Changed. But note that this relies on subsumption, i.e., subtyping being implicit. If we were to require an explicit instruction for upcasts, as we have recently considered, then you would instead have to upcast every `ref` to `optref`, or introduce all ref instructions in two variants, or introduce overloading.
It would seem that `int32ref` could be implemented on a 32-bit platform as two i32 values, not unlike how i64 is often implemented on 32-bit platforms. To be sure, this would take extra registers and have a cost, but it wouldn't be the cost of allocation and branching. On the other hand, it would also avoid the need for 31-bit overflow checking. As an unrelated question, should
  - `struct.get $t i : [(optref $t)] -> [t]`
    - iff `$t = struct (mut1 t1)^i (mut ti) (mut2 t2)*`
    - and `t = unpacked(ti)`
  - traps on `null`
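A minimal Python sketch of the `struct.get` rule above, split into validation and execution. Struct types are hypothetical lists of `Field` records, and the `unpacked` helper assumes that packed `i8`/`i16` fields widen to `i32` when read; neither detail is taken from a reference implementation.

```python
from typing import NamedTuple


class Field(NamedTuple):
    mutable: bool
    storage_type: str     # e.g. "i32", "f64", or packed "i8" / "i16"


class Trap(Exception):
    """Raised for runtime traps such as a null dereference."""


def unpacked(storage_type: str) -> str:
    # Assumption: packed storage types are widened to i32 on read.
    return "i32" if storage_type in ("i8", "i16") else storage_type


def validate_struct_get(types: dict, t: str, i: int):
    """Return ([operand types], [result types]) for `struct.get $t i`."""
    fields = types.get(t)
    if not isinstance(fields, list):
        raise TypeError(f"{t} is not a struct type")
    if not 0 <= i < len(fields):
        raise TypeError(f"{t} has no field {i}")
    # Reads are allowed regardless of mutability; the operand may be null (optref).
    return [("optref", t)], [unpacked(fields[i].storage_type)]


def exec_struct_get(obj, i: int):
    """Runtime semantics: trap on null, otherwise read field i."""
    if obj is None:
        raise Trap("null structure reference")
    return obj[i]
```

Validation never looks at mutability (`mut1`, `mut`, `mut2` may be anything), matching the rule, which constrains only the field's position and type.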
This confuses me: the static type of the referred-to object appears to be restricted to exactly $t, but there is no widening/upcast operator that I can see. Are you expecting widening casts to be handled by LET in some fashion, or by changes to type compatibility for e.g. SET_LOCAL? Or is it just a matter of the missing prose in the section on Value Conversions?
I'd also be curious to hear about why you feel is a desirable operand here, since the verifier will know the static type of the operand.
This assumes the usual declarative style of having a separate subsumption rule. For the Wasm type system it would be built into the sequencing rule (see the spec draft for the reference types proposal for details; concretely, the last rule in this subsection). With this approach, individual rules can always "assume" the correct type.
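The idea of building subsumption into the sequencing rule can be sketched in Python: when an instruction pops its operands, each actual stack type only needs to be a subtype of the expected one, so individual instruction rules (such as the one for `struct.get`) can state exact types. The `(params, results)` encoding and the pluggable `is_subtype` callback are assumptions for illustration, not the spec's formulation.

```python
def check_sequence(instrs, stack, is_subtype):
    """Validate straight-line code; each instr is a (params, results) pair."""
    stack = list(stack)
    for params, results in instrs:
        base = len(stack) - len(params)
        if base < 0:
            raise TypeError("operand stack underflow")
        # Subsumption lives here: an operand of type s is accepted where
        # type t is expected whenever s <: t.
        for actual, expected in zip(stack[base:], params):
            if not is_subtype(actual, expected):
                raise TypeError(f"expected {expected}, got {actual}")
        del stack[base:]
        stack.extend(results)
    return stack
```

For example, with a callback that accepts `("ref", "$t")` where `("optref", "$t")` is expected, a `struct.get` typed `[(optref $t)] -> [i32]` validates against a stack holding a `ref $t`, without an explicit upcast instruction.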
Your second question seems to be missing a word? Did you mean the type immediate? A stated design criterion for Wasm was to always have all operationally relevant type information explicit in the instructions (except where the semantics is completely parametric in a type), e.g. no overloading. So this is just following that.
OK, that's fine. So the subtyping rules a little further up here really extend the `match` rule there.
(For the second para I did indeed mean the type immediate; github helpfully removed a word in angle brackets from my comment.)
@sunfishcode, the point of having int refs is that they are freely interchangeable with regular references, e.g. in the `anyref` type. That requires that they be no larger than pointers (unless we want to blow up all references to 2 words on 32-bit platforms). But you are absolutely right about the naming; changed!
Are you referring to specific implementations, or are there present or anticipated GC features that would require/oblige all implementations to work this way?
@sunfishcode, there isn't too much leeway wrt implementation techniques when e.g. a structure field of type anyref must be able to hold both a regular reference and an int ref without extra boxing.
Then every instance will be 4 bytes larger because of the method-table field. This will consume a lot of extra memory. I think an instance should only contain the type/struct id. Then an optional method table could be declared for every type, either in the type section or in an extra section.
Oops, I didn't mean to leave this permanently in "requesting changes" limbo! This MVP.md is sufficiently sprinkled with TODOs, and the PR's discussion sufficiently long that I think we should merge and file separate issues/PRs to sort out the individual TODOs. Two small requests before merging:
#### Integer references
Tentatively, support a type of guaranteed unboxed scalars.
Could you add a "TODO: this particular i31 design choice is tentative" here?
Done.
proposals/gc/MVP.md
@@ -0,0 +1,314 @@
# GC v1 Extensions
See [overview](Overview.md) for background.
Could you add a big bold "Note: this design is still in flux, even outside of TODOs below" or some such to the top here?
Done.
@Horcrux7, hardwiring method tables as such into the language would essentially build in an OO-specific object model, a bias that Wasm is meant to avoid. There are ideas for a more general notion of per-type fields that would hang off the internal type tag, but any mechanism like that would be post-MVP.
This commit fixes #34 by specifying that the flags field (which indicates whether a segment is passive) is a `varuint32` instead of a `uint8`. It was discovered in #34 that the memory index located at that position today is a `varuint32`, which can be validly encoded as `0x80 0x00` as well as `0x00` (among a number of other encodings). This means that if the first field were repurposed as a single byte of flags, it would break existing modules that work today. It's not currently known how many modules in the wild actually use such an encoding, but it's probably better to be safe than sorry! Closes #34
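The encoding ambiguity can be checked with a small Python LEB128 decoder for `varuint32`. This is a sketch, not the spec's decoder: it rejects encodings longer than 5 bytes and values wider than 32 bits, but does not enforce every padding rule. Both `0x00` and `0x80 0x00` decode to 0, whereas a reader treating the first byte as a `uint8` flags field would see `0x80` in the second case.

```python
def read_varuint32(data: bytes, pos: int = 0):
    """Decode an unsigned LEB128 value; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("unexpected end of data")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift      # low 7 bits carry the payload
        if (byte & 0x80) == 0:                # high bit clear: last byte
            break
        shift += 7
        if shift >= 35:
            raise ValueError("varuint32 is longer than 5 bytes")
    if result > 0xFFFF_FFFF:
        raise ValueError("value does not fit in 32 bits")
    return result, pos
```

Decoding the field as `varuint32`, as the commit specifies, keeps both spellings of memory index 0 valid.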
* Add missing eq_ref
* Fix export index computation
* Remove bogus test
A more concrete suggestion for an MVP feature set. There are still a number of TODOs, and the instruction names have diverged from the Overview, which I should fix.
@lukewagner, WDYT?