-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Garbage Collection with Two Types #109
Comments
I'd say you definitely need to make the linear memory portion byte-addressed, as accessing It is not clearly explained why we need tags at this level of generality. Certainly, most Having the |
I think the tag is only for downcasting from |
Re Interior pointers. The fatness of the point is more than advertised:
|
@taralx Very interesting proposal!
I think these instructions are a great idea, but for optimizing compilers actually! (for streaming ones, the small overhead of bounds checks is probably not a big deal I suspect, given the other overhead) Specifically, they can help in two areas that have come up in discussions of other topics. First,
As this code stands, the VM must bounds check each line because if a trap occurs it must happen on the proper line. However, if we do
then the VM can check once up front, trap if it needs to there, and then not worry about it. Or maybe the compiler can see such a sequence and optimistically check up front that 3 is ok, and if not, only then iterate? But identifying such sequences is not trivial if there are ifs along the way,
This makes it even harder to guess. Instead, an explicit assertion solves the issue. |
Agreed that we need almost certainly need something more than "array of i32". Possibly a full parallel set to (or variant encoding of) the existing linear memory load/store operations.
As mentioned by others,
If this were going to be the final form of GC, we would absolutely do that. But this is an MVP, and adding bounds to the type risks severely complicating the type checking. So in the spirit of minimality, I recommend keeping them out. As you noted, they can be relatively easily added in later. |
I'm not clear on where this would be necessary for typed interior pointers? A fat pointer to a reference type would use an index into the reference array instead of an index into the scalar array.
Which language are you thinking of here? Go doesn't allow offsetting of interior pointers (slices are a different matter, and they are already fat in the way you describe). C# does, but only in unsafe code where it is expected that the code can break the bounds of the interior pointer. |
@kripken Agreed. We mentioned streaming compilers because we expect that optimizing compilers will often find that the region between accesses is sufficiently side-effect free and that the bounds checks can therefore be moved up. Streaming compilers would be much less likely to be able to do this. |
@taralx The accesses themselves have the potential side effect of trapping, though? For that reason I don't think an optimizing compiler can optimize things like |
@kripken Yes, you have the motivation for @aardappel @kmiller68 The proposal does not prescribe an implementation of tags. That said, what we had in mind was that the canonical tags would typically be embedded directly in the pointer, whereas non-canonical tags would be in the heap. Something I'm realizing isn't clear in the write up is that a tag can in general be any event (as in exception events from the the exception-handling proposal), so applications can specify their own tags in addition to the canonical ones. We use custom tags in the OCaml example and in the plan for tieing in with abstract imported types (without requiring those imported types to be reference types). (As a meta note, the name "event" may not be the best. It's tied to a specific application, whereas the application is more general. Something like "data tag", complementary to "call tag", might be a better fit. But that issue is clearly orthogonal to this proposal besides the confusing terminology "event" creates here.) |
This is effectively a dynamically-typed object system. I do not see how this can be made performance-competitive with, e.g. a typical implementation of Java's object model in any modern JVM. Reading a (tagged) reference field will yield an untyped tagged reference that needs both a tag check (presumably to crash if not an object) plus further bounds checking. That is prohibitive. The motivation for a dynamically-typed object system also seems unclear. WebAssembly engines for the web are already embedded in JavaScript virtual machines that will do an even better job (through complex dynamic optimization) than what is proposed here, and the model is more flexible. What value add is there of a dynamically-typed object system in that case? |
@titzer You seem to be under the impression that the current MVP is not dynamically typed. That's because on the surface it appears to be, but once you consider use cases in more depth you run into problems that effectively force you to use dynamic typing. That is why I have been asking for detailed case studies showing how the MVP would implement complete sets of interacting features. For example, because the current MVP has no plan to address the space-abstraction problem I presented at the in-person meeting, in #102 @skuzmich has explained that he will be representing Kotlin objects as arrays of anyrefs, where the first element is a reference to a vtable (that itself is an array of anyrefs), and the subsequent elements are references to structures of each class's non-inherited fields. So a field access involves an bounds-checked load followed by a dynamic cast (which itself involves a bounds-checked load) and then typed load. A method access involves a bounds-checked load, and then a dynamic cast (to get the vtable), and then a bounds-checked load followed by a dynamic cast (to get the declaring class's method table), and then a typed load (and after all that, the callee will still only know that the "this" pointer is an array of anyrefs). And all that indirection still doesn't address changes to which superclass declares a field or method. On the other hand, in this proposal a field access satisfying the needs in #102 is just a bounds-checked load possibly followed by a canonical tag check, which at worst is a load and at best is a bit-flag check in the pointer. Similar problems happen for pretty much every other language for some reason or another. In C#, an Note that all these cases use frequent dynamic casts, and in the current MVP those dynamic casts are implemented using a load of an array and then a bounds-checked load followed by an equality check, i.e. a bounds-checked double indirection. Among casting mechanism, this is one of the slower ones, especially considering how frequently it will have to be used. And we can try to rely on (inline) caching techniques to make up for these shortcomings, but my understanding is that WebAssembly is not supposed to rely on speculative optimization for its performance. To summarize, the current MVP (and likely any MVP) is not expressive enough to avoid extremely frequent casting. Our proposal embraces that inevitability and provides flexible representation with efficient casts. The current MVP, on the other hand, fails to recognize the limitations of its type system, forcing people to jump through hoops to fit their needs within it, and is designed around a slow casting mechanism that makes these limitations all the more painful. |
Also, while the simplicity of the proposal here leaves many options open for extending it with type information to eliminate various (efficient) casts, the complexity of the current MVP makes it unclear that it is possible to do the same to eliminate various (inefficient) casts. For example, there was a plan for extending the current MVP with existential types to eliminate some superfluous casts. But that was based on 20-year-old research that had been abandoned by the relevant research community when no one was able to figure out how to make it scale past toy languages and systems. It was unable to express common features (like typed arrays), too hard to generate (a PhD per compiler), and too bulky (type-annotation size was as large as code size). In fact, the entire long line of research on using structural types in low-level systems to describe the heap (as the current MVP does) was abandoned because its advocates were never able to overcome these challenges. To summarize, whereas the proposal here is easy to extend, there is good reason to believe the current MVP is not extensible in a significantly effective manner. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There is an undercurrent to the discussion that the current MVP is inadequate to describe object models akin to Java with the same level of efficiency as modern JVMs. We need to clarify the bounds of that inadequacy in order to point us in the proper direction. In particular,
@RossTate what you seem to be saying is that the dynamic check for virtual dispatch here is a kind of dynamic typing, and I agree. But I don't agree that by making the object system even more dynamic, we are solving any real problems. In fact, this is going the wrong direction, because it is even less efficient at its best, including the simple case of just having unboxed primitive fields in structures and structures of fixed size. It is performance prohibitive because it will be significantly slower and more heavyweight than a typical JVM's object model. I raised an important point earlier but maybe it was lost in the noise. There is already an efficient dynamically-typed object model as a target for web compilation: JavaScript. Without a value-add over that object system in any way, a dynamic object system in Wasm is just adding more engine complexity without accomplishing the goal. And what is that goal? Reaching competitive performance with native language implementations like the JVM. In particular, I believe that if the GC MVP cannot deliver a low-overhead object model competitive with the JVM, then it's a no-go. This proposal in particular would be less efficient than what a JavaScript engine would come up with after speculative optimization, which is itself less efficient than what a JVM would do. So instead of getting better than both of them, it would get worse than both of them. That's moving in the wrong direction. We need to be clear-eyed about where all this new trouble is coming from. The third point above identifies separate compilation as a source of new requirements. These problems are not inherent in the current MVP. The problems with inheritance/extensibility across the module boundary are an outcome of a new requirement that lowered code can interoperate across a module boundary. What is "lowered" code? What I mean by that is that language operations such as Instead, we should be pursuing late binding. I will soon propose a way to encode languages into Wasm that doesn't require lowering too early. Then we need to make late lowering work efficiently through a proper API. The good news is that we need almost no changes to any of the proposals in the pipeline, except to enhance the type imports proposal and/or merge it with the module-linking proposal. In fact, it's completely orthogonal to any decisions we make here; you don't even need any Wasm GC proposal[1] (!?) to accomplish this: only reference types and type imports. The high order bit of that approach is that if you want separate compilation and interoperability across a module boundary, then you need late binding with a runtime linker[2] that coordinates your language model and presents lowered code to the engine. The runtime linker is actually not too scary, and I am building some to validate this idea. If you don't need separate compilation, you can do fully offline (static) translation and directly present lowered code to the engine. This is the scenario for which the Wasm GC proposal was designed, but it's become increasingly clear to me that even though late binding is totally orthogonal, we cannot ship Wasm GC without having a story for late binding that retains current engines' compilation/caching model. [1] Of course, you do need to eventually hit bottom for lowering. The Wasm GC proposal, whatever form it takes, will eventually be the final language into which higher-level languages are lowered. |
I gave four different problems that four different languages on four different runtimes have with the current MVP that cause them to be better addressed by the simple proposal here. While I understand (and was aware of) your point about dynamic linking, that addresses only one of these problems. And my impression is that people do not want this proposal to become dependent on yet another proposal, or at least not one that sounds like quite a substantial undertaking, especially since there's growing concern that people will just start targeting linear-memory garbage collection if we can't get host-managed garbage collection out in reasonable time. So if our proposal makes WebAssembly a better target for these languages now, and you're concerned about competing with JavaScript now, then that's good reason to prefer it over the current MVP. And if you want WebAssembly to eventually be a better target than the JVM—where there are many challenges we will need to overcome to make that happen (such as dynamic linking), many of which we have barely begun to investigate in any real depth—then the simplicity of the current MVP makes it much more likely than the current MVP to be able to adapt to the surprises that will inevitably arise during that endeavor. (A concerning example I happen to know of off the top of my head is that no structural-based low-level type system (that I am aware of) has been able to guarantee that a JVM |
I'm not convinced that the MVP has to be better than compiling to JS. For example, WASI doesn't have a JS backend to support it, so you can't use that there, but you could use this there. There are complexities and inefficiencies to using JS objects from wasm. I also would be surprised if the techniques used by JS engines to infer static typing despite dynamic semantics couldn't also be applied to this MVP. After all, most code will probably put the same underlying typed object into the same place in a |
Maybe I'm misunderstanding the intention here, but you can compile WASI programs to JS using wasm2js + one of the JS implementations of the WASI APIs. And there is Cheerp which compiles to actual JS objects. |
I agree with @titzer that in the most general case of dynamic linking (where modules where compiled completely independently) no amount of Wasm features will be able to guarantee no semantics of any language can be violated. The discussion would be more productive if we would ignore these concerns entirely. A language basically has the choice to a) use very conservative representations at the boundaries (which I presume @titzer's late binding refers to), or b) use some form of language aware "linker" that can guarantee language semantics even with separate compilation (turning things into a closed world model at the latest possible moment). But I'd agree with @RossTate that any remaining case of "this common operation in common language X always requires an expensive dynamic cast, and we have no known mechanism to optimize these away locally" should be treated extremely seriously, and would be good to compare against alternative proposals. And one of these competing systems remains "linear memory", because if a language can elide dynamic casts and range checks because it knows more about its types than a Wasm GC can know, then that is a serious performance advantage. |
An interior pointer to a sub-record needs to know where the references and the scalars of the sub-record are located. |
I does not understand the concept of tagref. Is a sample possible? Ideally with a Java like source language. |
@Horcrux7 If you refer to the "Java/C#/Kotlin/Scala Object Layout" section above, all of the structrefs and funcrefs are actually packed into tagrefs before being stored in the object. This keeps the dynamic casts explicit. |
The section "Java/C#/Kotlin/Scala Object Layout" described like my compiler for Java work. Every object (also arrays) include a class description and a hashcode. But this not required special support from the WASM runtime.
If I understand you than you want make the language type hierarchy available to the WASM runtime in the |
Without support from the runtime, garbage collection is much more expensive to do. But any GC needs to trace references.
Not necessarily. It depends on the needs of the compiler. tagref is necessary in order to be able to put multiple types of reference into the structref, but anything beyond that is up to you. |
No, If the language chooses to reify its type hierarchy, it would generally be expected to do so by encoding it in the same manner it would do in its own runtime. For example, an object (encoded as a |
The proposal has now moved to phase 2 and consensus has coalesced around a different design, so I'll close this issue. |
Very fair. I am considering if I want to fork this off into a "dynamic memory" proposal, with the purpose of allowing JS typed arrays to cross the JS-wasm boundary without copying. |
@dtig's been looking at how to reduce copies across the boundary and iterating on a pre-proposal over at https://github.com/dtig/memory-control/blob/master/proposals/memory-control/Overview.md. I would chat with her if you're interested in that space. |
Will do, thanks! |
[test] Unify the trap message of "null function reference".
This is a continuation of #94 with a more fleshed-out proposal. Many thanks to @RossTate for helping me understand the underlying issues and honestly doing most of the drafting. 😄
Garbage Collection with Two Types
This is a description of an MVP for GC that would require just two types:
structref
andtagref
.This is not meant to be the end-all-be-all for GC; it is specifically designed to be an MVP.
In particular, it is kept very simple both to make it easy to implement and to make it easy to translate into a more advanced version-2 GC proposal.
Structure References
A
structref
is a reference to a mutable data structure that is simply an array of reference data and an array of scalar (i.e. linear-memory) data.The expectation is that a
structref
is implemented as a butterfly, i.e. a pointer to the "center" of a data structure, with one of the two arrays stored at negative offsets from that center and the other stored at positive offsets.How large these arrays can be and how scalar data is specifically represented are details that are worth discussion at some point, but which we believe should be deferred at the moment.
As a starting point, we suggest scalar data be represented as an array of
i32
fields, and that both reference and scalar arrays each have a maximum size of 230 (i.e. maximum index of 230-1).(Alternatively, the scalar data could be accessed with instructions akin to linear memory; we use
i32
"fields" here because it makes the examples simpler.)In order to avoid every access involving an array-bounds check, we include the following instructions:
structref.assert_bounds : [structref i32 i32] -> []
structref
has strictly fewer than the first number of reference fields or strictly fewer than the second number of scalar fieldsstructref.br_unless_bounds $label : [structref i32 i32] -> []
label $label : []
iff the givenstructref
has strictly fewer than the first number of reference fields or strictly fewer than the second number of scalar fieldsThe expectation is that, after these instructions, even a streaming compiler can eliminate the bounds checks for accesses to appropriate fields.
However, because this proposal is specifically an MVP, and because in the worst case superfluous bounds checks can be performed relatively quickly, we do not propose incorporating these bounds directly into type information.
That is,
structref
is a type with no parameters.The remaining
structref
instructions are the following:structref.new : [i32 i32] -> [structref]
structref
with the first number of reference fields and the second number of scalar fieldsstructref.get_bounds : [structref] -> [i32 i32]
structref.get_i32 : [structref i32] -> [i32]
structref.set_i32 : [i32 structref i32] -> []
structref.get_tagref : [structref i32] -> [tagref]
structref.set_tagref : [tagref structref i32] -> []
structref.eq : [structref structref] -> [i32]
Note that the last instruction means that structure references have a notion of identity, which makes sense given that they are fundamentally mutable data structures.
Tagged References
Note that the reference fields of a
structref
have typetagref
.A
tagref
is a tagged reference, meaning it has some tag and a payload corresponding to that tag.For this, we repurpose events from the exception-handling proposal.
The
tagref
instructions are the following:tagref.pack $tag : [t*] -> [tagref]
event $tag : [t*]
tagref.unpack $tag : [tagref] -> [t*]
tagref
was not packed with$tag
tagref.test $tag : [tagref] -> [i32]
tagref.br_unless_unpack $tag $label : [tagref] -> [t*]
event $tag : [t*]
Note that there is no way to mutate the contents of a tagged reference or to compare them for equality.
This enables a number of optimizations, and in particular means that we can use canonical tags instead of coercion instructions to and from each other primitive type.
Canonical Tags
We propose that some types should have canonical tags.
Although it is worth discussing which types specifically should have a canonical tag, we believe this discussion should be deferred, and so as a starting point we propose the following types:
structref
andfuncref
.We elide a canonical tag for
i32
for now to avoid discussions such as fori31ref
.The reason to have canonical tags is to enable specialized representations for canonical cases.
This avoids the need to always box and unbox particularly common reference values in order to put them into the reference fields of structure references.
It is even possible put canonical-tag information directly into the pointer so that no memory accesses are necessary to unpack these canonical cases.
Note that the call tags proposal (briefly known as dispatch tags) makes casting
funcref
to a more precise type unnecessary.Imports and Exports
Although not strictly necessary, this proposal is positioned to benefit significantly from compile-time imports and exports.
For example, a module for some Java class could request a compile-time import for the number of reference and scalar fields its superclass defined in some other module has, and then use that to compute the offsets at which its own fields should be located.
By having this import at compile-time, these offsets would be constant expressions and easily folded, both for
assert_bounds
instructions and for field-access instructions, thereby eliminating many bounds checks that would be difficult for any simple static type system to eliminate.This proposal also has no need to restrict type imports and exports to just reference types (and no need for subtyping in general).
One can simply import/export an arbitrary type along with a preferred tag to use for coercing it to and from
tagref
.If a module wants its exported type to be unexaminable, then the exported tag can be one created by the module and only exported with an abstract signature, or it can choose not to export a tag at all.
If a module does not care about abstraction, then the exported tag can be the canonical one for the exported type.
Either way, this proposal guarantees type abstraction by default.
Examples
Java/C#/Kotlin/Scala Object Layout
An object is represented as a
structref
.Its first scalar field is the numerical identity of the object (unless we choose to add numerical identity directly to
structref
in general).Its first reference field is the runtime info for the object's class (more on that below).
The remaining fields is the fields of the class.
(For C#, an object's runtime instantiation's of a class's type parameters is included in these fields.)
Due to the butterfly representation, subclasses can add more reference fields and more scalar fields without issue.
If the object is an array, then the array elements can simply be stored directly within the object.
All reference field values is tagged with the canonical tag for
structref
.A class runtime is also represented as a
structref
with various "sections".Some sections can be statically guaranteed to be at a particular offset, whereas some sections have a scalar field (at a statically known offset) that indicates where the section begins in the
structref
:(19 funcrefs using
func_switch
)(relevant paper for background)
(C#) index of type-argument list
(C#) type-argument list (structrefs)
Note also that this representation illustrates how this proposal supports compacting multiple variable-sized components into one memory structure, as is often done in runtimes.
In #102, @skuzmich raises the issue of supporting separate compilation to WebAssembly.
For abstracting the size of superclasses and the location(s) of fields/methods, a module could import and export the sizes of various non-private classes as well as the offsets of various non-private fields, ideally at compile-time.
For concealing information from other modules of the same Java/C#/Kotlin/Scala program, one could use custom tags for private fields.
For concealing information from modules outside of the Java/C#/Kotlin/Scala program, one could use a custom tag (shared across all the modules within the program) for sealing objects in order to protect data from untrusted code.
At the same time, one could specify prototypes in JavaScript that make these sealed values look like JavaScript values with the relevant methods.
OCaml
In #100, there was discussion of how to implement OCaml in the various GC proposals, and in this comment @sabine laid out roughly the following strategy using (the precursor to) this proposal (where
structref m n
indicatesm
reference fields andn
scalari32
fields):There are a few changes to @sabine's design in this example:
tuple<n>
andrecord<n>
blocks, leaving the tagging information in the scalar field to distinguish the two when necessaryi32
fields rather than bytes (a temporary detail)A key property of this representation is that it supports efficient polymorphic structural operations.
Checking whether a value is a
structref
is quick, as is checking whether thatstructref
has at least one scalar field.That scalar field then provides a tagging information (at the OCaml level, rather than the WebAssembly level), which informs the structural comparison how to proceed.
Typically this will involve comparing the scalar fields or recursively comparing the reference fields, which this proposal makes easy to iterate through.
Interior Pointers, Nested Structures, and Array Slices
In #59, @aykevl discusses Go's need for interior pointers, and in #77, @vargaz discusses .NET's need for interior pointers.
Although this proposal does not support true interior pointers (leaving that for some future extension), it does support fat interior pointers, which in #59 @aykevl says would be sufficient for now.
For example, an
int32*
in Go could be represented by a pair[structref, i32]
indicating astructref
and a specific offset within the scalar fields for the referencedint32
.Note that this even supports pointers to just past the end of array, which @vargaz says is important in #77.
Both Go and C# provide nested structures.
This proposal supports nested structures by splitting their reference and scalar fields across the two wings of the
structref
butterfly.In turn, an interior pointer to a nested structure comprosed of both reference and scalar data in Go/C# would be represented by a triple
[structref, i32, i32]
, with onei32
indicating the start of the reference components of the structure within thestructref
, and the other indicating the start of the scalar components.Similarly, Go has array slices, which can be represented by a
structref
paired with multiplei32
indicating the start and end/length of the array slice's reference and/or scalar components.Looking Ahead
Due to its simplicity, this proposal is easy to embed in (slight extensions to) either pre-existing proposal.
It does not commit us to either path.
We hope that it will provide a simple and flexible target for many languages, and after building up a community and corpus we can do analyses to inform descriptive type systems capturing the usage patterns in the wild.
A subsequent version of Garbage Collection can be developed to optimize for these patterns, providing more compactness, better performance, and improved composability and abstraction for real systems.
The text was updated successfully, but these errors were encountered: