Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC story for arrays? #151

Closed
dcodeIO opened this issue Oct 10, 2020 · 22 comments
Closed

GC story for arrays? #151

dcodeIO opened this issue Oct 10, 2020 · 22 comments

Comments

@dcodeIO
Copy link

dcodeIO commented Oct 10, 2020

As briefly mentioned in #145, arrays have yet unaddressed overhead / interoperability problems:

  • A compiler (or its standard library) cannot efficiently pre-allocate or resize arrays, if the language's array type is resizable (like JS), for instance by pushing to it. Instead it has to grow the array by one element at a time where the element type is not defaultable, producing copying overhead and likely garbage collector pressure over something that is otherwise the ABC of compiler development.
  • Even if the element type is defaultable, resizing arrays in an efficient way implicitly makes the array non-interoperable between the module and either the host or another module because it contains placeholder elements over its actual length.

When remaining unaddressed, these limitations will effectively prevent AssemblyScript for example from adopting Wasm GC early, losing one potentially valuable source for informing the process. I hence suggest to evaluate solutions in the MVP.

@dcodeIO
Copy link
Author

dcodeIO commented Oct 10, 2020

Potential solution: Keep arrays as is, and introduce a vector type taking size-hints upon creation and resize, trapping when OOB.

@aardappel
Copy link

aardappel commented Oct 10, 2020

Pretty much no memory managers underlying GCs or otherwise can cheaply grow buffers in most cases. Almost all languages that have a resizable array type (vector) implement it by reallocating and copying when the reserved size is exceeded. Luckily, most of those would choose to grow the backing buffer by 1.5x to 2x usually, not on every single new element as you assume.

If resizing is indeed such an expensive operation with a variety of choices to be made in terms of growing rate and reserving ahead of time, it is indeed the right choice for something like Wasm GC to leave these implementation choices to the language, rather than to do resizing itself.

A vector also implies indirection (since buffer reallocation must not be observable by any references to the vector), whereas a fixed array can be represented directly with a single buffer. Again, the choice of using indirection should be left to the language. Functional languages for example could implement resizing without an indirection since other references can't observe the new element anyway).

Java, for example, has no built-in resizable arrays, and each resizable container implements its own resizing logic on top of them.

@dcodeIO
Copy link
Author

dcodeIO commented Oct 10, 2020

Luckily, most of those would choose to grow the backing buffer by 1.5x to 2x usually, not on every single new element

Yeah, that's exactly what's not possible currently if an arrays element type is non-defaultable, i.e. not nullable and doesn't have a trivial way to construct a default of that type.

not on every single new element as you assume.

Not assuming that btw.

Pretty much no memory managers underlying GCs or otherwise can cheaply grow buffers in most cases

And I've implemented several of these, no worries. Mine, based on TLSF, can cheaply grow if the adjacent right free block is large enough.

A vector also implies indirection (since buffer reallocation must not be observable by any references to the vector

Right, that's why AssemblyScript has both Array<T> and StaticArray<T>, with the latter being fixed as an option for a developer to opt-in to.

@tlively
Copy link
Member

tlively commented Oct 10, 2020

Luckily, most of those would choose to grow the backing buffer by 1.5x to 2x usually, not on every single new element

Yeah, that's exactly what's not possible currently if an arrays element type is non-defaultable, i.e. not nullable and doesn't have a trivial way to construct a default of that type.

My understanding is that this isn't a problem because producers would simply choose to use nullable reference types to implement vectors, even if the element types were non-nullable in the source language. I can't think of a reason a producer would need to use non-nullable references in its vector implementation, although perhaps there is some reason I haven't thought of.

That doesn't address the interoperability concern, though.

@dcodeIO
Copy link
Author

dcodeIO commented Oct 10, 2020

Goes a little into the direction of Cast free arrays, since if the type is nullable but isn't in the source language, you'll end up with superfluous casts again that the other issue is trying to get rid of. cc @RossTate

@aardappel
Copy link

can cheaply grow if the adjacent right free block is large enough

Yes, welcome to 50 years of realloc implementations. This behavior is the exception rather than the rule however, and can absolutely not be relied upon. Thus, the idea that we should have a resizing array as part of Wasm is still a bad idea, hiding this possibly complex/inefficient behavior, hoping the engine will occasionally give you the efficient path. Again, the language, not Wasm, should be in control of this.

As for arrays needing to contain nulls.. there's a ton of possible data types and algorithms that typically build on top of arrays that may need to have empty elements, or delayed initialization etc. and they're all going to run into this issue, a vector is nothing special there. That's a general consequence of the simple type system we have. Safe + simple + efficient, pick 2.

@dcodeIO
Copy link
Author

dcodeIO commented Oct 10, 2020

Yes, welcome to 50 years of realloc implementations.

Looks like you are an experienced engineer. How would you tackle the problems of superfluous casts / non-interoperability when doing nothing isn't an option for AS?

@aardappel
Copy link

How would you tackle the problems of superfluous casts / non-interoperability

The initial GC design can't solve all things for all languages and still be simple enough such that we can make forward progress. There have been plenty of ideas on how casts can be elided (see @RossTate's alternative GC proposal in particular), but adopting these all, and making sure no language is left with a bad use case, is going to delay things indefinitely. The current GC proposal is from about ~3 years ago, and we're still discussing fundamentals. Best we can hope for at this point is not make decisions that do irreversible damage.

As for interop, again, there is only one real solution, and that is Interface Types. Adding special purpose types for the benefit of one language's intended implementation is all incredibly premature.

when doing nothing isn't an option for AS?

The GC proposal is trying to serve a LOT of languages, all of which have use cases that don't map 1:1 onto the current proposal. I am not sure why you keep thinking AS should have special status within that group.

@dcodeIO
Copy link
Author

dcodeIO commented Oct 12, 2020

The initial GC design can't solve all things for all languages

Well, AS isn't really that special tbh. It's much like Java and C# for instance. Furthermore, I guess it is the closest to JS with static types one can get, so solving issues it encounters helps on multiple fronts like for example interop with JS. As such

make decisions that do irreversible damage

I think that not talking about these things is already doing irreversible damage. I guess that's exactly what you do not want to happen?

As for interop, again, there is only one real solution, and that is Interface Types

There are certain related and unrelated issues with interface types that I am trying to solve, as you probably know. I agree of course that interface types may be able to help here. How would you design a solution for this particular problem based on interface types? Trim the array at the boundary (alloc+copy->garbage)?

I am not sure why you keep thinking AS should have special status within that group.

Again not thinking that btw. If you have grievances with me or AssemblyScript, I can offer to talk about the reasons for it in private? Or perhaps use the meeting later today?

@aardappel
Copy link

aardappel commented Oct 12, 2020

How would you design a solution for this particular problem based on interface types? Trim the array at the boundary (alloc+copy->garbage)?

Copy is certainly the default for IT. Not sure if trimming would be required.. dealing with null types looks like something code that wants to be very general may have to do.

Again not thinking that btw.

Then I don't think you see how strongly a statement like doing nothing isn't an option for AS comes across (and similar language in the string issue). I have nothing against AS, I just don't think it is more important than all the 101 other GC languages we're potentially serving. If you indeed agree, we can instead switch back to talk about how the GC proposal serves Java, C#.. etc, and assume AS will be well served by the outcome.

@dcodeIO
Copy link
Author

dcodeIO commented Oct 12, 2020

It is indeed not an option for AS to have non-interoperable arrays with JS. Defeats the whole point of why AS exists / what people want to do with it. (Perhaps being direct like this is a cultural difference? Like, I typically state the facts instead of beating around the bush more than it helps. Some people value that. And to be direct once more, I don't think that you are in a position to cast the first stone here?)

@kripken
Copy link
Member

kripken commented Oct 12, 2020

To repeat what I said in an offline meeting earlier: I'd expect the pre-import idea to work here too, similar to strings.

Not exactly the same, in that I'd expect a pre-imported string to be all the strings a program needs, while a pre-imported array would be interesting for JS interop, but not as optimized as a wasm array - which is more like a JS typed array. So both types (pre-imported JS array, and wasm array) would be of use in that case.

The nice thing with the pre-import idea is that once we have working code with it, we can actually explore this space, and learn a lot. That can suggest further spec additions, that are not feasible to decide on atm.

@titzer
Copy link
Contributor

titzer commented Oct 12, 2020

It's worth pointing out that in many languages, particularly Java and C#, arrays are not growable. In order to protect themselves from their arrays' lengths being modified by outside code breaking language invariants, they'd have to wrap their internal arrays so they don't escape.

@titzer
Copy link
Contributor

titzer commented Oct 12, 2020

It's also worth pointing out that arrays in JavaScript are pretty complicated. One can delete elements from the middle array (where subsequent accesses poke through to the prototype chain), and they can be grown or shrunk from either end. Wasm is not going to adopt such capabilities for arrays by default, so you need to proxy arrays that escape to JavaScript if you want to interoperate properly.

@dcodeIO
Copy link
Author

dcodeIO commented Oct 12, 2020

Hmm, good points. So the use case I have in mind is Wasm code calling Web APIs that expect one or more (object) arrays as arguments (also vice-versa). With a proxy, this can work I guess as long as the API expects an array-like object, or would it fail more often than it'd work?

@titzer
Copy link
Contributor

titzer commented Oct 12, 2020

It's a difficult situation and I don't envy you :-)

Arrays in JavaScript are complicated enough that it might be easier to use JS arrays directly and "proxy" them to Wasm--i.e. import the getElem and setElem operations into Wasm modules that want to use these types of arrays. JavaScript array access semantics gets pretty bizarre in the corner cases and I am not sure how reliant Web APIs are, in general, on their oddities.

@bvibber
Copy link

bvibber commented Oct 13, 2020

@titzer wrote:

Arrays in JavaScript are complicated enough that it might be easier to use JS arrays directly and "proxy" them to Wasm--i.e. import the getElem and setElem operations into Wasm modules that want to use these types of arrays. JavaScript array access semantics gets pretty bizarre in the corner cases and I am not sure how reliant Web APIs are, in general, on their oddities.

With both strings and arrays, one notable problem is there is no importable host function which will cleanly perform the following operations:

  • read an element/char
  • write an element
  • push an element to the end and expanding the size
  • concatenate
  • slice

You would have to write JavaScript host functions wrapping non-function operations like the "+" operator for string concatenation, and while you could import JavaScript's Reflect.* functions to get/set array indices through properties or call prototype methods, I suspect it would not be very efficient. For instance Reflect.apply requires building an array of arguments at runtime for every call, and Reflect.get/set might expect strings instead of integer indices, so that might be a weird one to import consistently.

Have these things been thought out and explicitly written out with example code yet?

@tlively
Copy link
Member

tlively commented Nov 1, 2022

The design of arrays and the broad picture of JS interop are settled for the MVP, so I'll go ahead and close this issue. PRs adding ideas for future extensions to the post-MVP document would be welcome, though.

@tlively tlively closed this as completed Nov 1, 2022
@trusktr
Copy link

trusktr commented May 13, 2023

I get that we want to support many languages, but JavaScript is the primary language of the Web (even in the non-JavaScript JavaScript-GC'd DOM APIs written in C++), and we're talking about WebAssembly here.

Even if we eliminate JavaScript by allowing non-JS languages to be used via WebAssembly, practically all DOM APIs are implemented as JavaScript APIs with the JS engine's GC, and this includes any APIs that accept or return JavaScript strings and JavaScript arrays.

I doubt we are going to be changing all of these DOM APIs to not use JavaScript strings, arrays, and reliance on JavaScript GC. If we do, it will be a huge effort.

By not having an interoperable-with-web mentality up front, it seems we're preferring some languages (not JavaScript) in conflict with how the web works, and possibly making web interop more difficult.

It seems we can (and should) design WebAssembly with the web in mind first and foremost, while still allowing options that can work for all languages (f.e. a language that wants to handle array growth doesn't use that feature).

How do we best support the web, and languages designed for web that compile JavaScript/TypeScript to WebAssembly who wish to have high interoperability with all web and DOM features without inefficient boundaries?

Can these facilities for web interop be added later on top of what we now have and plan to have? Does the direction we go in make future web-interoperable interfaces in Wasm more difficult to add?

@trusktr

This comment was marked as resolved.

@trusktr
Copy link

trusktr commented May 13, 2023

Admittedly, many web developers don't know what they need at a lower level, but we must not abuse this fact for purposes that don't align with the existing web.

@WebAssembly WebAssembly locked as too heated and limited conversation to collaborators May 13, 2023
@conrad-watt
Copy link
Contributor

conrad-watt commented May 13, 2023

@trusktr your comment above is entirely inappropriate and in violation of the W3C CEPC.

Locking as this issue has already been closed and I see no way that further comments will be productive.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants