
Port to asm.js/webassembly, is it worth? #3248

Closed
innerground opened this issue Nov 29, 2017 · 68 comments

Comments

@innerground commented Nov 29, 2017

Hello,
I am currently doing some experiments with asm.js/webassembly and I have to say that the performance potential is great!
As I love Babylon.js (thanks, guys!), I was thinking about porting it to asm.js/webassembly (Unity actually generates some low-level asm.js code for optimization).
I am working on a viewer to display and interact with quite big models (via glTF/GLB).
Anyway, my question is: do you think it is worth the work? I am pretty sure some of you have thought about the same thing, and I am curious to know what you think.
Looking forward to reading your messages.
Regards,

@innerground innerground changed the title Port to asm.js/Webassembly, is it worth? Port to asm.js/webassembly, is it worth? Nov 29, 2017
@sebavan (Member) commented Nov 29, 2017

The thing we thought about would be to convert only our hot path to wasm: basically the Matrix and Vector math, as well as adding a resources store to ensure they are all stored in the same buffer.
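That shared-buffer idea can be sketched in plain JS: a preallocated typed array plus a bump allocator handing out offsets. The class name and sizes below are hypothetical, not BJS API:

```javascript
// Hypothetical sketch of a "resources store": all vectors/matrices live in one
// preallocated Float32Array, so a wasm module could later share the same buffer.
class ResourceStore {
  constructor(floatCount) {
    this.data = new Float32Array(floatCount); // single backing buffer
    this.next = 0;                            // bump-allocator cursor
  }
  // Reserve `size` floats and return the base offset (a "pointer").
  alloc(size) {
    if (this.next + size > this.data.length) throw new Error("store full");
    const offset = this.next;
    this.next += size;
    return offset;
  }
}

const store = new ResourceStore(1024);
const v = store.alloc(3); // a Vector3 is 3 consecutive floats
store.data[v] = 1; store.data[v + 1] = 2; store.data[v + 2] = 3;
```

A wasm module handed the same underlying buffer could then read and write those floats by offset, with no per-call marshalling.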

@innerground commented Nov 29, 2017

Yes, math should be the first thing to convert. Anyway, it is also possible to convert the whole library to C++ and then convert it back to JS using Emscripten, but I am not sure about the performance increase...

@innerground:

The main problem I am facing is that the engine is very slow when you have thousands of nodes (and meshes).

@sebavan (Member) commented Nov 29, 2017

The thing is, with thousands of nodes and meshes, wasm alone will still not be sufficient: the next bottleneck would be binding the info to WebGL.

@innerground:

I already worked on some workarounds for that. If we can gain some perf, that is good anyway. I am just wondering if I should give it a go.

@sebavan (Member) commented Nov 29, 2017

I was planning to do it soon :-) but not before January, I guess.

@innerground:

If you need help, no problem: I am experienced with C/C++/JS and others.

@sebavan (Member) commented Nov 29, 2017

Give it a try if you wish; I would be able to focus on other issues instead :-)

@innerground:

@sebavan, as Babylon exists in TypeScript, I think it is going to be easy to port. There is a TS-to-Haxe converter, then Haxe to C++. Do you think we can give it a go?!

@sebavan (Member) commented Nov 30, 2017

You can, for test purposes; this might help bootstrapping. But if it proves to work, we'll need to carefully craft a blazing fast math lib :-)

@innerground:

I know, I know. I will create a repo for that this weekend. I will keep you updated.

@innerground:

@sebavan 3 MB of JS to convert by hand to C/C++ is a pain :D

@sebavan (Member) commented Nov 30, 2017

Could you convert only the Math part? (might be easier)

@vujadin commented Nov 30, 2017

https://github.com/samdauwe/BabylonCpp

@vujadin commented Nov 30, 2017

https://github.com/vujadin/BabylonHx

@nbouayad commented Dec 1, 2017

@sebavan Which parts do you want converted first?

@sebavanmicrosoft (Contributor):

I was thinking of Babylon.math.ts: the Matrix, Vector and Quaternion classes. This is our CPU hit on large scenes (deep and wide hierarchies of nodes).

Having a large native array of fixed size, and treating any of the Matrix/Vector... objects as pointers into that array, would prevent the expensive back and forth with the wasm context while still allowing all contributions in TS/JavaScript.
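The "pointers into one big array" idea can be sketched in JS with accessors over a shared typed array (hypothetical names, not the actual BJS classes):

```javascript
// Hypothetical sketch: a Vector3 that owns no numbers of its own, only a base
// offset into one shared buffer. A wasm module given the same buffer could
// then mutate vectors in place with no per-call copying.
const heap = new Float64Array(4096);

class Vector3View {
  constructor(offset) { this.offset = offset; }
  get x() { return heap[this.offset]; }
  set x(v) { heap[this.offset] = v; }
  get y() { return heap[this.offset + 1]; }
  set y(v) { heap[this.offset + 1] = v; }
  get z() { return heap[this.offset + 2]; }
  set z(v) { heap[this.offset + 2] = v; }
}

const a = new Vector3View(0);
a.x = 1; a.y = 2; a.z = 3;
// heap[0..2] now holds [1, 2, 3]; wasm code indexing the same buffer sees it too.
```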

We would like to collect data around this part to evaluate the gain before going further.

Does it sound reasonable?

@nbouayad commented Dec 1, 2017 via email

@jbousquie (Contributor) commented Dec 1, 2017

May I leave my feedback here about asm/wasm regarding BJS, as I have done some studies and experiments?

First of all, is asm/wasm worth it?
Always!

It's really, really faster than JS, especially when you have to deal with huge amounts of data.
You'll hardly see the difference between JS and asm/wasm on a single math computation, but when you get to dozens of thousands per frame, it really makes the difference.

That said, we have to know how asm/wasm works in the browser.
Asm/wasm doesn't have direct access to the DOM. This means some JS code is still required anyway to manage the UI events (user interactions) and the WebGL layer access. This part of the JS code is mandatory.
This immediately implies that BJS can't be translated wholesale to C/C++ and then to wasm.
Some parts of the code have to stay in JS to manage/orchestrate the I/O (user interaction, final rendering).

Other issue:
BJS is a JS framework.
This means that the final user will be able to code his game/scene logic in JS.
If the main part of BJS is translated to asm/wasm, then every method/object of the BJS API must be exposed/bound from asm/wasm, which is not a simple task. Simple when you expose only dozens of asm/wasm functions, quite complex when it comes to thousands.
Why?
Because of the way data is passed from JS to asm/wasm and back: there is no object or type compatibility between these two contexts. Asm/wasm doesn't know any JS types or objects.
Everything must be passed in a single shared memory heap, and the atomic shared element is... the byte.
For instance, if JS must pass integers (indices), floats (positions/quaternions, etc.), strings (names) or higher-level objects, then everything has to be converted to bytes and put in the same memory heap to be exchanged. Yes: arrays of integers, floats, UTF-8 encoded characters, etc., all in the same buffer! I'll let you imagine how fun it can be to play with byte offsets to know where/what you are dealing with.
Note also that this buffer is statically allocated when starting your asm/wasm code. So what size? 2 MB? 64 MB? No idea, unless you know what will be managed in your scene: 1 mesh, 2,000 meshes? Physics or not?
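For illustration, here is what that byte-offset juggling looks like in JS with a DataView over a single ArrayBuffer (the layout is invented for the example):

```javascript
// Illustration of mixed data in one heap: a float64, a uint32 index and a
// UTF-8 string packed back-to-back, readable only if you track every offset.
const buffer = new ArrayBuffer(64);
const view = new DataView(buffer);

let offset = 0;
view.setFloat64(offset, 3.14, true); offset += 8;            // position component
view.setUint32(offset, 42, true); offset += 4;               // mesh index
const nameBytes = new TextEncoder().encode("node_0");
view.setUint32(offset, nameBytes.length, true); offset += 4; // string length prefix
new Uint8Array(buffer, offset, nameBytes.length).set(nameBytes);

// Reading it back means replaying the exact same offset arithmetic:
const f = view.getFloat64(0, true);   // 3.14
const idx = view.getUint32(8, true);  // 42
const len = view.getUint32(12, true);
const name = new TextDecoder().decode(new Uint8Array(buffer, 16, len)); // "node_0"
```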

Now, let's imagine we have managed to implement such a generic way to run asm/wasm, with the right heap size, all the methods exposed to JS, and the JS orchestration part properly decoupled. OK?
The user logic will still be written in JS, won't it?
Yet this is often where the bottleneck is... not in the framework calls, actually.

Imho, the best way to get real gains with asm/wasm would be this:

  • the whole framework, except the mandatory JS orchestration part, would be compilable to asm/wasm (I wish TS could be, some day)
  • nothing of the framework is exposed to the JS main thread
  • the user logic is also compilable to asm/wasm and coded directly in the same context and language as the framework

So the user would code, say, in TS (once compilable to wasm) or Haxe or whatever, in the same language as the framework.
Both (user logic + framework) would be compiled into the same final bundle: no painful communication between the user logic and the framework; only the pre-set exchange channels are then used between the compiled code and the JS orchestration code (browser events + WebGL rendering).

I guess that's the way Unity exports the code to wasm.

Another approach would be to implement only some parts of the current BJS code this way, like some math computations, for instance.
Well, the gain from JS to wasm is really poor for a single computation (say, a single quaternion computation or a matrix transformation applied to a Vector3).
What would then be worth it is to migrate the iterations (the loop) to the wasm side.
Say you have 1,000 quaternions and 1,000 rotation matrices to compute each frame: there's just a little gain in calling the wasm version of the computation 1,000 times (not so little, but not that important compared to the CPU bottleneck that usually sits in the user logic). But there's a better gain in running all 1,000 of these computations inside the wasm code directly.
Then the complexity comes from how to migrate all the loops/iterations to the wasm side, knowing that we will still have to iterate at least once in JS to copy all the data into the memory heap to pass it to the wasm code.

Not that simple in every case...
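The loop-migration argument can be sketched even in pure JS, where ordinary call overhead stands in for the costlier JS <-> wasm boundary crossing:

```javascript
// Two ways to process 1,000 values: one exported-style call per element,
// versus a single call that owns the loop. With wasm, the second shape also
// avoids 1,000 crossings of the JS <-> wasm boundary.
const input = new Float64Array(1000).fill(2);
const out = new Float64Array(1000);

function computeOne(x) { return x * 0.5 + 1; } // stand-in for one math op

function perCall(src, dst) {
  for (let i = 0; i < src.length; i++) dst[i] = computeOne(src[i]); // 1,000 calls
}

function batched(src, dst) {
  for (let i = 0; i < src.length; i++) dst[i] = src[i] * 0.5 + 1;   // 1 call, loop inside
}

perCall(input, out);  // both produce identical results;
batched(input, out);  // only the call pattern differs
```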

So there's always this duality:

  • either the more integrated, complete and powerful (fast) version of the framework compiled to wasm, but the more complex the data exchange with the user-logic part (unless it is also compiled, in which case there's no more need to exchange data)
  • or only dedicated small pieces of the framework compiled to wasm (math, culling) with a basic data-exchange mechanism already implemented, but far less global gain, because other CPU bottlenecks will appear sooner

@jbousquie (Contributor):

Anyway,
I think any attempt in this direction is worth it.

Simply because the JS JIT compiler can't get much faster now, and neither can the CPU.
So the only ways to get faster in the browser are now wasm compilation and concurrency (workers).

@sebavanmicrosoft (Contributor):

Yup, no worries. I will create an ugly scene full of cubes, all parented to a deep hierarchy, with a cheap shader, to really isolate and measure the CPU impact.

I am really curious about the gain. My first experiments with wasm vector math, sharing the buffer and only indexing into it, were actually not too bad.

@jbousquie (Contributor):

I did an asm test (not published for now) of a turbo SPS:
40K solid particles moving and rotating... meaning a quaternion and a rotation matrix(-like) computed for each one.
The asm code was written by hand.
There's a substantial gain compared to the full-JS version, but not as much as with the worker version of the same thing: computation distributed among 4 workers in full JS, each dealing with only 10K particles, simultaneously.

What I would like to achieve is to migrate the 40K iterations into the asm code instead of calling it 40K times, because I did some very basic big-loop tests and they are always far faster in asm:
a 500K-iteration loop, just assigning a simple float addition result to each element of a 500K-element array, is twice as fast in asm as in full JS.

Note: I used asm instead of wasm because it can easily be written by hand (no need for C, then an intermediate compilation) and its performance in FF is, for now, quite comparable to wasm.

But wasm remains the way to go imho.

@jbousquie (Contributor) commented Dec 1, 2017

BTW, the AssemblyScript / Next project looks promising, but I'm afraid the lack of contributors prevents it from maturing:
https://github.com/AssemblyScript/assemblyscript
https://github.com/AssemblyScript/next

current discussion: AssemblyScript/assemblyscript#1

tl;dr: a language as close as possible to TS and a compiler emitting wasm bytecode directly.

@innerground commented Dec 2, 2017

Right, I've been playing around. The TS-to-Haxe option is not what we want at all; it is painful and not reliable.
Now we have two options: a 1-to-1 translation of the actual code, or a smarter port of the code (e.g. Color3/Color4 and Vector2/Vector3 could be templated). That really depends on what we can consider stable code vs. evolving code.
Concerning numeric precision, I was thinking that float is enough, but maybe you can comment on that (number vs. native types).
My proposal is therefore to differentiate what is "static/mature" from what can evolve, so we can have a good software life-cycle plan.

@sebavan (Member) commented Dec 2, 2017

The math tools have really low code churn in BJS and are pretty stable, so it's up to you whether to template or not. Color is almost never used for CPU math operations.

Float vs. double is an interesting one :-) Might be cool to at least alias the type and test the perf of both?
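A quick way to probe float vs. double from JS is to alias the typed-array constructor and run the same workload under both (a rough sketch, not a rigorous benchmark):

```javascript
// Alias the storage type once, then time the same workload under both widths.
function sumScaled(ArrayType, n) {
  const data = new ArrayType(n).fill(1.5); // 1.5 is exactly representable in f32 and f64
  let acc = 0;
  for (let i = 0; i < n; i++) acc += data[i] * 0.5;
  return acc;
}

for (const T of [Float32Array, Float64Array]) {
  console.time(T.name);
  sumScaled(T, 1000000);
  console.timeEnd(T.name);
}
```

In a real port the alias would live on the wasm side (f32 vs f64), but the harness shape is the same: one switch, identical workload, compare timings.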

@nbouayad commented Dec 2, 2017 via email

@deltakosh deltakosh added this to the Future milestone Dec 4, 2017
@deltakosh (Contributor):

Couldn't agree more. We should define a list, like collisions or the solid particle system.

@fmmoret (Contributor) commented Oct 14, 2018

https://hacks.mozilla.org/2018/10/calls-between-javascript-and-webassembly-are-finally-fast-🎉/

[x] SpiderMonkey

Just need Chrome & Edge to follow suit and we could swap out even fairly small parts

@deltakosh (Contributor):

This starts to LOOK REALLY GOOD :)

@jbousquie (Contributor):

Yep... things are going the right way now :-D

@pkieltyka:
As well, it's a matter of time before AssemblyScript is mature enough to compile Babylon.js more easily. The more contributors and/or sponsors on that project, the better!

@vtange commented Oct 17, 2018

Long write-up coming through. Sorry if it seemed like I dropped off; I'm pretty busy with my own BJS project :)

@jbousquie I'm using AssemblyScript itself to make the wasm. It's math only, and I'm focusing on the functions that I can tell are called often / generate lots of garbage.

So I dropped what I did last time and went with a WASM "scratchpad" approach. Basically, the idea was to compile a list of WASM methods that do all the math and call those methods from JS. WASM takes in a batch of arguments, does the math and stores the result in WASM/JS-shared memory. Since JS is basically single-threaded, it's technically possible to just have JS tell WASM to crunch all the numbers and then copy off WASM's "scratchpad" where it stored all the answers.

A lot of functions do relatively similar math. For example, I have a function that just adds 3 pairs of numbers together, like this:

export function add3Pairs(r: f64, r2: f64, g: f64, g2: f64, b: f64, b2: f64): void {
    store<f64>(0,r + r2);
    store<f64>(8,g + g2);
    store<f64>(16,b + b2);
}

which can be used to add stuff for Color3s and Vector3s, like so:

    Vector3.prototype.addInPlace = function (otherVector) {
        this.x += otherVector.x;
        this.y += otherVector.y;
        this.z += otherVector.z;
        return this;
    };

becomes

    Vector3.prototype.addInPlace = function (otherVector) {
        exports.add3Pairs(this.x, otherVector.x, this.y, otherVector.y, this.z, otherVector.z);
        // The wasm side already did the addition, so plain assignment here
        // (using += would double-count the components):
        this.x = readWasmMemAsF64[0];
        this.y = readWasmMemAsF64[1];
        this.z = readWasmMemAsF64[2];
        return this;
    };

Now it's sort of too early to celebrate. I wrote some tests of this idea by running add with JS and add with WASM 10,000,000 times, and the results are, unsurprisingly, disappointing, because we're using WASM to do simple stuff like a + b.

color3.addWasmStyle = function()
{
  exports.add3Pairs(this.r,0.0001,this.g,0.0001,this.b,0.0001);
  this.r = readWasmMemAsF64[0];
  this.g = readWasmMemAsF64[1];
  this.b = readWasmMemAsF64[2];
  return this;
}
color3.addJS = function()
{
  this.r += 0.0001;
  this.g += 0.0001;
  this.b += 0.0001;
  return this;
}

console.time("js add");
for(let i=0; i<10000000; i++)
{
  color3.addJS();
}
console.timeEnd("js add");
//-----
console.time("wa add");
for(let i=0; i<10000000; i++)
{
  color3.addWasmStyle();
}
console.timeEnd("wa add");

This is on Chrome:
js add: 62.114013671875ms
wa add: 800.100830078125ms

I don't know what the guys at Mozilla fed Firefox, but it must be really good. We need more of it!
FF:
js add: 50ms
wa add: 247ms

And that's for SIMPLE a + b!
From Clark's write-up we can infer that every time JS does something with numbers, it needs to wrap the answer in a "box". That means if you start doing longer chains of math like:

export function superAdd(r: f64, r2: f64, g: f64, g2: f64, b: f64, b2: f64, a: f64, a2: f64): void {
    store<f64>(0,r + r2 + g + g2 + b + b2 + a + a2);
}

export function superMultiply(r: f64, r2: f64, g: f64, g2: f64, b: f64, b2: f64, a: f64, a2: f64): void {
    store<f64>(0,r * r2 * g * g2 * b * b2 * a * a2);
}

vs

color3.superAddJS = function()
{
  this.r = this.r + 0.0001 + 1.31140 + 0.0051 + 1.311210 + 0.0701 + 1.329 + 10.12144;
  return this;
}
color3.superMultiplyJS = function()
{
  this.r = this.r * 0.0001 * 1.31140 * 0.0051 * 1.311210 * 0.0701 * 1.329 * 10.12144;
  return this;
}

The numbers start getting closer:
Chrome:
js supermultiply: 208.1201171875ms
wa supermultiply: 806.9228515625ms
js superadd: 140.32470703125ms
wa superadd: 822.147216796875ms

Now Firefox is just showing off.
FF:
js supermultiply: 204ms
wa supermultiply: 265ms
js superadd: 133ms
wa superadd: 256ms

@vtange commented Oct 17, 2018

I also wrote a function that should be able to scan all the methods of a given Math class and test which ones are the slowest by running each of them 10,000,000 times. This is a rough sneak peek for Color3, covering only the functions that don't return a new Color3(), since I'm pretty sure BJS doesn't do that; it creates the objects once and then adds/multiplies/maths InPlace, right?

toString: 2111.466064453125ms
getClassName: 1081.635009765625ms
getHashCode: 1195.595947265625ms
toArray: 3359.76513671875ms
toLuminance: 1930.287841796875ms
multiplyToRef: 3183.98388671875ms
equals: 2025.455810546875ms
equalsFloats: 3914.947265625ms
scaleToRef: 8653.037109375ms
scaleAndAddToRef: 9073.156982421875ms
clampToRef: 5326.75830078125ms
addToRef: 3619.27490234375ms
subtractToRef: 3544.868896484375ms
copyFrom: 2292.550048828125ms
copyFromFloats: 4110.578857421875ms
set: 4111.8271484375ms
toHexString: 17046.4169921875ms
toLinearSpaceToRef: 6487.251953125ms
toGammaSpaceToRef: 7825.93798828125ms

I'll need to write a custom one for each class, so it'll take some time to do properly.

I have a pretty major project using Babylon.js, so I'm pretty invested in it becoming as fast as possible. Once we get this and Ammo.js in, then the real fun begins.

@vtange commented Oct 17, 2018

@jbousquie it looks like you found a similar, if not better, way than how I was doing it. I was doing:

fetch("build/optimized.wasm")
  .then(response => response.arrayBuffer())
  .then(buffer => WebAssembly.instantiate(buffer, {
    env: {
      memory: new WebAssembly.Memory({ initial: 1 }),
      abort: function() { throw Error("abort called"); }
    }
  }))
  .then(module => {
    var exports = window.exports = module.instance.exports;
    var readWasmMemAsF64 = new Float64Array(module.instance.exports.memory.buffer);
    // ...
  });
and then just reading values straight off readWasmMemAsF64 by index. Do you know how that compares to your way of using WebAssembly.instantiateStreaming?
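For comparison: `WebAssembly.instantiateStreaming` compiles while the bytes are still downloading and skips the intermediate `arrayBuffer()` copy, but the resulting instance behaves the same. A self-contained sketch of instantiation (with a tiny hand-assembled module, just to keep it runnable without a server) is:

```javascript
// A tiny wasm module written out as raw bytes: it exports add(a, b) -> a + b.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // one function, type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section header
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0/1, i32.add, end
]);

// Synchronous path, fine for small modules:
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));
console.log(instance.exports.add(2, 3)); // 5

// Streaming path (browser, over HTTP): compiles during download.
// const { instance } = await WebAssembly.instantiateStreaming(
//   fetch("build/optimized.wasm"), importObject);
```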

@kutomer commented Oct 17, 2018

@vtange nothing to be sorry about, long but interesting, keep up the good work :)

@MaxGraey:

Interesting. I think you could get more benefit if you switch from f64 (double) to f32 (single float) types, and do much more on the wasm side to eliminate JS <-> wasm interop.

@MaxGraey:

That's my experiment porting Mapbox's earcut triangulation algorithm to wasm and comparing it with a Rust version: mapbox/mapbox-gl-js#4835 (comment)
But that algorithm doesn't use a lot of math.

@kutomer commented May 15, 2019

Hey guys, any updates?

@just1689:

Exciting stuff. This would solve some of the problems I've been having. Looking forward to seeing where the project goes regarding wasm.

@shaileshiyer commented Jul 28, 2019 via email

@briantbutton commented Sep 20, 2019

If I may add some comments about WASM, here is what I have learned:

a) Unless things have changed, you don't want big WASM code modules; they need to get compiled when the page loads.

b) The interface between JavaScript and WASM has overhead, so you want to do enough work in WASM to pay for it.

c) You want to pick functions that are unlikely to change much.

This means the best approach is a modular function with a lot of loops.

I always started by rigorously separating the functionality so that the WASM call could be a drop-in replacement for the JS call. This allows a gain comparison and a handy retreat point.
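That drop-in pattern can be as simple as choosing the implementation behind one stable function reference (a sketch; `wasmExports` is a placeholder for a real instantiated module, and `distanceSquared` is an invented example function):

```javascript
// Keep one call site; swap the implementation behind it. If the wasm version
// regresses or breaks, flipping the flag is the "handy retreat point".
function distanceSquaredJS(ax, ay, az, bx, by, bz) {
  const dx = ax - bx, dy = ay - by, dz = az - bz;
  return dx * dx + dy * dy + dz * dz;
}

const wasmExports = null; // placeholder: would come from WebAssembly.instantiate
const useWasm = wasmExports !== null;

const distanceSquared = useWasm
  ? wasmExports.distanceSquared // same signature, wasm implementation
  : distanceSquaredJS;          // identical JS fallback

console.log(distanceSquared(0, 0, 0, 1, 2, 2)); // 9
```

Because both implementations share a signature, the rest of the code never knows which one it is calling, which is exactly what makes the gain comparison fair.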

My biggest winner was a function that did a lot of proximity calculations on a very large three-dimensional matrix, represented by a typed array. I got about 2.5x performance improvement.

That was well worth it but the complexity price was high.

Having done that, I had no interest in converting general purpose script to WASM.

@DerKarlos commented Sep 28, 2019

Sorry, I have a different, more optimistic view of WASM:

a) I would not call it "compiled" in the usual sense. No source code is downloaded, not even something readable like asm.js, but binary, optimized virtual assembly code. It gets "transcoded" to the actual CPU's assembly, and nowadays this is done while the download is ongoing, so almost no time is wasted.

b) Yes, the interface is slow, yet the next version will be remarkably faster. Still, you should not call into wasm to compute a single square root; do complex things there.

One candidate I could imagine is the physics engine: it needs the start conditions once, maybe some updates, and cyclically it will only "move" the objects. And the actual TypeScript code doesn't need to be rewritten in C++, because there should already be a C++ version.

Future versions of WASM are expected to get direct access to the DOM. Then the whole of Babylon could be WASM, with only the interface in JavaScript. There will be garbage collection, and a TypeScript compiler too, so the Babylon code would be usable.

Lastly, I would like to have a full WASM version of Babylon, including the API, so I could use WASM-Babylon with my C++ or Rust code in the browser.

Is anyone working with WASM and Babylon? I may join, with my limited knowledge and time.

P.S.: The forum may be a better place to discuss this topic.

@StEvUgnIn:

Is there still any interest in porting Babylon from TypeScript to AssemblyScript?

Three.js, for instance, is written in JavaScript, but glas (WebGL in AssemblyScript) is a port of Three.js to AssemblyScript. The work there is twice as intensive, though, because the initial code could not work as TypeScript.

@deltakosh (Contributor):

We still have a big interest, but we are also looking for an automated solution.

@StEvUgnIn:

It is possible to port the code to AssemblyScript and keep the project working in TypeScript.

@deltakosh (Contributor):

What I mean is that we want to keep the root source code in TS.

@StEvUgnIn:

But AssemblyScript is a TS dialect

@deltakosh (Contributor):

Yes, but a lot of people rely on using TS and all its ecosystem. It has to be TS -> AssemblyScript -> wasm.

@DerKarlos commented Feb 25, 2021

I have not tried it, but AssemblyScript is an even more strongly typed TypeScript, and it can still be used to generate JavaScript (next to WASM):

i64 instead of number

But there are some caveats!
https://laptrinhx.com/assemblyscript-is-not-a-subset-of-typescript-1618271903/

@deltakosh (Contributor):

There is also the question of support itself: TS is super well supported.

@deltakosh (Contributor):

Closing for now. Will reopen if it becomes relevant.
