Assemblyscript experiment #437
Wait, hold on. I don’t think I’m actually loading the module.
Stupid rebase. @jakearchibald, PTAL :) |
@jakearchibald PTAL |
codecs/rotate/package.json
Outdated
{
  "name": "rotate",
  "scripts": {
    "build": "mv rotate.{as,ts} && asc rotate.ts -b rotate.wasm --validate -O3 && mv rotate.{ts,as}"
Does asc depend on the extension?
Currently it uses only .ts.
Yes. If the file doesn’t end with .ts, it will append .ts, but if I name the file .ts, webpack will for some reason try to compile the file as TypeScript, which will obviously fail.
You can relocate the AssemblyScript files into an assembly directory and add it to the exclude dirs set in the root's tsconfig.json.
I'll do some double checking tomorrow but this looks good |
d2Multiplier = 1;
}

for (let d2 = d2Start; d2 >= 0 && d2 < d2Limit; d2 += d2Advance) {
You could potentially speed this up further by making 4 different branches for the different combinations of d1Advance and d2Advance. Like:
// d1Advance: 1, d2Advance: 1
if (d1Advance == 1 && d2Advance == 1) { // or rotate == 0
  for (let d2 = d2Start; d2 < d2Limit; d2++) {
    ...
    for (let d1 = d1Start; d1 < d1Limit; d1++) {
      ...
    }
  }
}
// d1Advance: -1, d2Advance: 1
else if (d1Advance == -1 && d2Advance == 1) { // or rotate == 270
  for (let d2 = d2Start; d2 < d2Limit; d2++) {
    ...
    for (let d1 = d1Start; d1 >= 0; d1--) {
      ...
    }
  }
}
// d1Advance: 1, d2Advance: -1
else if (d1Advance == 1 && d2Advance == -1) { // or rotate == 90
  ...
}
// d1Advance: -1, d2Advance: -1
else { // rotate == 180
  ...
}
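To make the trade-off concrete, here is a minimal sketch in plain TypeScript (not AssemblyScript) that compares a generic, sign-parameterized loop with the four-branch variant suggested above. The `Traversal` interface and both function names are hypothetical; only the `d1*`/`d2*` field names are taken from the diff. It deliberately avoids mapping branches to rotation angles, since that mapping depends on the conventions of the real code.

```typescript
// Hypothetical description of one traversal; field names mirror the d1*/d2* variables in the diff.
interface Traversal {
  d1Start: number; d1Limit: number; d1Advance: 1 | -1; d1Multiplier: number;
  d2Start: number; d2Limit: number; d2Advance: 1 | -1; d2Multiplier: number;
}

// Generic version: one loop nest, direction decided at run time via the advance values.
function copyGeneric(input: Uint32Array, t: Traversal): Uint32Array {
  const output = new Uint32Array(input.length);
  let i = 0;
  for (let d2 = t.d2Start; d2 >= 0 && d2 < t.d2Limit; d2 += t.d2Advance) {
    for (let d1 = t.d1Start; d1 >= 0 && d1 < t.d1Limit; d1 += t.d1Advance) {
      output[i++] = input[d1 * t.d1Multiplier + d2 * t.d2Multiplier];
    }
  }
  return output;
}

// Specialized version: one branch per sign combination, each with a single loop bound check.
function copySpecialized(input: Uint32Array, t: Traversal): Uint32Array {
  const output = new Uint32Array(input.length);
  let i = 0;
  if (t.d1Advance === 1 && t.d2Advance === 1) {
    for (let d2 = t.d2Start; d2 < t.d2Limit; d2++) {
      const d2offset = d2 * t.d2Multiplier;
      for (let d1 = t.d1Start; d1 < t.d1Limit; d1++) output[i++] = input[d1 * t.d1Multiplier + d2offset];
    }
  } else if (t.d1Advance === -1 && t.d2Advance === 1) {
    for (let d2 = t.d2Start; d2 < t.d2Limit; d2++) {
      const d2offset = d2 * t.d2Multiplier;
      for (let d1 = t.d1Start; d1 >= 0; d1--) output[i++] = input[d1 * t.d1Multiplier + d2offset];
    }
  } else if (t.d1Advance === 1 && t.d2Advance === -1) {
    for (let d2 = t.d2Start; d2 >= 0; d2--) {
      const d2offset = d2 * t.d2Multiplier;
      for (let d1 = t.d1Start; d1 < t.d1Limit; d1++) output[i++] = input[d1 * t.d1Multiplier + d2offset];
    }
  } else {
    for (let d2 = t.d2Start; d2 >= 0; d2--) {
      const d2offset = d2 * t.d2Multiplier;
      for (let d1 = t.d1Start; d1 >= 0; d1--) output[i++] = input[d1 * t.d1Multiplier + d2offset];
    }
  }
  return output;
}

// Usage: a 3x2 image (width 3, height 2) whose pixels are numbered 0..5 in row-major order.
const sampleInput = Uint32Array.from([0, 1, 2, 3, 4, 5]);
// Walk columns left to right (d2) and rows bottom to top (d1).
const sampleTraversal: Traversal = {
  d1Start: 1, d1Limit: 2, d1Advance: -1, d1Multiplier: 3,
  d2Start: 0, d2Limit: 3, d2Advance: 1, d2Multiplier: 1,
};
const genericResult = copyGeneric(sampleInput, sampleTraversal);
const specializedResult = copySpecialized(sampleInput, sampleTraversal);
```

Both functions produce the same output; the specialized one simply trades four near-identical loop nests for simpler loop conditions, which is the readability cost discussed in the replies.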
I mean, we are already pretty fast, so I don’t see the need to sacrifice readability/elegance for speed.
Are there plans to integrate these kinds of optimizations into asc?
Actually, compilers don't perform this kind of optimization. It's quite complicated even for LLVM, I guess.
Also, you could improve readability and avoid low-level load/store calls with a small helper class:
class Pointer<T> {
  constructor(offset: usize = 0) {
    return changetype<Pointer<T>>(offset);
  }
  @inline @operator("[]")
  get(index: i32): T { // overload operator for getter `ptr[index]`
    const size = isReference<T>() ? offsetof<T>() : sizeof<T>();
    return load<T>(changetype<usize>(this) + <usize>index * size);
  }
  @inline @operator("[]=")
  set(index: i32, value: T): void { // overload operator for setter `ptr[index] =`
    const size = isReference<T>() ? offsetof<T>() : sizeof<T>();
    store<T>(changetype<usize>(this) + <usize>index * size, value);
  }
}
And now define pointers that reference global memory at some offsets:
let offset = inputWidth * inputHeight * bpp;
let input = new Pointer<u32>(0);
let output = new Pointer<u32>(offset);
...
for (let d2 = d2Start; /*...*/) {
  let d2offset = /*...*/
  for (let d1 = d1Start; /*...*/) {
    let start = d1 * d1Multiplier + d2offset;
    output[i] = input[start]; // now index the pointers as usual, without load/store
    i++;
  }
}
|
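For readers less familiar with the memory layout this implies, the following is a rough TypeScript analogue rather than AssemblyScript: two typed-array views over one buffer, with the input at byte offset 0 and the output starting at `inputWidth * inputHeight * bpp`. The plain `ArrayBuffer` is an assumed stand-in for the module's linear memory, and the view names are hypothetical.

```typescript
const bpp = 4; // bytes per pixel, assuming RGBA stored as one u32 per pixel
const inputWidth = 3;
const inputHeight = 2;
const numPixels = inputWidth * inputHeight;

// Byte offset where the output region begins, as in the comment above.
const offset = inputWidth * inputHeight * bpp;

// Stand-in for WebAssembly linear memory: room for the input plus the output.
const memory = new ArrayBuffer(offset * 2);

// Two u32 views into the same buffer, analogous to Pointer<u32>(0) and Pointer<u32>(offset).
const inputView = new Uint32Array(memory, 0, numPixels);
const outputView = new Uint32Array(memory, offset, numPixels);

// Fill the input region, then copy pixel by pixel using ordinary indexing.
for (let p = 0; p < numPixels; p++) {
  inputView[p] = p + 1;
}
for (let p = 0; p < numPixels; p++) {
  outputView[p] = inputView[p];
}

// A view over the whole buffer shows the output sitting directly after the input.
const wholeMemory = new Uint32Array(memory);
```

Indexing through the views scales the index by the element size, which is the same arithmetic the `Pointer<T>` getter and setter perform by hand.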
@MaxGraey That Pointer class seems incredibly useful. Could it be bundled with an asc “standard library”? |
@MaxGraey I was actually also trying to create a |
If you want create instance for PS btw for |
I don’t plan on adding an allocator since we are not doing any allocations. The zero-cost |
@surma These abstractions are already present, but currently in experimental mode.
@jakearchibald This is not always true. See this and this benchmark. Of course, in some situations Rust will be faster, but not by much, and its binaries are usually several times larger.
@MaxGraey Sorry, we didn’t mean “Rust is faster than ASC in general”, but that the Rust version of this particular program ends up being faster than the as version (~500ms vs ~300ms on a 4k by 4k image). |
@surma Hmm. Could you share the Rust version? It would be interesting to see how LLVM optimizes these loops. Here is your AS version in WAS: https://webassembly.studio/?f=2mswis3rhev with measurements output to the console. WAS also supports a Rust template: https://webassembly.studio
@MaxGraey Absolutely! Here’s the JS, ASC, C and Rust version that I have been comparing. Most recent results (when running with
|
@surma |
Sorry, I just forgot to commit that bit. Yeah, it barely makes a difference here :) |
Closing this as I will re-open a new PR for Rust. But if anyone finds anything new, please feel free to comment!
@surma Thanks for the bench! My results:
Chrome 72.0.3626.71 beta: JavaScript, AssemblyScript, Rust
Firefox 65.0: JavaScript, AssemblyScript, Rust
Safari 12.0.2: JavaScript, AssemblyScript, Rust
You are not running the page with The benchmark with 10000 iterations is not realistic, as in our scenario the code will run once or twice, not 10000 times, so there is not much chance of any warm-up.
With
Chrome 72 beta: js avg: 176.27499999071006
Firefox 65.0: js avg: 267
Safari 12.0.2: js avg: 730
Hmm, pretty strange. I think a warm-up is needed anyway, because all engines use tiered compilation. For example, Chrome uses
Also, I used a slightly modified version of the AS code at that time:
I used |
Yes, a warmup phase will increase performance, but is just not realistic for our use-case. The code will run once, maybe twice, that’s it. |
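The disagreement above is essentially about which number to report. As a rough sketch (not the benchmark page used in this thread), the two measurements could be separated like this; `timeMs`, `measure`, and `workload` are hypothetical names, and `Date.now()` is used only because it needs no imports, at the cost of millisecond resolution.

```typescript
// Time a single call in milliseconds (coarse: Date.now() has ~1 ms resolution).
function timeMs(fn: () => void): number {
  const start = Date.now();
  fn();
  return Date.now() - start;
}

// Report the first (cold) call separately from the average of later (warm) calls.
function measure(fn: () => void, warmRuns: number): { cold: number; warmAvg: number } {
  const cold = timeMs(fn); // what a user who rotates an image once would experience
  let total = 0;
  for (let run = 0; run < warmRuns; run++) {
    total += timeMs(fn); // later runs may benefit from tiered compilation
  }
  return { cold, warmAvg: warmRuns > 0 ? total / warmRuns : 0 };
}

// Hypothetical stand-in workload: transpose a 256x256 image of u32 pixels.
function workload(): void {
  const size = 256;
  const src = new Uint32Array(size * size);
  const dst = new Uint32Array(size * size);
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      dst[x * size + y] = src[y * size + x];
    }
  }
}

const timing = measure(workload, 10);
```

For a use case where the code runs once or twice, `cold` is the relevant figure; `warmAvg` is closer to what a 10000-iteration benchmark reports.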
OK, in this case I can't see a huge difference between AS and Rust. How did you get
@surma I also ran on an MBP 15 Late 2013 with macOS 10.14.2, but I tested on Chrome 72 instead of Chrome 71 and Firefox 65 instead of 64. Maybe this matters. Btw, I refactored, and with this version I got the following results:
Chrome 72 beta: js avg: 224.66999999596737
Firefox 65.0: js avg: 272
Safari 12.0.2: js avg: 716
I think we need more tests on different machines.