Assemblyscript experiment #437

Closed
wants to merge 6 commits into from
surma
Collaborator

@surma surma commented Jan 24, 2019

No description provided.

@surma surma requested a review from jakearchibald January 24, 2019 20:18
@surma
Collaborator Author

surma commented Jan 24, 2019

Wait, hold on, I don’t think I’m actually loading the module.

@surma
Collaborator Author

surma commented Jan 24, 2019

Stupid rebase.

@jakearchibald, PTAL :)

codecs/rotate/rotate.as (outdated)
@surma
Collaborator Author

surma commented Jan 24, 2019

@jakearchibald PTAL

{
"name": "rotate",
"scripts": {
"build": "mv rotate.{as,ts} && asc rotate.ts -b rotate.wasm --validate -O3 && mv rotate.{ts,as}"
Collaborator

Does asc depend on the extension?

Contributor

Currently it only accepts .ts.

Collaborator Author

Yes. If the file doesn’t end with .ts it will append .ts, but if I name the file .ts webpack will for some reason try to compile the file as TypeScript, which will obviously fail.

Contributor

@MaxGraey MaxGraey Jan 25, 2019

You can move the AssemblyScript files into an assembly directory and add that directory to the exclude list in the root tsconfig.json.

@jakearchibald
Collaborator

I'll do some double checking tomorrow but this looks good

d2Multiplier = 1;
}

for (let d2 = d2Start; d2 >= 0 && d2 < d2Limit; d2 += d2Advance) {
Contributor

@MaxGraey MaxGraey Jan 24, 2019

You could potentially speed this up further by writing four separate branches for the different combinations of d1Advance and d2Advance. Like:

// d1Advance: 1, d2Advance: 1
if (d1Advance == 1 && d2Advance == 1) { // or rotate == 0
  for (let d2 = d2Start; d2 < d2Limit; d2++) {
    ...
    for (let d1 = d1Start; d1 < d1Limit; d1++) {
      ...
    }
  }
}
// d1Advance: -1, d2Advance: 1
else if (d1Advance == -1 && d2Advance == 1) { // or rotate == 270
  for (let d2 = d2Start; d2 < d2Limit; d2++) {
    ...
    for (let d1 = d1Start; d1 >= 0; d1--) {
      ...
    }
  }
}
// d1Advance: 1, d2Advance: -1
else if (d1Advance == 1 && d2Advance == -1) { // or rotate == 90
  ...
}
// d1Advance: -1, d2Advance: -1
else { // rotate == 180
  ...
}
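
The branch-splitting idea can be sketched in plain TypeScript roughly as follows. This is a minimal sketch, not the code from the PR: the `Traversal` interface and both function names are hypothetical, and the sketch deliberately avoids mapping sign combinations to rotation angles, since that depends on the rotation convention. It assumes each start value lies inside `[0, limit)`.

```typescript
// Hypothetical traversal description; field names mirror the variables in the diff.
interface Traversal {
  d1Start: number; d1Limit: number; d1Advance: 1 | -1; d1Multiplier: number;
  d2Start: number; d2Limit: number; d2Advance: 1 | -1; d2Multiplier: number;
}

// Generic loop: one body, direction decided by the advance values at run time.
function copyGeneric(input: Uint32Array, output: Uint32Array, t: Traversal): void {
  let i = 0;
  for (let d2 = t.d2Start; d2 >= 0 && d2 < t.d2Limit; d2 += t.d2Advance) {
    const d2offset = d2 * t.d2Multiplier;
    for (let d1 = t.d1Start; d1 >= 0 && d1 < t.d1Limit; d1 += t.d1Advance) {
      output[i++] = input[d1 * t.d1Multiplier + d2offset];
    }
  }
}

// Specialized loops: one branch per sign combination, so every loop has a
// single bound check and a constant step.
function copySpecialized(input: Uint32Array, output: Uint32Array, t: Traversal): void {
  const { d1Start, d1Limit, d1Multiplier: m1, d2Start, d2Limit, d2Multiplier: m2 } = t;
  let i = 0;
  if (t.d1Advance === 1 && t.d2Advance === 1) {
    for (let d2 = d2Start; d2 < d2Limit; d2++)
      for (let d1 = d1Start; d1 < d1Limit; d1++) output[i++] = input[d1 * m1 + d2 * m2];
  } else if (t.d1Advance === -1 && t.d2Advance === 1) {
    for (let d2 = d2Start; d2 < d2Limit; d2++)
      for (let d1 = d1Start; d1 >= 0; d1--) output[i++] = input[d1 * m1 + d2 * m2];
  } else if (t.d1Advance === 1 && t.d2Advance === -1) {
    for (let d2 = d2Start; d2 >= 0; d2--)
      for (let d1 = d1Start; d1 < d1Limit; d1++) output[i++] = input[d1 * m1 + d2 * m2];
  } else {
    for (let d2 = d2Start; d2 >= 0; d2--)
      for (let d1 = d1Start; d1 >= 0; d1--) output[i++] = input[d1 * m1 + d2 * m2];
  }
}
```

Both functions write the same output for the same `Traversal`; the specialized one only trades a longer body for simpler loop conditions, which is the readability cost discussed in the replies.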

Collaborator Author

I mean, we are already pretty fast, so I don’t see the need to sacrifice readability/elegance for speed.

Are there plans to integrate these kinds of optimizations into asc?

Contributor

@MaxGraey MaxGraey Jan 25, 2019

Actually, compilers don't do this kind of optimization. It's quite complicated even for LLVM, I guess.

@MaxGraey
Contributor

MaxGraey commented Jan 25, 2019

Also, you could improve readability and avoid low-level load/store calls on raw memory pointers with this zero-cost abstraction class:

class Pointer<T> {
  constructor(offset: usize = 0) {
    return changetype<Pointer<T>>(offset);
  }

  @inline @operator("[]") 
  get(index: i32): T { // overload operator for getter `ptr[index]`
    const size = isReference<T>() ? offsetof<T>() : sizeof<T>();
    return load<T>(changetype<usize>(this) + <usize>index * size);
  }

  @inline @operator("[]=") 
  set(index: i32, value: T): void { // overload operator for setter `ptr[index] =`
    const size = isReference<T>() ? offsetof<T>() : sizeof<T>();
    store<T>(changetype<usize>(this) + <usize>index * size, value);
  }
}

Now define pointers that reference global memory at the required offsets:

let offset = inputWidth * inputHeight * bpp;

let input  = new Pointer<u32>(0);
let output = new Pointer<u32>(offset);
...
for (let d2 = d2Start; /*...*/) {
   let d2offset = /*...*/
   for (let d1 = d1Start; /*...*/) {
      let start = d1 * d1Multiplier + d2offset;
      output[i] = input[start];  // now access to pointer entity as usual without load/store
      i++;
   }
}
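
For readers following along outside AssemblyScript, a rough plain-TypeScript analogue of the same layout is two typed-array views over one buffer. This is only an analogy under stated assumptions: `makeViews` is a hypothetical helper, the `ArrayBuffer` stands in for WebAssembly linear memory, and 4 bytes per pixel is assumed to match the `u32` pointers above.

```typescript
// Assumed pixel size in bytes (one u32 per pixel), matching Pointer<u32> above.
const bytesPerPixel = 4;

// Hypothetical helper: builds an input view at offset 0 and an output view
// placed directly after the input pixels, both backed by the same memory.
function makeViews(
  memory: ArrayBuffer,
  inputWidth: number,
  inputHeight: number
): { input: Uint32Array; output: Uint32Array } {
  const pixels = inputWidth * inputHeight;
  const offset = pixels * bytesPerPixel; // byte offset of the output region
  return {
    input: new Uint32Array(memory, 0, pixels),
    output: new Uint32Array(memory, offset, pixels),
  };
}

// Usage: index the views like ordinary arrays; no copies are made.
const demoMemory = new ArrayBuffer(2 * 2 * bytesPerPixel * 2);
const demoViews = makeViews(demoMemory, 2, 2);
demoViews.input.set([1, 2, 3, 4]);
for (let i = 0; i < 4; i++) {
  demoViews.output[i] = demoViews.input[3 - i]; // reversed copy into the second region
}
```

Like the `Pointer<T>` class, the views add indexing syntax without moving or duplicating pixel data.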

@jakearchibald
Collaborator

@MaxGraey thanks for all your input on this, but I think we're going to go for Rust instead as the result is faster (although the wasm ends up larger).

@surma is writing up his experiences with this. We're really happy with where AssemblyScript is heading.

@surma
Collaborator Author

surma commented Jan 25, 2019

@MaxGraey That Pointer class seems incredibly useful. Could it be bundled with an asc “standard library”?

@surma
Collaborator Author

surma commented Jan 25, 2019

@MaxGraey I was actually also trying to create an ArrayBuffer instance with a given address and length, but couldn’t get it to work. Maybe that’s another possible solution?

@MaxGraey
Contributor

MaxGraey commented Jan 25, 2019

@MaxGraey I was actually also trying to create an ArrayBuffer instance with a given address and length, but couldn’t get it to work. Maybe that’s another possible solution?

If you want to create an instance of ArrayBuffer or a typed array, you need to include an allocator first. AS supports three allocators: arena, tlsf and buddy. If you don't plan any deallocations and just reset memory to its initial state after use, the best choice is import "allocator/arena". After that you can instantiate classes on the heap, like var arr = new Uint8Array(1); var dataView = new DataView(arr.buffer). But you are using external raw memory, which is the simplest and most performant solution for your case.

PS: by the way, Pointer<T> doesn't need heap allocation, because its constructor returns early with the given offset as the instance pointer, so it doesn't require an allocator.
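
The "no deallocations, just reset after use" pattern described above is what an arena (bump) allocator does. A minimal TypeScript sketch of the idea follows; the `Arena` class is hypothetical and is not AssemblyScript's `allocator/arena` implementation, and it assumes `align` is a power of two.

```typescript
// Hypothetical bump allocator over a fixed buffer: alloc only moves a
// cursor forward, and reset frees everything at once.
class Arena {
  private readonly memory: ArrayBuffer;
  private readonly base: number;
  private top: number;

  constructor(memory: ArrayBuffer, base: number = 0) {
    this.memory = memory;
    this.base = base;
    this.top = base;
  }

  // Returns the byte offset of a block of `size` bytes.
  // `align` is assumed to be a power of two.
  alloc(size: number, align: number = 8): number {
    const start = (this.top + align - 1) & ~(align - 1); // round up to alignment
    if (start + size > this.memory.byteLength) {
      throw new RangeError("arena exhausted");
    }
    this.top = start + size;
    return start;
  }

  // Drops every allocation by rewinding the cursor to the initial state.
  reset(): void {
    this.top = this.base;
  }
}
```

There is no per-block free, which is why this fits a workload that allocates, runs once, and then resets.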

@surma
Collaborator Author

surma commented Jan 25, 2019

I don’t plan on adding an allocator since we are not doing any allocations. The zero-cost Pointer abstraction seems incredibly useful for these kinds of use cases. Would love to see it shipped with asc.

@MaxGraey
Contributor

@surma This abstraction already exists, but it is currently experimental.

@MaxGraey
Contributor

MaxGraey commented Jan 25, 2019

@jakearchibald this is not always true. See this and this benchmark. Of course, in some situations Rust will be faster, but not by much, and its binaries are usually several times larger.

@surma
Collaborator Author

surma commented Jan 26, 2019

@MaxGraey Sorry, we didn’t mean “Rust is faster than ASC in general”, but that the Rust version of this particular program ends up being faster than the AS version (~300ms vs ~500ms on a 4k by 4k image).

@MaxGraey
Contributor

MaxGraey commented Jan 26, 2019

@surma Hmm. Could you share the Rust version? It would be pretty interesting to see how LLVM optimizes these loops. Here is your AS version in WAS: https://webassembly.studio/?f=2mswis3rhev with measurements output to the console. WAS also supports a Rust template: https://webassembly.studio

@surma
Collaborator Author

surma commented Jan 26, 2019

@MaxGraey Absolutely! Here are the JS, ASC, C and Rust versions that I have been comparing.

Most recent results (when running with ?single):

Language ms
JS 336
C 355
ASC 426
Rust 293

@MaxGraey
Contributor

MaxGraey commented Jan 26, 2019

@surma
I see "asc rotate.ts -b assemblyscript.wasm --validate --optimize" instead of "asc ... -O3". -O3 is usually 2x faster than --optimize. However, in this simple case it may not matter and the results will be pretty much the same.

@surma
Collaborator Author

surma commented Jan 26, 2019

Sorry, I just forgot to commit that bit. Yeah, it barely makes a difference here :)

@surma
Collaborator Author

surma commented Jan 26, 2019

Closing this as I will open a new PR for Rust. But if anyone finds anything new, please feel free to comment!

@surma surma closed this Jan 26, 2019
@MaxGraey
Contributor

MaxGraey commented Jan 26, 2019

@surma Thanks for the benchmark! My results:

Chrome 72.0.3626.71 beta:

JavaScript avg: 0.8487424999722861
AssemblyScript avg: 1.1395414999424247
Rust avg: 0.7038834999810206

Firefox 65.0:

JavaScript avg: 1.1848
AssemblyScript avg: 0.668
Rust avg: 0.5601

Safari 12.0.2:

JavaScript avg: 0.9943999999999694
AssemblyScript avg: 0.5790000000000001
Rust avg: 0.5041000000000003

@surma
Collaborator Author

surma commented Jan 26, 2019

You are not running the page with ?single.

The benchmark with 10000 iterations is not realistic, as in our scenario the code will run once or twice, not 10000 times, so there is not much chance of any warm-up.
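
The difference between the two measurement modes can be sketched as below. This is an illustrative harness, not the benchmark page's actual code: `timeSingle` and `timeAverage` are hypothetical names, and `Date.now()` is used only to keep the sketch dependency-free (a real harness would likely use a higher-resolution timer).

```typescript
// Cold measurement: run the workload exactly once and report elapsed ms.
// This is closer to the "runs once, maybe twice" scenario.
function timeSingle(workload: () => void): number {
  const start = Date.now();
  workload();
  return Date.now() - start;
}

// Warm measurement: run the workload many times and report the mean.
// Later iterations may benefit from the engine's optimizing tiers, so the
// mean can be far lower than the cost of the first call.
function timeAverage(workload: () => void, iterations: number): number {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    workload();
  }
  return (Date.now() - start) / iterations;
}
```

Comparing `timeSingle(fn)` with `timeAverage(fn, 10000)` for the same `fn` shows how much of a reported average comes from warm-up rather than from first-run cost.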

@MaxGraey
Contributor

With ?single:

Chrome 72 beta:

js avg: 176.27499999071006
AS avg: 241.69500000425614
Rust avg: 203.32000000053085

Firefox 65.0:

js avg: 267
AS avg: 191
Rust avg: 199

Safari 12.0.2:

js avg: 730
AS avg: 234
Rust avg: 310

Hmm, pretty strange. I think a warmup is needed anyway, because all engines use tiered compilation. For example, Chrome uses Liftoff in this case.

@MaxGraey
Contributor

Also, I used a slightly modified version of the AS code for that run:
https://gist.github.com/MaxGraey/b9fd909d1da44aa4516c70fab1337712

@MaxGraey
Contributor

I used rustc 1.33.0-nightly (ceb251214 2019-01-16) btw

@surma
Collaborator Author

surma commented Jan 26, 2019

I think a warmup is needed anyway.

Yes, a warmup phase will increase performance, but is just not realistic for our use-case. The code will run once, maybe twice, that’s it.

@MaxGraey
Contributor

MaxGraey commented Jan 26, 2019

OK, in this case I can't see a huge difference between AS and Rust. How did you get a 1.5x speedup with Rust compared to AS? Maybe I'm missing something? I see only 1.2x, and only on Chrome. Hmm.
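
The two ratios mentioned here can be reproduced from the numbers posted earlier in the thread. A small worked calculation, where `speedup` is a hypothetical helper and the inputs are the ?single timings from the table (ASC 426 ms, Rust 293 ms) and from the Chrome 72 beta run (AS 241.695 ms, Rust 203.32 ms):

```typescript
// Ratio of the slower time to the faster time; > 1 means the second is faster.
function speedup(slowerMs: number, fasterMs: number): number {
  return slowerMs / fasterMs;
}

// ASC vs Rust from the earlier ?single table.
const tableRatio = speedup(426, 293);

// AS vs Rust from the Chrome 72 beta ?single run.
const chromeRatio = speedup(241.69500000425614, 203.32000000053085);

console.log(tableRatio.toFixed(2), chromeRatio.toFixed(2)); // prints "1.45 1.19"
```

So the "1.5x" and "1.2x" figures are both consistent with the posted data; they come from different machines and browser versions.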

@surma
Collaborator Author

surma commented Jan 26, 2019

I am not sure. It’s probably because we are testing on different machines. I am only running the page on my MBP on Mac OS.

[Screenshot, 2019-01-26 at 2:12:10 pm]

@MaxGraey
Contributor

MaxGraey commented Jan 26, 2019

@surma I also ran it on an MBP 15 (Late 2013) with Mac OS 10.14.2, but I tested on Chrome 72 instead of Chrome 71 and Firefox 65 instead of 64. Maybe that matters.

Btw I refactored rotate.ts as I suggested earlier. You can see it here:
https://gist.github.com/MaxGraey/ab8654e65b00d8427ea1121add94fbdd

With this version I got the following results for ?single:

Chrome 72 beta:

js avg: 224.66999999596737
AS avg: 207.34000000811648
Rust avg: 203.6400000069989

Firefox 65.0:

js avg: 272
AS avg: 195
Rust avg: 204

Safari 12.0.2:

js avg: 716
AS avg: 193
Rust avg: 300

@MaxGraey
Contributor

I think we need more tests on different machines.
