Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization: add prototype poly inline cache #244

Closed
ErosZy opened this issue Jan 4, 2024 · 34 comments
Closed

Optimization: add prototype poly inline cache #244

ErosZy opened this issue Jan 4, 2024 · 34 comments

Comments

@ErosZy
Copy link

ErosZy commented Jan 4, 2024

Hi all,

It's openwebf team here. We noticed quickjs-ng/quickjs merge Self Poly Inline Cache from openwebf/quickjs(#116). Previously, there was a memory leak in the Prototype Poly Inline Cache of openwebf/quickjs. However, we fixed this issue yesterday. Currently, both test262 and openwebf's unit and integration tests haven't encountered any related problems.

Based on the benchmark results, significant improvements have been observed in property access operations for heavy objects. The results are as follows:

  • Apple M1 Pro
  • macOS Monterey 12.2.1
  • Clang 13.0.0 arm64-apple-darwin21.3.0
openwebf/quickjs (feat/without-mimalloc-octane) quickjs-ng/quickjs (e995085) Improvement (%)
Richards 1324 1250 +5.92
DeltaBlue 1281 1139 +12.46
Crypto 1314 1310 +0.30
RayTrace 1648 1152 +43.05
EarleyBoyer 2225 2151 +3.44
RegExp 296 258 +14.72
Splay 2467 2169 +13.73
SplayLatency 9089 8197 +10.88
NavierStokes 2576 2407 +7.02
PdfJS 4208 4099 +2.65
Mandreel 1173 1087 +7.91
MandreelLatency 7485 7396 +1.20
Gameboy 9874 9380 +5.26
CodeLoad 14869 16578 -10.30
Box2D 5184 5063 +2.38
Typescript 17665 17736 -0.45
Score (version 9) 3091 2889 +6.99

But we are short-staffed (busy with work 😄), and I noticed there are quite many changes in quickjs-ng/quickjs. Submitting a pull request (PR) might take much time for us. If needed, feel free to merge this feature on your own, or wait a bit until we have time to submit the PR ourselves.

@ErosZy
Copy link
Author

ErosZy commented Jan 4, 2024

@bnoordhuis
Copy link
Contributor

Thanks for the heads up. Is openwebf/quickjs@079b090 the commit with the memory leak fix, or are there more?

@ErosZy
Copy link
Author

ErosZy commented Jan 4, 2024

@bnoordhuis oh, that only has its memory leak bug fix. maybe you should see feat/proto-ic branch.

@ErosZy
Copy link
Author

ErosZy commented Jan 4, 2024

@bnoordhuis or openwebf/webf pr: openwebf/webf#341, i recommend that.

@bnoordhuis
Copy link
Contributor

Thanks! I'll try to go over it sometime this week.

@saghul
Copy link
Contributor

saghul commented Jan 4, 2024

Thanks for the heads up!

@littledivy
Copy link
Contributor

I remember fixing a bunch of memory leaks (JSFunctionDef* was one) when porting the openwebf/quickjs implementation. I wonder if we already have the fixes.

@ErosZy
Copy link
Author

ErosZy commented Jan 4, 2024

@littledivy yes, inclued those bugs fixes. you can set DUMP_LEAK and run test262 to check.

@ErosZy
Copy link
Author

ErosZy commented Jan 5, 2024

@bnoordhuis @saghul @littledivy Sync up the progress:
We have merged the corresponding implementation into openwebf/webf, and based on our current online data, the first screen performance has seen significant improvements:

For a complex landing page:
DOM Nodes: 590+
DOM operations: 2300+
First screen load time: Improved from 400ms to 300ms, a 25% enhancement.

For a chat room page:
DOM Nodes: 1500+
DOM operations: 5000+
First screen load time: Improved from 1 second to 800ms, a 20% enhancement.

Additionally, I conducted a simple test comparing Hemres and the current QuickJS implementation, and the results are as follows:

webwxgetmsgimg

Looking forward to your merge and benchmark. 😄

@ammarahm-ed
Copy link
Contributor

@ErosZy Adding this would be awesome but I noticed a big part of perf in webf fork comes from using mimalloc. #142

@ErosZy
Copy link
Author

ErosZy commented Jan 5, 2024

@ammarahm-ed Not conflicting, the replacement of mimalloc is necessary, and it does bring a significant improvement. However, the enhancement from the prototype poly inline cache builds on the foundation of both self poly inline cache and mimalloc replacement, providing an additional improvement of approximately 5-10%. 😄

For projects like webf, with a long inheritance chain in the DOM API, the addition of prototype poly inline cache, on top of self poly inline cache and mimalloc replacement, reduces the overall overhead of DOM API calls. This improvement contributes to the enhanced performance of the first screen.

@ammarahm-ed
Copy link
Contributor

@ErosZy That's super amazing! Btw do you guys plan to use this fork instead in future. I do think it would be great for the community if we put all the workforce in the same place and work on a single quickjs fork. Fabrice Bellard has been backporting patches from this fork too into the main one.

@ErosZy
Copy link
Author

ErosZy commented Jan 5, 2024

@ammarahm-ed We primarily make modifications in our own repository because our project has its own specific goals. However, we make an effort to synchronize these changes with this repository as much as possible.

@ErosZy
Copy link
Author

ErosZy commented Jan 5, 2024

@ammarahm-ed Thank you for your reminder, we build without mimalloc and benchmark again:

openwebf/quickjs (feat/without-mimalloc-octane) quickjs-ng/quickjs (e995085) Improvement (%)
Richards 1324 1250 +5.92
DeltaBlue 1281 1139 +12.46
Crypto 1314 1310 +0.30
RayTrace 1648 1152 +43.05
EarleyBoyer 2225 2151 +3.44
RegExp 296 258 +14.72
Splay 2467 2169 +13.73
SplayLatency 9089 8197 +10.88
NavierStokes 2576 2407 +7.02
PdfJS 4208 4099 +2.65
Mandreel 1173 1087 +7.91
MandreelLatency 7485 7396 +1.20
Gameboy 9874 9380 +5.26
CodeLoad 14869 16578 -10.30
Box2D 5184 5063 +2.38
Typescript 17665 17736 -0.45
Score (version 9) 3091 2889 +6.99

@ammarahm-ed
Copy link
Contributor

ammarahm-ed commented Jan 5, 2024

@ErosZy That's great, do you mind running the benchmark with Hermes too for comparison? I would be surprised if QuickJS is faster than Hermes without mimalloc.

@ErosZy
Copy link
Author

ErosZy commented Jan 5, 2024

@ammarahm-ed Yep, but i don't record the benchmark result, In my impression, the difference between the two is not significant(maybe Hermes faster 15-20%), even if Hermes has been optimized to the maximum. If you have time, you can give it a try. 😁 (However, maximum optimization may also sacrifice startup time, and this needs to be considered comprehensively.)

@bnoordhuis
Copy link
Contributor

I remember fixing a bunch of memory leaks (JSFunctionDef* was one) when porting the openwebf/quickjs implementation. I wonder if we already have the fixes.

I think so. Quick visual inspection of the code doesn't show obvious memory leaks (I remember pointing out one or two during the first review round), the (A|M|UB)San buildbots are happy, and so is -DDUMP_LEAKS=1.

@ErosZy
Copy link
Author

ErosZy commented Jan 5, 2024

I remember fixing a bunch of memory leaks (JSFunctionDef* was one) when porting the openwebf/quickjs implementation. I wonder if we already have the fixes.

I think so. Quick visual inspection of the code doesn't show obvious memory leaks (I remember pointing out one or two during the first review round), the (A|M|UB)San buildbots are happy, and so is -DDUMP_LEAKS=1.

@bnoordhuis any reproducible cases?

@bnoordhuis
Copy link
Contributor

Not sure what you mean. The ones I commented on during review?

@ErosZy
Copy link
Author

ErosZy commented Jan 5, 2024

@bnoordhuis I understand that you actually found some memory leaks that cannot be detected even after setting DUMP_LEAK=1, right? If there are such examples, I can see if they can be reproduced and resolved.

@bnoordhuis
Copy link
Contributor

Ah, I think you misunderstand me. There were some memory leaks in the pull request that @littledivy opened. I pointed them out and he fixed them before the PR was merged. That is to say, the quickjs-ng master branch never had those memory leaks.

@ErosZy
Copy link
Author

ErosZy commented Jan 5, 2024

@bnoordhuis 😄 I see

@ErosZy
Copy link
Author

ErosZy commented Jan 5, 2024

@ErosZy That's great, do you mind running the benchmark with Hermes too for comparison? I would be surprised if QuickJS is faster than Hermes without mimalloc.

@ammarahm-ed From the results, it appears to be at the level of Hermes' -O0, with around a 35% difference compared to -O optimization.

And an interesting phenomenon is that for a 134kb JavaScript code file, QuickJS takes 15ms to parse the code to executable, while Hermes takes 30ms (-O0) and 60ms (-O) respectively.

However, this benchmark is relatively simple and can only provide some partial reference.

The test environment is a virtual machine on GitHub's Codespaces.

openwebf/quickjs ( 6f54e7e without mimalloc) Hermes with -O Hermes with -O0
Richards 824 1166(+41.5%) 718(-12.8%)
Crypto 722 1275(+76.5%) 773(+7.0%)
RayTrace 963 1262(+31.0%) 1035(+7.4%)
NavierStokes 1256 1428(+13.6%) 1038 (-21.0%)
DeltaBlue 827 1018 (+23.0%) 696( -18.8%)
Total Score 902 1222(+35.4%) 839(-7.5%)
File Size 1.2M 4.1M 4.1M

@ammarahm-ed
Copy link
Contributor

@ErosZy that's great for the amount of people working on QuickJS and such a small size. I am sure there's still room for more performance but still very happy. Mimalloc helps a lot though to get that extra perf.

@ErosZy
Copy link
Author

ErosZy commented Jan 5, 2024

@ammarahm-ed Yes! We already have improvement plans regarding the GC, but the implementation is expected to take quite some time. At the same time, there is still plenty of room for bytecode optimization in QuickJS. We can learn relevant backend optimizations from Hermes' LLVH.

@ammarahm-ed
Copy link
Contributor

ammarahm-ed commented Jan 5, 2024

@ErosZy I like the sound of that! But even now i am super happy with the performance. Btw @bnoordhuis was also working on quickjit, a fork with jit for quickjs by using tinycc. QuickJS is a true gem, lots of potential.

@buddyliao
Copy link

buddyliao commented Jan 11, 2024

@ErosZy ,Hello, I haven't see any progression on octane when running on x64 linux platform while using ic without mimalloc, do you have the same problem?

@ErosZy
Copy link
Author

ErosZy commented Jan 12, 2024

@buddyliao what do you mean? Are you saying that you didn't get any output while running the latest version of quickjs-ng/quickjs on x64 Linux system for the octane benchmark?

@ammarahm-ed
Copy link
Contributor

ammarahm-ed commented Feb 9, 2024

@bnoordhuis @saghul @littledivy Sync up the progress: We have merged the corresponding implementation into openwebf/webf, and based on our current online data, the first screen performance has seen significant improvements:

For a complex landing page: DOM Nodes: 590+ DOM operations: 2300+ First screen load time: Improved from 400ms to 300ms, a 25% enhancement.

For a chat room page: DOM Nodes: 1500+ DOM operations: 5000+ First screen load time: Improved from 1 second to 800ms, a 20% enhancement.

Additionally, I conducted a simple test comparing Hemres and the current QuickJS implementation, and the results are as follows:

webwxgetmsgimg

Looking forward to your merge and benchmark. 😄

Hey @ErosZy Can you provide the benchmark.js file you used to run these benchmarks by any chance?

Trying to run a similar benchmark on my machine and I am getting very different results.

@ErosZy
Copy link
Author

ErosZy commented Feb 20, 2024

@richarddd
Copy link
Contributor

@ErosZy I like the sound of that! But even now i am super happy with the performance. Btw @bnoordhuis was also working on quickjit, a fork with jit for quickjs by using tinycc. QuickJS is a true gem, lots of potential.

Super cool to see someone experimenting with JITing QJS. TinyCC only supports x86 and is not maintained anymore. I think https://asmjit.com/ would be a great alternative, have you looked into it @bnoordhuis?

@bnoordhuis
Copy link
Contributor

I'm familiar with it but emitting machine code isn't the hard part of an optimizing compiler, code and data analysis is.

TinyCC only supports x86 and is not maintained anymore

It is, just not by Fabrice anymore: https://repo.or.cz/w/tinycc.git

@richarddd
Copy link
Contributor

I'm familiar with it but emitting machine code isn't the hard part of an optimizing compiler, code and data analysis is.

TinyCC only supports x86 and is not maintained anymore

It is, just not by Fabrice anymore: https://repo.or.cz/w/tinycc.git

Forgive my ignorance. It even supports arm now https://github.com/TinyCC/tinycc Fantastic!

@saghul
Copy link
Contributor

saghul commented Jun 21, 2024

Closing since there is no longer anything actionable here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants