I applied Profile-Guided Optimization (PGO) to llrt to see whether it improves performance further (as I have already done for many other projects; see all current results here). I ran some basic benchmarks and want to share the results here.
Test environment
Fedora 39
Linux kernel 6.7.3
AMD Ryzen 9 5900x
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler - Rustc 1.76
llrt version: the latest main branch at the time of writing, commit c040bfd05a2be8d3300e7a1bbfc9405c42a865fa
Disabled Turbo boost (for more stable results across benchmark runs)
Benchmark
As a benchmark, I use the same command that I found in the Makefile: llrt fixtures/hello.js. The same scenario is used for the PGO training phase. All PGO steps are done with the cargo-pgo tool: the PGO-instrumented version is built with cargo pgo build, and the PGO-optimized version with cargo pgo optimize build. taskset -c 0 pins the process to a single core to reduce the influence of CPU scheduling on the results.
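The steps described above can be sketched as a small script. This is only an illustration, not the exact script used for the measurements: the binary path in BIN is an assumption (cargo-pgo normally places artifacts under a target-triple directory, so adjust it for your setup), and the run helper and DRY_RUN switch are hypothetical names introduced here. By default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch of the PGO workflow. DRY_RUN defaults to 1, so commands are only
# printed; run with DRY_RUN=0 to execute them for real.
DRY_RUN="${DRY_RUN:-1}"

# Assumed location of the binary built by cargo-pgo; adjust to your target triple.
BIN="${BIN:-target/x86_64-unknown-linux-gnu/release/llrt}"

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

pgo_workflow() {
    run cargo pgo build              # 1. build the instrumented binary
    run "$BIN" fixtures/hello.js     # 2. run the training scenario to collect profiles
    run cargo pgo optimize build     # 3. rebuild using the collected profiles
}

plan=$(pgo_workflow)
echo "$plan"
```

In practice the training step would likely be repeated several times, or run against a broader set of scripts, so that the collected profile is less tied to a single hello-world scenario.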
Results
I got the following results:
hyperfine -u microsecond -N --warmup=2000 --min-runs 10000 "taskset -c 0 ./llrt_optimized ../fixtures/hello.js" "taskset -c 0 ./llrt_release ../fixtures/hello.js"
Benchmark 1: taskset -c 0 ./llrt_optimized ../fixtures/hello.js
Time (mean ± σ): 2664.8 µs ± 78.8 µs [User: 590.1 µs, System: 1943.3 µs]
Range (min … max): 2478.1 µs … 4486.1 µs 10000 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 2: taskset -c 0 ./llrt_release ../fixtures/hello.js
Time (mean ± σ): 2796.1 µs ± 63.6 µs [User: 601.4 µs, System: 2068.9 µs]
Range (min … max): 2647.5 µs … 4495.0 µs 10000 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Summary
taskset -c 0 ./llrt_optimized ../fixtures/hello.js ran
1.05 ± 0.04 times faster than taskset -c 0 ./llrt_release ../fixtures/hello.js
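As a sanity check, the summary line can be recomputed from the reported means and standard deviations. The sketch below assumes first-order error propagation for a ratio of two independent measurements, which reproduces the figure hyperfine printed; the variable names are illustrative only.

```shell
#!/bin/sh
# Means and standard deviations (in µs) copied from the hyperfine output above.
summary=$(awk 'BEGIN {
    m_opt = 2664.8; s_opt = 78.8    # PGO-optimized build
    m_rel = 2796.1; s_rel = 63.6    # regular release build

    # Relative speed of the optimized build.
    ratio = m_rel / m_opt

    # First-order error propagation for a ratio of two independent means.
    err = ratio * sqrt((s_opt / m_opt)^2 + (s_rel / m_rel)^2)

    # Reduction in mean wall-clock time, in percent.
    saved = (1 - m_opt / m_rel) * 100

    printf "%.2f %.2f %.1f", ratio, err, saved
}')
echo "$summary"   # ratio, uncertainty, percent of mean time saved
```

The three printed numbers correspond to the 1.05 ± 0.04 speedup from the summary and to a reduction in mean run time of a little under 5%.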
Here, llrt_release is the usual Release version and llrt_optimized is the PGO-optimized version. I ran the benchmark multiple times, with different command orders, etc. In all cases, the PGO-optimized version was faster than the usual release version. However, it would be great to run some more precise benchmarks.
Further steps
I suggest the following:
Perform more PGO benchmarks with some more precise performance measurements.
If PGO is worth it, add a note about it to the documentation and, possibly, add an option to the build scripts so that llrt can be PGO-optimized more easily with the existing build infrastructure.
Experiment with Post-Link Optimization (PLO) using tools like LLVM BOLT.
I hope these benchmark results are interesting to someone.
This is very interesting! I will rerun the benchmark with PGO (with profile data from test runs) and see the results! PLO is also super interesting but is a different beast! Right now, we use zig as a cross compiler. Since LLRT is a fully static build using musl libc, we can probably use musl sources and clang-15 directly (since it may come with bolt) and apply PGO, PLO, and LTO 🥇
If instrumentation/sampling and testing could be streamlined, it would be interesting to see whether a per-lambda optimization with PGO + BOLT would be beneficial for some use cases, rather than a generic optimization.