a blog to introduce profiling with perf in wamr
---
title: "Profile Wasm applications with perf in WAMR JIT"
description: "WAMR JIT supports linux perf"
excerpt: ""
date: 2023-12-22T13:14:15+08:00
lastmod: 2023-12-22T13:14:15+08:00
draft: false
weight: 50
images: []
categories: ["profiling", "tool"]
tags: ["profiling"]
contributors: ["lum1n0us"]
pinned: false
homepage: false
---

Profiling a Wasm application can provide valuable insights into its performance. In this blog post, we'll explore how to use [Linux perf](https://perf.wiki.kernel.org/index.php/Main_Page) to analyze Wasm applications running on WAMR with JIT compilation.

Linux perf is a versatile performance analysis tool that helps developers understand and optimize the behavior of their applications. It provides detailed information about various aspects of program execution, including CPU usage, memory access patterns, and function call traces.

## With perf report

Let's dive into profiling a Wasm application using WAMR (AOT and JIT) and Linux perf.

1. Before profiling, recompile WAMR with the LLVM JIT and AOT compilation options enabled:

```bash
$ cmake -S . -B build -DWAMR_BUILD_JIT=1 -DWAMR_BUILD_AOT=1
$ cmake --build build
```

2. Profile the application with perf:

```bash
# perf.data.raw is the perf output. It records the call stack of every sample event,
# but it cannot yet translate jitted function addresses into jitted function names.
$ perf record -k mono -g --output=perf.data.raw -- iwasm --perf-profile <.wasm or .aot>
```

2.1 Merge jitted symbol information

*Only needed when iwasm runs in JIT mode. AOT mode doesn't need this step.*

```bash
# read the jit-xxx.dump file generated by WAMR to get jitted symbol information
$ perf inject --jit --input=perf.data.raw --output=perf.data
```
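
The record and inject steps above can also be scripted. The following is a minimal Python sketch, not part of WAMR: the helper name `perf_commands` is hypothetical, and writing straight to _perf.data_ in AOT mode (so that the inject step can be skipped) is an assumption rather than something the WAMR docs prescribe. It only builds the command lines shown above and leaves running them to the caller.

```python
from typing import List


def perf_commands(target: str, jit_mode: bool) -> List[List[str]]:
    """Build the perf command lines for profiling `target` with iwasm.

    `target` is a .wasm file (JIT mode) or an .aot file (AOT mode).
    Hypothetical helper: it mirrors the shell commands in this post.
    """
    # Assumption: in AOT mode no inject step is needed, so record
    # directly into perf.data instead of perf.data.raw.
    raw = "perf.data.raw" if jit_mode else "perf.data"
    commands = [
        # -k mono selects the monotonic clock, which perf inject --jit
        # relies on to line up samples with the JIT dump.
        ["perf", "record", "-k", "mono", "-g", f"--output={raw}",
         "--", "iwasm", "--perf-profile", target],
    ]
    if jit_mode:
        # Merge jitted symbol information into the final perf.data.
        commands.append(
            ["perf", "inject", "--jit", f"--input={raw}", "--output=perf.data"]
        )
    return commands


if __name__ == "__main__":
    for cmd in perf_commands("app.wasm", jit_mode=True):
        print(" ".join(cmd))
```

On a machine where `perf` and `iwasm` are installed, each returned command could be passed to `subprocess.run(cmd, check=True)`.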

You can use `perf report` to review _perf.data_, which contains the performance counter profile recorded by `perf record`.

```
    76.07%     0.00%  iwasm    libc.so.6    [.] __libc_start_call_main
            |
            ---__libc_start_call_main
               main
               |
                --68.33%--app_instance_main
                          wasm_application_execute_main
                          execute_main
                          wasm_runtime_call_wasm
                          wasm_call_function
                          call_wasm_with_hw_bound_check
                          wasm_interp_call_wasm
                          llvm_jit_call_func_bytecode
                          wasm_runtime_invoke_native
                          push_args_end
                          aot_func#1
                          aot_func#32
                          |
                           --68.33%--aot_func#5
                                     |
                                      --68.32%--aot_func#4
                                                |
                                                 --68.19%--aot_func#2
    68.33%     0.00%  iwasm    iwasm        [.] app_instance_main
            |
            ---app_instance_main
               wasm_application_execute_main
               execute_main
               wasm_runtime_call_wasm
               wasm_call_function
               call_wasm_with_hw_bound_check
               wasm_interp_call_wasm
               llvm_jit_call_func_bytecode
               wasm_runtime_invoke_native
               push_args_end
               aot_func#1
               aot_func#32
               |
                --68.33%--aot_func#5
                          |
                           --68.32%--aot_func#4
                                     |
                                      --68.19%--aot_func#2
```

## With Flamegraph

[Flamegraph](https://github.com/brendangregg/FlameGraph) is a visualization of a program's call stacks. It provides a clear overview of where CPU time is spent during program execution.

The steps below build on the previously generated _perf.data_. First, download [FlameGraph](https://github.com/brendangregg/FlameGraph).

```bash
$ perf script -i perf.data > out.perf
# fold stacks
$ ./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
# render a flamegraph
$ ./FlameGraph/flamegraph.pl out.folded > perf.svg
```
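
Before rendering, it can help to glance at the folded file directly. Each line of _out.folded_ is a semicolon-separated stack followed by a sample count. The sketch below is an illustration only (the helper names `self_samples` and `top_functions` are made up here). It sums the samples attributed to the innermost frame of each stack, which gives a quick list of the functions where samples landed most often.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def self_samples(folded_lines: Iterable[str]) -> Counter:
    """Count samples per innermost frame in folded-stack lines.

    Each line is expected to look like "frame1;frame2;...;frameN <count>".
    """
    totals: Counter = Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field; frame names may
        # themselves contain spaces, so split from the right.
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue  # skip lines that do not match the expected shape
        totals[stack.split(";")[-1]] += int(count)
    return totals


def top_functions(path: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Return the `limit` innermost frames with the most samples."""
    with open(path, encoding="utf-8") as folded:
        return self_samples(folded).most_common(limit)
```

For example, `top_functions("out.folded")` returns `(function name, sample count)` pairs sorted by count, assuming _out.folded_ was produced by `stackcollapse-perf.pl` as shown above.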

Because jitted functions all have names of the same form, _aot_func#NUMBER_, the flamegraph is hard for developers to read. A script is available to translate them.

```bash
# translate jitted function names into their original wasm function names
$ python trans_wasm_func_name.py --wabt_home <wabt installation> --folded out.folded <wasm>
# render a flamegraph
$ ./FlameGraph/flamegraph.pl out.folded.translated > perf.svg
```
![example flamegraph](./perf.svg)
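
To make the idea behind the translation concrete, here is a small sketch of the renaming step alone. It is not the actual `trans_wasm_func_name.py`: the `names` mapping from function index to name is a hypothetical input, and how the real script derives it from the Wasm module with wabt (including how indices are counted) is not covered here.

```python
import re
from typing import Dict

# Matches frames of the form aot_func#NUMBER in a folded-stack line.
AOT_FUNC = re.compile(r"aot_func#(\d+)")


def translate_line(line: str, names: Dict[int, str]) -> str:
    """Replace each aot_func#N frame with names[N] when a name is known.

    `names` is a hypothetical index-to-name mapping supplied by the caller.
    Frames without an entry are left unchanged.
    """
    def rename(match):
        index = int(match.group(1))
        return names.get(index, match.group(0))

    return AOT_FUNC.sub(rename, line)


if __name__ == "__main__":
    example = "iwasm;main;aot_func#1;aot_func#32 7"
    print(translate_line(example, {1: "_start", 32: "fib"}))
```

Leaving unknown indices untouched keeps the folded file valid input for `flamegraph.pl` even when the mapping is incomplete.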

**For more details, please refer to the [WAMR perf tuning doc](https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/perf_tune.md#7-use-linux-perf).**