IR: Add mini native jit MIPS block profiler #18121

Merged
merged 1 commit into from
Sep 25, 2023

unknownbrackets
Collaborator

@unknownbrackets unknownbrackets commented Sep 10, 2023

This adds a very simplistic profiler for the IR jits. It functions the same way on all backends. This is helpful when a profiler that can expose actual jit blocks isn't available or isn't great.

I kept it simple so it could work on any backend, in theory. That means:

  • If enabled (compile time only), it starts a thread that literally just spins sampling the block PC.
  • A page of the codeblock is reserved as writable and is kept up to date with the current PC and status (only if enabled.)
  • It logs the top 4 blocks and % samples spent in them on an interval before entering jit.
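
A minimal sketch of the sampling approach in the list above, not the code from this PR. `ProfilerSlot`, `SampleTally`, `BlockSampler`, `FormatLine`, and the numeric status values are hypothetical names; only the `IN_JIT` and `MATH_HELPER` labels come from the logs below. It also assumes that every sample counts toward the total, which the PR does not state, and it skips the function name lookup.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical status codes; only the names IN_JIT and MATH_HELPER appear in the logs.
enum Status : uint32_t { NOT_RUNNING = 0, IN_JIT = 1, MATH_HELPER = 2 };

// Stand-in for the reserved writable page: jitted code would store here,
// and the sampler thread only reads.
struct ProfilerSlot {
    std::atomic<uint32_t> pc{0};
    std::atomic<uint32_t> status{NOT_RUNNING};
};

struct BlockShare {
    uint32_t pc;
    uint32_t status;
    double percent;  // share of all samples taken
};

// Single-threaded tally of samples, keyed by (status, pc).
class SampleTally {
public:
    void Record(uint32_t pc, uint32_t status) {
        total_++;
        // Assumption: samples taken outside jit only count toward the total.
        if (status != NOT_RUNNING)
            counts_[((uint64_t)status << 32) | pc]++;
    }

    uint64_t Total() const { return total_; }

    // Returns up to n entries, most frequently sampled first.
    std::vector<BlockShare> Top(size_t n) const {
        using Entry = std::pair<uint64_t, uint64_t>;  // (key, sample count)
        std::vector<Entry> sorted(counts_.begin(), counts_.end());
        n = std::min(n, sorted.size());
        std::partial_sort(sorted.begin(), sorted.begin() + n, sorted.end(),
            [](const Entry &a, const Entry &b) {
                // Higher count first; break ties by key so output is stable.
                return a.second != b.second ? a.second > b.second : a.first < b.first;
            });
        std::vector<BlockShare> top;
        for (size_t i = 0; i < n; i++) {
            top.push_back({(uint32_t)sorted[i].first, (uint32_t)(sorted[i].first >> 32),
                           100.0 * sorted[i].second / total_});
        }
        return top;
    }

private:
    std::unordered_map<uint64_t, uint64_t> counts_;
    uint64_t total_ = 0;
};

// Owns a thread that spins, reading the slot and tallying what it sees.
class BlockSampler {
public:
    explicit BlockSampler(const ProfilerSlot &slot) : slot_(slot) {}
    ~BlockSampler() { Stop(); }

    void Start() {
        stop_ = false;
        thread_ = std::thread([this] {
            while (!stop_.load(std::memory_order_relaxed)) {
                tally_.Record(slot_.pc.load(std::memory_order_relaxed),
                              slot_.status.load(std::memory_order_relaxed));
            }
        });
    }

    // Joins the thread first, so reading the returned tally is race-free.
    const SampleTally &Stop() {
        stop_ = true;
        if (thread_.joinable())
            thread_.join();
        return tally_;
    }

private:
    const ProfilerSlot &slot_;
    std::atomic<bool> stop_{false};
    SampleTally tally_;
    std::thread thread_;
};

// Formats one report line, loosely following the log format (without the function name).
std::string FormatLine(size_t rank, const BlockShare &b) {
    const char *name = b.status == MATH_HELPER ? "MATH_HELPER" : "IN_JIT";
    char buf[96];
    std::snprintf(buf, sizeof(buf), "Slowest sampled PC #%zu: %08x/%s (%f%%)",
                  rank, (unsigned)b.pc, name, b.percent);
    return buf;
}
```

In this sketch a caller would `Start()` the sampler, `Stop()` it on an interval, and print `FormatLine(i, top[i])` for each entry of `Top(4)`. The PC and status are read with two separate relaxed loads, so a sample can occasionally pair a PC with the wrong status; for a statistical sampler that seems tolerable.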

The results on a God of War demo:

arm64:
Slowest sampled PC #0: 0899b5d0 (z_un_0899b558)/IN_JIT (4.751447%)
Slowest sampled PC #1: 089a9d6c (z_un_089a98b4)/IN_JIT (1.526191%)
Slowest sampled PC #2: 08a0eb98 (z_un_08a0eb98)/MATH_HELPER (0.924867%)
Slowest sampled PC #3: 08990848 (z_un_08990848)/IN_JIT (0.519172%)

x64:
Slowest sampled PC #0: 0899b5d0 (z_un_0899b558)/IN_JIT (7.168335%)
Slowest sampled PC #1: 089a9d6c (z_un_089a98b4)/IN_JIT (5.345056%)
Slowest sampled PC #2: 08990848 (z_un_08990848)/IN_JIT (0.939905%)
Slowest sampled PC #3: 089908f4 (z_un_08990848)/IN_JIT (0.567960%)

riscv64:
Slowest sampled PC #0: 0897a184 (z_un_0897a15c)/IN_JIT (3.743720%)
Slowest sampled PC #1: 0899b5d0 (z_un_0899b558)/IN_JIT (2.998380%)
Slowest sampled PC #2: 089a9d6c (z_un_089a98b4)/IN_JIT (2.870566%)
Slowest sampled PC #3: 089a9e14 (z_un_089a98b4)/IN_JIT (0.972369%)

Of interest here is that 0897a184 is so slow on riscv64. It's a bunch of lv.s's (which look like they could be trivially converted to lv.q... though that wouldn't help risc-v anyway) and vdot.q's, and finally some mfv/srl/or/sw (probably this is where it writes the bone matrix data.) But 089a9e14 tells me that float must just be pretty slow, because it's a simple add.s/c.le loop. Maybe FCLASS is performing worse than I thought...

For arm64, the notable thing is 08a0eb98 of course. The MATH_HELPER there indicates this was exclusively time spent in vfpu_sin and vfpu_cos.
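
To illustrate how a status like MATH_HELPER can attribute samples to a helper call, here is a rough sketch. It is an assumption about the general technique, not the PR's implementation: in a real jit the emitted code would write the status word itself, and `g_status`, `ScopedStatus`, `SlowSineHelper`, and the numeric values are hypothetical.

```cpp
#include <atomic>
#include <cmath>
#include <cstdint>

// Hypothetical status codes; only the names IN_JIT and MATH_HELPER come from the logs.
enum Status : uint32_t { NOT_RUNNING = 0, IN_JIT = 1, MATH_HELPER = 2 };

// Stand-in for the status word a sampler thread would read.
std::atomic<uint32_t> g_status{IN_JIT};

// Sets the status for the lifetime of the object, then restores the old value.
class ScopedStatus {
public:
    explicit ScopedStatus(Status s)
        : prev_(g_status.exchange(s, std::memory_order_relaxed)) {}
    ~ScopedStatus() { g_status.store(prev_, std::memory_order_relaxed); }
    ScopedStatus(const ScopedStatus &) = delete;
    ScopedStatus &operator=(const ScopedStatus &) = delete;

private:
    uint32_t prev_;
};

// Hypothetical slow helper; any sample taken while it runs sees MATH_HELPER.
float SlowSineHelper(float x) {
    ScopedStatus mark(MATH_HELPER);
    return std::sin(x);
}
```

With this shape, the block PC stays the same while the helper runs, so the sampler can report both which block made the call and that the time was spent in the helper rather than in jitted code.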

Anyway, I think this is useful for finding expensive blocks to examine further, though obviously less so than a full/real profiler integrated into the block data. When disabled, it doesn't have any negative impact.

-[Unknown]

@unknownbrackets unknownbrackets added the arm64jit, x86jit, and RISC-V labels Sep 10, 2023
@hrydgard hrydgard added this to the v1.17.0 milestone Sep 11, 2023
@hrydgard
Owner

I'll get it in, I didn't notice before that it's isolated to the IR stuff.

@hrydgard hrydgard merged commit 5145698 into hrydgard:master Sep 25, 2023
@unknownbrackets unknownbrackets deleted the jit-ir-profiler branch September 25, 2023 14:08
@unknownbrackets unknownbrackets modified the milestones: v1.17.0, v1.16.4 Sep 25, 2023