IR: Add mini native jit MIPS block profiler #18121
Merged
This adds a very simplistic profiler for the IR jits. It functions the same way on all backends. This is helpful when a profiler that can expose actual jit blocks isn't available or doesn't work well.
I kept it simple so it could work on any backend, in theory. That means:
The results on a God of War demo:
Of interest here is that 0897a184 is so slow on riscv64. It's a bunch of lv.s's (which look like they could be trivially converted to lv.q, though that wouldn't help RISC-V anyway) and vdot.q's, and finally some mfv/srl/or/sw (this is probably where it writes the bone matrix data). But 089a9e14 tells me that float must just be pretty slow, because it's a simple add.s/c.le loop. Maybe FCLASS is performing worse than I thought...
For arm64, the notable thing is 08a0eb98 of course. The MATH_HELPER there indicates this was exclusively time spent in vfpu_sin and vfpu_cos.
Anyway, I think this is useful for finding expensive blocks to examine further, though obviously less so than a full/real profiler integrated into the block data. When disabled, it doesn't have any negative impact.
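For readers who want a feel for the idea, here is a rough sketch of a backend-agnostic block profiler in C++. It is not the code from this PR: `BlockProfile`, `BlockProfiler`, `Record`, `Top`, and `AverageNanos` are hypothetical names, and it assumes each executed block can report its start address and an elapsed-time sample. It only illustrates accumulating per-block totals, skipping the work when disabled, and sorting to find the most expensive blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical per-block record, not the PR's actual data layout.
struct BlockProfile {
	uint64_t hits = 0;        // How many samples were recorded for the block.
	uint64_t totalNanos = 0;  // Accumulated time across those samples.
};

// Average cost per recorded sample. Only valid for blocks with at least one hit.
inline uint64_t AverageNanos(const BlockProfile &p) {
	assert(p.hits > 0);
	return p.totalNanos / p.hits;
}

class BlockProfiler {
public:
	void SetEnabled(bool enabled) { enabled_ = enabled; }

	// Accumulates one sample for the block starting at blockAddr.
	// Does nothing while the profiler is disabled.
	void Record(uint32_t blockAddr, uint64_t nanos) {
		if (!enabled_)
			return;
		BlockProfile &p = profiles_[blockAddr];
		p.hits++;
		p.totalNanos += nanos;
	}

	// Returns up to `count` blocks, ordered by total time, most expensive first.
	std::vector<std::pair<uint32_t, BlockProfile>> Top(size_t count) const {
		std::vector<std::pair<uint32_t, BlockProfile>> sorted(profiles_.begin(), profiles_.end());
		std::sort(sorted.begin(), sorted.end(), [](const auto &a, const auto &b) {
			return a.second.totalNanos > b.second.totalNanos;
		});
		if (sorted.size() > count)
			sorted.resize(count);
		return sorted;
	}

private:
	bool enabled_ = false;
	std::unordered_map<uint32_t, BlockProfile> profiles_;
};
```

Keying on the block's start address keeps the sketch independent of any particular backend, which is the property described above. A real jit would more likely store counters alongside the block data than in a hash map.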
-[Unknown]