Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up LiverpoolToVK::SurfaceFormat #1982

Merged

Conversation

hspir404
Copy link
Contributor

@hspir404 hspir404 commented Dec 30, 2024

In Bloodborne this shows up as the function with the very highest cumulative "exclusive time". This is true both in scenes that perform poorly, and scenes that perform well.

I took (approximately) 10s samples using an 8khz sampling profiler.

In the Nightmare Grand Cathedral (looking towards the stairs, at the rest of the level):

  • Reduced total time from 757.34ms to 82.61ms (out of ~10000ms).
  • Reduced average frame times by 2ms (though according to the graph, the gap may be as big as 9ms every N frames).

00-savedata.zip
Before:
00-before
After:
00-after
Before:
00-before-profile
After:
00-after-profile

In the Hunter's Dream (in the spawn position):

  • Reduced the total time from 486.50ms to 53.83ms (out of ~10000ms).
  • Average frame times appear to be roughly the same.

01-savedata.zip
Before:
01-before
After:
01-after
Before:
01-before-profile
After:
01-after-profile

These are profiles of the change vs the version currently in the main branch. These improvements also improve things in the locking branch. They might improve them even more in that branch, but I didn't bother keeping track of my measurements as well in that branch. I believe this change will still be useful even when that branch is stabilized and merged.

It could be there are other bottlenecks in rendering on the main branch that are preventing this code from being the critical path in places like the Hunter's Dream, where the frame times aren't as high. That might explain why the reduction in call times isn't resulting in a higher frame rate in every scene.

In Bloodborne this shows up as the function with the very highest cumulative "exclusive time". This is true both in scenes that perform poorly, and scenes that perform well.

I took (approximately) 10s samples using an 8khz sampling profiler.

In the Nightmare Grand Cathedral (looking towards the stairs, at the rest of the level):
- Reduced total time from 757.34ms to 82.61ms (out of ~10000ms).
- Reduced average frame times by 2ms (though according to the graph, the gap may be as big as 9ms every N frames).

In the Hunter's Dream (in the spawn position):
- Reduced the total time from 486.50ms to 53.83ms (out of ~10000ms).
- Average frame times appear to be roughly the same.

These are profiles of the change vs the version currently in the main branch. These improvements also improve things in the `threading` branch. They might improve them even more in that branch, but I didn't bother keeping track of my measurements as well in that branch. I believe this change will still be useful even when that branch is stabilized and merged.

It could be there are other bottlenecks in rendering on this branch that are preventing this code from being the critical path in places like the Hunter's Dream, where performance isn't currently as constrained. That might explain why the reduction in call times isn't resulting in a higher frame rate.
@bigol83
Copy link
Contributor

bigol83 commented Dec 30, 2024

This improves Resident Evil 4 and Yakuza 0 too by 2 fps.

@squidbus
Copy link
Collaborator

This is a nice perf improvement but I would prefer a solution that does not now involve maintaining two different copies of format mappings.

@squidbus
Copy link
Collaborator

squidbus commented Dec 31, 2024

One suggestion would be if we could bitwise combine data and number formats into an indexed lookup to replace the find_if, instead of breaking out into a whole extra switch-based mapping.

@hspir404
Copy link
Contributor Author

hspir404 commented Dec 31, 2024

Changed to a lookup table.

It adds a 4kb table (6 bits * 4 bits * 4 bytes per vulkan format), but seems to result in a smaller executable than when the switch statement was there (even though I verified the switch was only 975 bytes of code). Confused how that works, but apparently this is a 35MB executable anyway so I guess we're not sweating such things.

It also runs a gnat's hair faster than the switch statement did on my computer.

In the Nightmare Grand Cathedral from the save above:

after

Guess I should do a clang format to tidy this now.

@raphaelthegreat raphaelthegreat merged commit 6862c9a into shadps4-emu:main Jan 2, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants