-
-
Notifications
You must be signed in to change notification settings - Fork 841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up LiverpoolToVK::SurfaceFormat #1982
Speed up LiverpoolToVK::SurfaceFormat #1982
Conversation
In Bloodborne this shows up as the function with the very highest cumulative "exclusive time". This is true both in scenes that perform poorly, and scenes that perform well. I took (approximately) 10s samples using an 8khz sampling profiler. In the Nightmare Grand Cathedral (looking towards the stairs, at the rest of the level): - Reduced total time from 757.34ms to 82.61ms (out of ~10000ms). - Reduced average frame times by 2ms (though according to the graph, the gap may be as big as 9ms every N frames). In the Hunter's Dream (in the spawn position): - Reduced the total time from 486.50ms to 53.83ms (out of ~10000ms). - Average frame times appear to be roughly the same. These are profiles of the change vs the version currently in the main branch. These improvements also improve things in the `threading` branch. They might improve them even more in that branch, but I didn't bother keeping track of my measurements as well in that branch. I believe this change will still be useful even when that branch is stabilized and merged. It could be there are other bottlenecks in rendering on this branch that are preventing this code from being the critical path in places like the Hunter's Dream, where performance isn't currently as constrained. That might explain why the reduction in call times isn't resulting in a higher frame rate.
This improves Resident Evil 4 and Yakuza 0 too by 2 fps. |
This is a nice perf improvement but I would prefer a solution that does not now involve maintaining two different copies of format mappings. |
One suggestion would be if we could bitwise combine data and number formats into an indexed lookup to replace the |
Changed to a lookup table. It adds a 4kb table (6 bits * 4 bits * 4 bytes per vulkan format), but seems to result in a smaller executable than when the switch statement was there (even though I verified the switch was only 975 bytes of code). Confused how that works, but apparently this is a 35MB executable anyway so I guess we're not sweating such things. It also runs a gnat's hair faster than the switch statement did on my computer. In the Nightmare Grand Cathedral from the save above: Guess I should do a clang format to tidy this now. |
In Bloodborne this shows up as the function with the very highest cumulative "exclusive time". This is true both in scenes that perform poorly, and scenes that perform well.
I took (approximately) 10s samples using an 8khz sampling profiler.
In the Nightmare Grand Cathedral (looking towards the stairs, at the rest of the level):
00-savedata.zip
Before:
After:
Before:
After:
In the Hunter's Dream (in the spawn position):
01-savedata.zip
Before:
After:
Before:
After:
These are profiles of the change vs the version currently in the main branch. These improvements also improve things in the
locking
branch. They might improve them even more in that branch, but I didn't bother keeping track of my measurements as well in that branch. I believe this change will still be useful even when that branch is stabilized and merged.It could be there are other bottlenecks in rendering on the main branch that are preventing this code from being the critical path in places like the Hunter's Dream, where the frame times aren't as high. That might explain why the reduction in call times isn't resulting in a higher frame rate in every scene.