-
-
Notifications
You must be signed in to change notification settings - Fork 890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
buffer_cache: Improve buffer cache locking contention #1973
Conversation
I am intermittently seeing a crash (Unhandled exception thrown: read access violation) after the main menu, when loading into the game:
It seems like it might happen more on more demanding scenes. The top of the stairs in the nightmare cathedral is the place I've been seeing it most so far. I also have seen this exception, with basically the same repro steps and timing (Access violation reading location 0x000001826E90F298):
If I manage to load in game and wait until things are stable, then the game seems to run fairly flawlessly after that for a long time. I've managed to run through several levels, defeat multiple bosses, and warp to the hunter's dream and back to lanterns with this patch. However it often will crash on that very first load - at least at that spot in the game. |
If you change this to std::deque are crashes improved? shadPS4/src/common/object_pool.h Line 103 in 65cd3be
If still not, try reverting the change in IsRegionRegistered |
Tried the deque, tried reverting all the change in IsRegionRegistered (except the shared lock on the mutex). However I'm still seeing access violations with a callstack like this:
I checked my log. It looks to me like I might have the wrong copy of my sysmodules or something, since it seems they're failing to load. I also am not sure how the Which looks to me like it doesn't cover the range of this patch. 0x000000091fffc000 (the key for the map) isn't even greater than the address it's trying to resolve 00000009257b50f7, so I'd think std::map::upper_bound would not be returning that module. |
After looking at I rebased the code again on top of main and tested. I still get problems in If I swap As for reverting changes to |
You seem to be getting a game crash. |
If you're referring to the |
If you see the whole stack trace, you can see the last guest address |
This seem to be regression to intel gpu, which further break the graphics |
With this PR, on Bloodborne I get way less fps on Linux than before (40fps before, 17fps now), but only on linux, if I play on windows 10 I get higher fps (40fps before, 45-50 now). |
I'm on Arch Linux with AMD CPU and discrete AMD GPU. My frame rate was ~55-60 previously in Bloodborne (with the 60 FPS patch), and after this PR, it is a stable 60 now. Just wanted to say that the Linux issue is not universal. |
my cpu is intel, could probably be that. |
…4-emu#1973)" This reverts commit c254470.
Fix performance regression with shadps4-emu#1973 on SteamDeck
Fix performance regression with shadps4-emu#1973 on SteamDeck
Fix performance regression with shadps4-emu#1973 on SteamDeck
Fix performance regression with shadps4-emu#1973 on SteamDeck
Fix performance regression with shadps4-emu#1973 on SteamDeck Co-authored-by: Quang Ngô <[email protected]>
Fix performance regression with shadps4-emu#1973 on SteamDeck
Fix performance regression with shadps4-emu#1973 on SteamDeck
Fix performance regression with #1973 on SteamDeck
There is significant performance being lost by the buffer cache locking being held during memory synchronization and invalidation. This lock is not required however and used to properly synchronize the underlying memory tracker. This PR is an alternative to #1952 (and many thanks to @hspir404 for discovering this).
Overview of changes:
IsRegionRegistered should now be thread-safe. Usage of operator[] is replaced by find() as the former will create a new L1 region in the map which is not thread-safe and unwanted even, find will not do that. Buffer access is also guarded by a shared lock so multiple threads can read from the slot vector, but if the gpu threads inserts a new one it will be properly synchronized. Creation of buffers is expensive and calls the driver so it uses a regular mutex.
Memory tracker has been significantly simplified by removing dead code or features we aren't used (such as variable region sizes). Instead those are hardcoded as constexpr variables which might also result in slightly better codegen. Each region now includes a separate lock that is acquired on state modifying functions, while region query can be performed without the lock. From experiments I did between mutex and spinlock, the latter was noticeably faster so its used here (possibly because it doesn't incur an expensive kernel sleep)
Memory tracker now only creates new regions on buffer cache upload. We dont need to create regions when marking regions as cpu or gpu dirty. This might reduce memory usage a bit and make invalidations slightly faster.
Page manager lock has also been switched to a spin lock. From testing this seems to give a further 1fps boost in cpu bottlenecked programs