Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer_cache: Improve buffer cache locking contention #1973

Merged
merged 6 commits into from
Jan 2, 2025
Merged

Conversation

raphaelthegreat
Copy link
Collaborator

@raphaelthegreat raphaelthegreat commented Dec 29, 2024

There is significant performance being lost by the buffer cache locking being held during memory synchronization and invalidation. This lock is not required however and used to properly synchronize the underlying memory tracker. This PR is an alternative to #1952 (and many thanks to @hspir404 for discovering this).

Overview of changes:

  • IsRegionRegistered should now be thread-safe. Usage of operator[] is replaced by find() as the former will create a new L1 region in the map which is not thread-safe and unwanted even, find will not do that. Buffer access is also guarded by a shared lock so multiple threads can read from the slot vector, but if the gpu threads inserts a new one it will be properly synchronized. Creation of buffers is expensive and calls the driver so it uses a regular mutex.

  • Memory tracker has been significantly simplified by removing dead code or features we aren't used (such as variable region sizes). Instead those are hardcoded as constexpr variables which might also result in slightly better codegen. Each region now includes a separate lock that is acquired on state modifying functions, while region query can be performed without the lock. From experiments I did between mutex and spinlock, the latter was noticeably faster so its used here (possibly because it doesn't incur an expensive kernel sleep)

  • Memory tracker now only creates new regions on buffer cache upload. We dont need to create regions when marking regions as cpu or gpu dirty. This might reduce memory usage a bit and make invalidations slightly faster.

  • Page manager lock has also been switched to a spin lock. From testing this seems to give a further 1fps boost in cpu bottlenecked programs

@hspir404
Copy link
Contributor

hspir404 commented Dec 31, 2024

I am intermittently seeing a crash (Unhandled exception thrown: read access violation) after the main menu, when loading into the game:

shadps4.exe!ZydisInputPeek(ZydisDecoderState_ * state, ZydisDecodedInstruction_ * instruction, unsigned char * value) Line 260	C
shadps4.exe!ZydisCollectOptionalPrefixes(ZydisDecoderState_ * state, ZydisDecodedInstruction_ * instruction) Line 3097	C
shadps4.exe!ZydisDecoderDecodeInstruction(const ZydisDecoder_ * decoder, ZydisDecoderContext_ * context, const void * buffer, unsigned __int64 length, ZydisDecodedInstruction_ * instruction) Line 5041	C
shadps4.exe!ZydisDecoderDecodeFull(const ZydisDecoder_ * decoder, const void * buffer, unsigned __int64 length, ZydisDecodedInstruction_ * instruction, ZydisDecodedOperand_ * operands) Line 4999	C
shadps4.exe!Common::DecoderImpl::decodeInstruction(ZydisDecodedInstruction_ & inst, ZydisDecodedOperand_ * operands, void * data, unsigned __int64 size) Line 45	C++
shadps4.exe!Core::TryPatch(unsigned char * code, Core::PatchModule * module) Line 951	C++
shadps4.exe!Core::TryPatchJit(void * code_address) Line 1170	C++
shadps4.exe!Core::SignalDispatch::DispatchAccessViolation(void * context, void * fault_address) Line 148	C++
shadps4.exe!Core::SignalHandler(_EXCEPTION_POINTERS * pExp) Line 29	C++
ntdll.dll!RtlpCallVectoredHandlers()	Unknown
ntdll.dll!RtlDispatchException()	Unknown
ntdll.dll!KiUserExceptionDispatch()	Unknown
00000009275c543f()	Unknown

It seems like it might happen more on more demanding scenes. The top of the stairs in the nightmare cathedral is the place I've been seeing it most so far.

I also have seen this exception, with basically the same repro steps and timing (Access violation reading location 0x000001826E90F298):

shadps4.exe!VideoCore::BufferCache::IsRegionRegistered(unsigned __int64 addr, unsigned __int64 size) Line 343	C++
shadps4.exe!VideoCore::BufferCache::InvalidateMemory(unsigned __int64 device_addr, unsigned __int64 size) Line 57	C++
shadps4.exe!Vulkan::Rasterizer::InvalidateMemory(unsigned __int64 addr, unsigned __int64 size) Line 909	C++
shadps4.exe!Core::SignalDispatch::DispatchAccessViolation(void * context, void * fault_address) Line 148	C++
shadps4.exe!Core::SignalHandler(_EXCEPTION_POINTERS * pExp) Line 29	C++
ntdll.dll!RtlpCallVectoredHandlers()	Unknown
ntdll.dll!RtlDispatchException()	Unknown
ntdll.dll!KiUserExceptionDispatch()	Unknown
0000000902616fcf()	Unknown
0000000901064051()	Unknown
0000000185e1acb0()	Unknown
00001dc400001dc4()	Unknown
deadbeef54321abc()	Unknown
0000000000000020()	Unknown
0000000000000002()	Unknown
0000000000000003()	Unknown
0000000000000002()	Unknown
0000000000000002()	Unknown
00000007ef57fb60()	Unknown
000000090075725e()	Unknown
000000025c8c6200()	Unknown
00000007ef57fbc8()	Unknown
000000025b0a16f0()	Unknown
00000007ef57fbc0()	Unknown
00000007ef57fbc0()	Unknown
000000000001d99c()	Unknown
00000007ef57fbc8()	Unknown
000000025b0a16f0()	Unknown
00000007ef57fbb0()	Unknown
00000009007668e5()	Unknown

If I manage to load in game and wait until things are stable, then the game seems to run fairly flawlessly after that for a long time. I've managed to run through several levels, defeat multiple bosses, and warp to the hunter's dream and back to lanterns with this patch. However it often will crash on that very first load - at least at that spot in the game.

@raphaelthegreat
Copy link
Collaborator Author

raphaelthegreat commented Dec 31, 2024

If you change this to std::deque are crashes improved?

std::vector<Chunk> chunks;

If still not, try reverting the change in IsRegionRegistered

@hspir404
Copy link
Contributor

Tried the deque, tried reverting all the change in IsRegionRegistered (except the shared lock on the mutex).
I so far haven't seen the IsRegionRegistered related crash again. Not sure which of those two changes were necessary, or if that part of the problem is totally solved or not. I'll test that more later.

However I'm still seeing access violations with a callstack like this:

shadps4.exe!ZydisInputPeek(ZydisDecoderState_ * state, ZydisDecodedInstruction_ * instruction, unsigned char * value) Line 260	C
shadps4.exe!ZydisCollectOptionalPrefixes(ZydisDecoderState_ * state, ZydisDecodedInstruction_ * instruction) Line 3097	C
shadps4.exe!ZydisDecoderDecodeInstruction(const ZydisDecoder_ * decoder, ZydisDecoderContext_ * context, const void * buffer, unsigned __int64 length, ZydisDecodedInstruction_ * instruction) Line 5041	C
shadps4.exe!ZydisDecoderDecodeFull(const ZydisDecoder_ * decoder, const void * buffer, unsigned __int64 length, ZydisDecodedInstruction_ * instruction, ZydisDecodedOperand_ * operands) Line 4999	C
shadps4.exe!Common::DecoderImpl::decodeInstruction(ZydisDecodedInstruction_ & inst, ZydisDecodedOperand_ * operands, void * data, unsigned __int64 size) Line 45	C++
shadps4.exe!Core::TryPatch(unsigned char * code, Core::PatchModule * module) Line 951	C++
shadps4.exe!Core::TryPatchJit(void * code_address) Line 1170	C++
shadps4.exe!Core::SignalDispatch::DispatchAccessViolation(void * context, void * fault_address) Line 148	C++
shadps4.exe!Core::SignalHandler(_EXCEPTION_POINTERS * pExp) Line 29	C++
ntdll.dll!RtlpCallVectoredHandlers()	Unknown
ntdll.dll!RtlDispatchException()	Unknown
ntdll.dll!KiUserExceptionDispatch()	Unknown
00000009257b50f7()	Unknown
...
000000090076af86()	Unknown

I checked my log. It looks to me like I might have the wrong copy of my sysmodules or something, since it seems they're failing to load.

I also am not sure how the GetModule call is succeeding inside TryPatchJit. It is finding the module with address range:
{mutex=locked start=0x000000091fffc000 "UH‰åAWAVAUATSPH�\x1dK€" end=0x0000000920008000 "" ...}

Which looks to me like it doesn't cover the range of this patch. 0x000000091fffc000 (the key for the map) isn't even greater than the address it's trying to resolve 00000009257b50f7, so I'd think std::map::upper_bound would not be returning that module.

@hspir404
Copy link
Contributor

hspir404 commented Jan 2, 2025

After looking at ZydisInputPeek code more, and testing with and without these changes in place, I'm sure it's a parallel problem, and totally unrelated to this PR.

I rebased the code again on top of main and tested. I still get problems in IsRegionRegistered if I do no changes. In fact it is quite easy and consistent to repro the problem simply loading into my saved game at this spot (and if loading in doesn't do it, opening the menu overlay w/ the start button seems to easily trigger it).

If I swap ObjectPool to use a deque it doesn't fix the problem.
If I revert the change to BufferCache::IsRegionRegistered that used page_table.find instead of operator[], it DOES seem to fix the problem. So either that should be reverted, or more work needs to be done to make sure that find call doesn't cause issues.

As for reverting changes to BufferCache::IsRegionRegistered, there was a lock added by this PR on read access to BufferCache::slot_buffers. Isn't a lock needed before before every such access? It seems there's a potential for a race condition between SlotVector::Reserve (during an insert when the free list is full), and any indexing into the slot vector. I think this would include both SlotVector::operator[] and SlotVector::erase.

@polybiusproxy
Copy link
Collaborator

You seem to be getting a game crash.

@hspir404
Copy link
Contributor

hspir404 commented Jan 2, 2025

You seem to be getting a game crash.

If you're referring to the ZydisInputPeek issue, then that makes sense. I verified it happens even without this PR, so I think it can be taken to a different ticket.

@polybiusproxy
Copy link
Collaborator

If you see the whole stack trace, you can see the last guest address 00000009257b50f7 (might be libc), which appears to be executing an unhandled instruction (likely ud2, so it's raising an exception).

@raphaelthegreat raphaelthegreat merged commit c254470 into main Jan 2, 2025
20 checks passed
@polybiusproxy polybiusproxy deleted the locking branch January 2, 2025 17:50
@hgh32
Copy link

hgh32 commented Jan 3, 2025

This seem to be regression to intel gpu, which further break the graphics
Edit: this fix in 5559f35
r
r2
shad_log.txt

@JvictorVentura
Copy link

JvictorVentura commented Jan 3, 2025

With this PR, on Bloodborne I get way less fps on Linux than before (40fps before, 17fps now), but only on linux, if I play on windows 10 I get higher fps (40fps before, 45-50 now).

@rharish101
Copy link

rharish101 commented Jan 3, 2025

With this PR, on Bloodborne I get way less fps on Linux than before (40fps before, 17fps now), but only on linux, if I play on windows 10 I get higher fps (40fps before, 45-50 now).

I'm on Arch Linux with AMD CPU and discrete AMD GPU. My frame rate was ~55-60 previously in Bloodborne (with the 60 FPS patch), and after this PR, it is a stable 60 now. Just wanted to say that the Linux issue is not universal.

@JvictorVentura
Copy link

my cpu is intel, could probably be that.

Xcedf added a commit to Xcedf/shadPS4 that referenced this pull request Jan 3, 2025
ngoquang2708 added a commit to ngoquang2708/shadPS4 that referenced this pull request Jan 9, 2025
Fix performance regression with shadps4-emu#1973 on SteamDeck
ngoquang2708 added a commit to ngoquang2708/shadPS4 that referenced this pull request Jan 10, 2025
Fix performance regression with shadps4-emu#1973 on SteamDeck
ngoquang2708 added a commit to ngoquang2708/shadPS4 that referenced this pull request Jan 10, 2025
Fix performance regression with shadps4-emu#1973 on SteamDeck
ngoquang2708 added a commit to ngoquang2708/shadPS4 that referenced this pull request Jan 10, 2025
Fix performance regression with shadps4-emu#1973 on SteamDeck
Sjknight413 added a commit to Sjknight413/shadPS4 that referenced this pull request Jan 13, 2025
Fix performance regression with shadps4-emu#1973 on SteamDeck

Co-authored-by: Quang Ngô <[email protected]>
ngoquang2708 added a commit to ngoquang2708/shadPS4 that referenced this pull request Jan 14, 2025
Fix performance regression with shadps4-emu#1973 on SteamDeck
ngoquang2708 added a commit to ngoquang2708/shadPS4 that referenced this pull request Jan 18, 2025
Fix performance regression with shadps4-emu#1973 on SteamDeck
squidbus pushed a commit that referenced this pull request Jan 18, 2025
Fix performance regression with #1973 on SteamDeck
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants