atomic.cpp: Spinlock powering shared_ptr atomics can lead to priority inversion #370

Closed
BillyONeal opened this issue Dec 11, 2019 · 8 comments · Fixed by #1200
Labels: bug (Something isn't working), fixed (Something works now, yay!), performance (Must go faster)

Comments

@BillyONeal (Member)

Describe the bug
shared_ptr's atomic functions (e.g. std::atomic_store) are powered by an external lock in our separately compiled machinery, _Lock_shared_ptr_spin_lock and _Unlock_shared_ptr_spin_lock. This was written as a plain spin lock with no mitigation to go to sleep if spinning takes too long, nor any mitigation for memory bandwidth consumption.

STL/stl/src/atomic.cpp, lines 13 to 34 at commit aa0a7a3:

// SPIN LOCK FOR shared_ptr ATOMIC OPERATIONS
volatile long _Shared_ptr_flag;

_CRTIMP2_PURE void __cdecl _Lock_shared_ptr_spin_lock() { // spin until _Shared_ptr_flag successfully set
#ifdef _M_ARM
    while (_InterlockedExchange_acq(&_Shared_ptr_flag, 1)) {
        __yield();
    }
#else // _M_ARM
    while (_interlockedbittestandset(&_Shared_ptr_flag, 0)) { // set bit 0
    }
#endif // _M_ARM
}

_CRTIMP2_PURE void __cdecl _Unlock_shared_ptr_spin_lock() { // release previously obtained lock
#ifdef _M_ARM
    __dmb(_ARM_BARRIER_ISH);
    __iso_volatile_store32((volatile int*) &_Shared_ptr_flag, 0);
#else // _M_ARM
    _interlockedbittestandreset(&_Shared_ptr_flag, 0); // reset bit 0
#endif // _M_ARM
}

In single-core scenarios this is particularly bad: when a low-priority thread holds the spinlock, a high-priority thread can spin "effectively forever".

In an ABI-breaking release, it would be nice to reuse the low-order bit of the reference count control block pointer in the shared_ptr; but even without an ABI-breaking release we could do better by replacing the spinlock entirely with something like SRWLOCK on Vista and later, by implementing exponential backoff, or by relying on C++20 std::atomic waiting features once we have those implemented.
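(As a rough illustration of the backoff option only, here is a minimal sketch for the non-ARM path; the iteration thresholds and the YieldProcessor/Sleep escalation are assumptions for illustration, not something proposed in this issue.)

// Illustrative sketch only: spin with escalating backoff for the non-ARM path.
// Thresholds are arbitrary; the exported name and flag match atomic.cpp,
// but _CRTIMP2_PURE is dropped to keep the sketch self-contained.
#include <Windows.h>
#include <intrin.h>

volatile long _Shared_ptr_flag;

void __cdecl _Lock_shared_ptr_spin_lock() {
    for (unsigned long _Spins = 0; _interlockedbittestandset(&_Shared_ptr_flag, 0); ++_Spins) {
        if (_Spins < 64) {
            YieldProcessor(); // cheap pause, stays on the CPU
        } else if (_Spins < 128) {
            Sleep(0); // yield to another ready thread of equal priority
        } else {
            Sleep(1); // actually sleep, so a lower-priority lock holder can run
        }
    }
}
// _Unlock_shared_ptr_spin_lock would stay as it is today.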

Also tracked by Developer Community as DevCom-716238 and Microsoft-internal VSO-975564.

@BillyONeal added the bug and performance labels on Dec 11, 2019
@StephanTLavavej (Member)

I believe you've filed a duplicate of #86. If so, which one do you want to keep?

@StephanTLavavej (Member)

Also, how can this possibly be fixed without breaking ABI, given that existing object files could be calling _Lock_shared_ptr_spin_lock() and expecting that to prevent newer object files from modifying a shared_ptr?

@BillyONeal (Member, Author)

Also, how can this possibly be fixed without breaking ABI, given that existing object files could be calling _Lock_shared_ptr_spin_lock() and expecting that to prevent newer object files from modifying a shared_ptr?

This bug is not "make shared_ptr stop using a common spinlock", it is "make common lock shared_ptr uses less bad". If we didn't have to worry about XP I would wholesale replace it with an SRWLOCK.
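(For illustration only, a minimal sketch of what such a wholesale SRWLOCK replacement could look like on Vista and later, keeping the exported names so existing object files still link; this is not the actual fix that shipped.)

// Illustrative sketch only: swap the spin loop for an SRWLOCK (Vista+),
// keeping the exported entry points for ABI compatibility.
#include <Windows.h>

namespace {
    SRWLOCK _Shared_ptr_srwlock = SRWLOCK_INIT;
} // unnamed namespace

void __cdecl _Lock_shared_ptr_spin_lock() { // name kept; no longer spins
    AcquireSRWLockExclusive(&_Shared_ptr_srwlock);
}

void __cdecl _Unlock_shared_ptr_spin_lock() {
    ReleaseSRWLockExclusive(&_Shared_ptr_srwlock);
}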

@StephanTLavavej (Member)

Ah, changing what the separately compiled functions do. I agree that that would be ABI-safe. Thanks!

@AlexGuteniev (Contributor) commented Apr 12, 2020

I think it can be done within the scope of #593 as follows:

// SPIN LOCK FOR shared_ptr ATOMIC OPERATIONS
std::atomic<volatile long> _Shared_ptr_flag{};

_CRTIMP2_PURE void __cdecl _Lock_shared_ptr_spin_lock() { // spin until _Shared_ptr_flag successfully set
#ifdef _M_ARM
    while (_InterlockedExchange_acq(&_Shared_ptr_flag._Storage._Value, 1)) {
        _Shared_ptr_flag.wait(1, std::memory_order_acquire);
    }
#else // _M_ARM
    while (_interlockedbittestandset(&_Shared_ptr_flag._Storage._Value, 0)) { // set bit 0
        _Shared_ptr_flag.wait(1, std::memory_order_acquire);
    }
#endif // _M_ARM
}
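
(For the wait calls above to ever wake, the matching unlock would also have to notify; a minimal sketch of that counterpart, which is not part of the comment above and reuses the same _Storage._Value access pattern, might look like this:)

_CRTIMP2_PURE void __cdecl _Unlock_shared_ptr_spin_lock() { // release previously obtained lock
#ifdef _M_ARM
    __dmb(_ARM_BARRIER_ISH);
    __iso_volatile_store32((volatile int*) &_Shared_ptr_flag._Storage._Value, 0);
#else // _M_ARM
    _interlockedbittestandreset(&_Shared_ptr_flag._Storage._Value, 0); // reset bit 0
#endif // _M_ARM
    _Shared_ptr_flag.notify_all(); // added: wake any thread blocked in wait()
}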

But it needs to link the atomic satellite. Is linking the atomic satellite acceptable here?

@BillyONeal (Member, Author)

@AlexGuteniev: I think it would be better to attempt to detect and use SRWLOCK here since that would engage on more OSes.

AlexGuteniev added a commit to AlexGuteniev/STL that referenced this issue Apr 12, 2020
@AlexGuteniev (Contributor)

But the atomic wait in #593 has a fallback that locks on an SRW lock, so OS support will be there starting with Vista. And using an SRW lock directly, without checking the variable first, has unnecessary API function call overhead.

And I've already added wait-powering for this spinlock in #593 :-)

@AlexGuteniev (Contributor)

(But if you still disagree, I can revert this part)
