
Fix data race in Scheduler.cpp #1753

Closed
wants to merge 2 commits into from

Conversation

crazyhappygame
Contributor

@crazyhappygame crazyhappygame commented Mar 16, 2024

Describe your changes

Fix one of the many data races found with ThreadSanitizer

Issue ticket number and link

#1752

Checklist before requesting a review

For each PR

  • Add copyright notice if it is missing:
    - "Copyright (c) 2019-present Axmol Engine contributors (see AUTHORS.md)."

  • I have performed a self-review of my code.

    Optional:

    • I have checked the readme and added important info to this PR.
    • I have added/adapted some tests too.

For core/new feature PR

  • I have checked the readme and added important info to this PR.
  • I have added thorough tests.

@crazyhappygame
Contributor Author

@halx99 could you merge it?

@rh101
Contributor

rh101 commented Mar 17, 2024

Hold up, there is no detail regarding the change that was made, so can you please explain the modification?

From what I can see, this:

if (!_actionsToPerform.empty())
{
    _performMutex.lock();
    // fixed #4123: Save the callback functions, they must be invoked after '_performMutex.unlock()', otherwise if
    // new functions are added in callback, it will cause thread deadlock.
    auto temp = std::move(_actionsToPerform);
    _performMutex.unlock();

    for (const auto& function : temp)
    {
        function();
    }
}

is changed to this:

const auto temp = [this](){
    // fixed #4123: Save the callback functions, they must be invoked after '_performMutex.unlock()', otherwise if
    // new functions are added in callback, it will cause thread deadlock.
    std::lock_guard<std::mutex> lock(_performMutex);
    return std::move(_actionsToPerform);
}();

for (const auto& function : temp)
{
    function();
}

This is now forcing a lock on each Scheduler::update() call, whether it's required or not. The call to _actionsToPerform.empty() should not cause any data race conditions in this specific scenario, so why do you believe it does?

@halx99
Collaborator

halx99 commented Mar 17, 2024

Agree, there is no difference from the original implementation.

@halx99 halx99 added the invalid This doesn't seem right label Mar 17, 2024
@halx99 halx99 closed this Mar 17, 2024
@crazyhappygame
Contributor Author

@halx99 @rh101
Problem:
data race
Write under lock

void Scheduler::runOnAxmolThread(std::function<void()> action)
{
    std::lock_guard<std::mutex> lock(_performMutex);
    _actionsToPerform.emplace_back(std::move(action));
}

Read without lock

void Scheduler::update(float dt)
{
    // ...
    if (!_actionsToPerform.empty())
    {

"data race" == undefined behaviour in C++.
It is not a false positive; it is a real data race.

Please check the ThreadSanitizer report below, from #1752.

WARNING: ThreadSanitizer: data race (pid=8716)
  Read of size 8 at 0x000114b03a18 by main thread:
    #0 std::__1::vector<std::__1::function<void ()>, std::__1::allocator<std::__1::function<void ()>>>::empty[abi:v160006]() const vector:552 (cpp-tests:arm64+0x101ec4e24)
    #1 ax::Scheduler::update(float) Scheduler.cpp:855 (cpp-tests:arm64+0x102014d94)
    #2 ax::Director::drawScene() Director.cpp:289 (cpp-tests:arm64+0x101f527e8)
    #3 ax::Director::mainLoop() Director.cpp:1547 (cpp-tests:arm64+0x101f59da0)
    #4 ax::Application::run() Application-mac.mm:76 (cpp-tests:arm64+0x10236560c)
    #5 main main.cpp:32 (cpp-tests:arm64+0x1019a6ce4)

  Previous write of size 8 at 0x000114b03a18 by thread T18 (mutexes: write M0):
    #0 std::__1::enable_if<is_move_constructible<std::__1::function<void ()>*>::value && is_move_assignable<std::__1::function<void ()>*>::value, void>::type std::__1::swap[abi:v160006]<std::__1::function<void ()>*>(std::__1::function<void ()>*&, std::__1::function<void ()>*&) swap.h:40 (cpp-tests:arm64+0x101e96780)
    #1 std::__1::vector<std::__1::function<void ()>, std::__1::allocator<std::__1::function<void ()>>>::__swap_out_circular_buffer(std::__1::__split_buffer<std::__1::function<void ()>, std::__1::allocator<std::__1::function<void ()>>&>&) vector:920 (cpp-tests:arm64+0x101e95ce0)
    #2 void std::__1::vector<std::__1::function<void ()>, std::__1::allocator<std::__1::function<void ()>>>::__emplace_back_slow_path<std::__1::function<void ()>>(std::__1::function<void ()>&&) vector:1584 (cpp-tests:arm64+0x102017524)
    #3 std::__1::function<void ()>& std::__1::vector<std::__1::function<void ()>, std::__1::allocator<std::__1::function<void ()>>>::emplace_back<std::__1::function<void ()>>(std::__1::function<void ()>&&) vector:1603 (cpp-tests:arm64+0x10201433c)
    #4 ax::Scheduler::runOnAxmolThread(std::__1::function<void ()>) Scheduler.cpp:722 (cpp-tests:arm64+0x10201423c)
    #5 ax::AsyncTaskPool::ThreadTasks::ThreadTasks()::'lambda'()::operator()() const AsyncTaskPool.h:137 (cpp-tests:arm64+0x101ee8d9c)
    #6 decltype(std::declval<ax::AsyncTaskPool::ThreadTasks::ThreadTasks()::'lambda'()>()()) std::__1::__invoke[abi:v160006]<ax::AsyncTaskPool::ThreadTasks::ThreadTasks()::'lambda'()>(ax::AsyncTaskPool::ThreadTasks::ThreadTasks()::'lambda'()&&) invoke.h:394 (cpp-tests:arm64+0x101ee8b48)
    #7 void std::__1::__thread_execute[abi:v160006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, ax::AsyncTaskPool::ThreadTasks::ThreadTasks()::'lambda'()>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, ax::AsyncTaskPool::ThreadTasks::ThreadTasks()::'lambda'()>&, std::__1::__tuple_indices<>) thread:288 (cpp-tests:arm64+0x101ee8af4)
    #8 void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, ax::AsyncTaskPool::ThreadTasks::ThreadTasks()::'lambda'()>>(void*) thread:299 (cpp-tests:arm64+0x101ee8588)

  As if synchronized via sleep:
    #0 nanosleep <null>:264457860 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x2f618)
    #1 std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>> const&) <null>:264457860 (libc++.1.dylib:arm64e+0x22b88)
    #2 main main.cpp:32 (cpp-tests:arm64+0x1019a6ce4)

  Location is heap block of size 368 at 0x000114b03900 allocated by main thread:
    #0 operator new(unsigned long) <null>:264457860 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x84420)
    #1 ax::Director::init() Director.cpp:120 (cpp-tests:arm64+0x101f50280)
    #2 ax::Director::getInstance() Director.cpp:95 (cpp-tests:arm64+0x101f5010c)
    #3 ax::Configuration::loadConfigFile(std::__1::basic_string_view<char, std::__1::char_traits<char>>) Configuration.cpp:401 (cpp-tests:arm64+0x101ef5a58)
    #4 AppDelegate::applicationDidFinishLaunching() AppDelegate.cpp:64 (cpp-tests:arm64+0x1001bb8f0)
    #5 ax::Application::run() Application-mac.mm:57 (cpp-tests:arm64+0x10236554c)
    #6 main main.cpp:32 (cpp-tests:arm64+0x1019a6ce4)

  Mutex M0 (0x000114b03a30) created at:
    #0 pthread_mutex_lock <null>:264457860 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x3cf8c)
    #1 std::__1::mutex::lock() <null>:264457860 (libc++.1.dylib:arm64e+0x16710)
    #2 std::__1::lock_guard<std::__1::mutex>::lock_guard[abi:v160006](std::__1::mutex&) __mutex_base:94 (cpp-tests:arm64+0x101e934b8)
    #3 ax::Scheduler::runOnAxmolThread(std::__1::function<void ()>) Scheduler.cpp:721 (cpp-tests:arm64+0x102014228)
    #4 ax::AudioEngineImpl::_play2d(ax::AudioCache*, int) AudioEngineImpl.cpp:595 (cpp-tests:arm64+0x101ec29b4)
    #5 decltype(*std::declval<ax::AudioEngineImpl*&>().*std::declval<void (ax::AudioEngineImpl::*&)(ax::AudioCache*, int)>()(std::declval<ax::AudioCache*&>(), std::declval<int&>())) std::__1::__invoke[abi:v160006]<void (ax::AudioEngineImpl::*&)(ax::AudioCache*, int), ax::AudioEngineImpl*&, ax::AudioCache*&, int&, void>(void (ax::AudioEngineImpl::*&)(ax::AudioCache*, int), ax::AudioEngineImpl*&, ax::AudioCache*&, int&) invoke.h:359 (cpp-tests:arm64+0x101edf75c)
    #6 std::__1::__bind_return<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), std::__1::tuple<ax::AudioEngineImpl*, ax::AudioCache*, int>, std::__1::tuple<>, __is_valid_bind_return<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), std::__1::tuple<ax::AudioEngineImpl*, ax::AudioCache*, int>, std::__1::tuple<>>::value>::type std::__1::__apply_functor[abi:v160006]<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), std::__1::tuple<ax::AudioEngineImpl*, ax::AudioCache*, int>, 0ul, 1ul, 2ul, std::__1::tuple<>>(void (ax::AudioEngineImpl::*&)(ax::AudioCache*, int), std::__1::tuple<ax::AudioEngineImpl*, ax::AudioCache*, int>&, std::__1::__tuple_indices<0ul, 1ul, 2ul>, std::__1::tuple<>&&) bind.h:263 (cpp-tests:arm64+0x101edf62c)
    #7 std::__1::__bind_return<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), std::__1::tuple<ax::AudioEngineImpl*, ax::AudioCache*, int>, std::__1::tuple<>, __is_valid_bind_return<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), std::__1::tuple<ax::AudioEngineImpl*, ax::AudioCache*, int>, std::__1::tuple<>>::value>::type std::__1::__bind<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), ax::AudioEngineImpl*, ax::AudioCache*&, int&>::operator()[abi:v160006]<>() bind.h:295 (cpp-tests:arm64+0x101edf574)
    #8 decltype(std::declval<std::__1::__bind<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), ax::AudioEngineImpl*, ax::AudioCache*&, int&>&>()()) std::__1::__invoke[abi:v160006]<std::__1::__bind<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), ax::AudioEngineImpl*, ax::AudioCache*&, int&>&>(std::__1::__bind<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), ax::AudioEngineImpl*, ax::AudioCache*&, int&>&) invoke.h:394 (cpp-tests:arm64+0x101edf514)
    #9 void std::__1::__invoke_void_return_wrapper<void, true>::__call<std::__1::__bind<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), ax::AudioEngineImpl*, ax::AudioCache*&, int&>&>(std::__1::__bind<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), ax::AudioEngineImpl*, ax::AudioCache*&, int&>&) invoke.h:487 (cpp-tests:arm64+0x101edf478)
    #10 std::__1::__function::__alloc_func<std::__1::__bind<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), ax::AudioEngineImpl*, ax::AudioCache*&, int&>, std::__1::allocator<std::__1::__bind<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), ax::AudioEngineImpl*, ax::AudioCache*&, int&>>, void ()>::operator()[abi:v160006]() function.h:185 (cpp-tests:arm64+0x101edf424)
    #11 std::__1::__function::__func<std::__1::__bind<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), ax::AudioEngineImpl*, ax::AudioCache*&, int&>, std::__1::allocator<std::__1::__bind<void (ax::AudioEngineImpl::*)(ax::AudioCache*, int), ax::AudioEngineImpl*, ax::AudioCache*&, int&>>, void ()>::operator()() function.h:356 (cpp-tests:arm64+0x101edd6d4)
    #12 std::__1::__function::__value_func<void ()>::operator()[abi:v160006]() const function.h:510 (cpp-tests:arm64+0x1019d556c)
    #13 std::__1::function<void ()>::operator()() const function.h:1156 (cpp-tests:arm64+0x1019d3674)
    #14 ax::AudioCache::invokingPlayCallbacks() AudioCache.cpp:372 (cpp-tests:arm64+0x101e930d8)
    #15 ax::AudioCache::readDataTask(unsigned int) AudioCache.cpp:336 (cpp-tests:arm64+0x101e92d9c)
    #16 ax::AudioEngineImpl::preload(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::function<void (bool)>)::$_1::operator()() const AudioEngineImpl.cpp:516 (cpp-tests:arm64+0x101ed9e68)
    #17 decltype(std::declval<ax::AudioEngineImpl::preload(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::function<void (bool)>)::$_1&>()()) std::__1::__invoke[abi:v160006]<ax::AudioEngineImpl::preload(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::function<void (bool)>)::$_1&>(ax::AudioEngineImpl::preload(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::function<void (bool)>)::$_1&) invoke.h:394 (cpp-tests:arm64+0x101ed9d78)
    #18 void std::__1::__invoke_void_return_wrapper<void, true>::__call<ax::AudioEngineImpl::preload(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::function<void (bool)>)::$_1&>(ax::AudioEngineImpl::preload(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::function<void (bool)>)::$_1&) invoke.h:487 (cpp-tests:arm64+0x101ed9cdc)
    #19 std::__1::__function::__alloc_func<ax::AudioEngineImpl::preload(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::function<void (bool)>)::$_1, std::__1::allocator<ax::AudioEngineImpl::preload(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::function<void (bool)>)::$_1>, void ()>::operator()[abi:v160006]() function.h:185 (cpp-tests:arm64+0x101ed9c88)
    #20 std::__1::__function::__func<ax::AudioEngineImpl::preload(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::function<void (bool)>)::$_1, std::__1::allocator<ax::AudioEngineImpl::preload(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::function<void (bool)>)::$_1>, void ()>::operator()() function.h:356 (cpp-tests:arm64+0x101ed7b80)
    #21 std::__1::__function::__value_func<void ()>::operator()[abi:v160006]() const function.h:510 (cpp-tests:arm64+0x1019d556c)
    #22 std::__1::function<void ()>::operator()() const function.h:1156 (cpp-tests:arm64+0x1019d3674)
    #23 ax::AudioEngine::AudioEngineThreadPool::threadFunc() AudioEngine.cpp:126 (cpp-tests:arm64+0x101ea971c)
    #24 decltype(*std::declval<ax::AudioEngine::AudioEngineThreadPool*&>().*std::declval<void (ax::AudioEngine::AudioEngineThreadPool::*&)()>()()) std::__1::__invoke[abi:v160006]<void (ax::AudioEngine::AudioEngineThreadPool::*&)(), ax::AudioEngine::AudioEngineThreadPool*&, void>(void (ax::AudioEngine::AudioEngineThreadPool::*&)(), ax::AudioEngine::AudioEngineThreadPool*&) invoke.h:359 (cpp-tests:arm64+0x101eadb04)
    #25 std::__1::__bind_return<void (ax::AudioEngine::AudioEngineThreadPool::*)(), std::__1::tuple<ax::AudioEngine::AudioEngineThreadPool*>, std::__1::tuple<>, __is_valid_bind_return<void (ax::AudioEngine::AudioEngineThreadPool::*)(), std::__1::tuple<ax::AudioEngine::AudioEngineThreadPool*>, std::__1::tuple<>>::value>::type std::__1::__apply_functor[abi:v160006]<void (ax::AudioEngine::AudioEngineThreadPool::*)(), std::__1::tuple<ax::AudioEngine::AudioEngineThreadPool*>, 0ul, std::__1::tuple<>>(void (ax::AudioEngine::AudioEngineThreadPool::*&)(), std::__1::tuple<ax::AudioEngine::AudioEngineThreadPool*>&, std::__1::__tuple_indices<0ul>, std::__1::tuple<>&&) bind.h:263 (cpp-tests:arm64+0x101eada24)
    #26 std::__1::__bind_return<void (ax::AudioEngine::AudioEngineThreadPool::*)(), std::__1::tuple<ax::AudioEngine::AudioEngineThreadPool*>, std::__1::tuple<>, __is_valid_bind_return<void (ax::AudioEngine::AudioEngineThreadPool::*)(), std::__1::tuple<ax::AudioEngine::AudioEngineThreadPool*>, std::__1::tuple<>>::value>::type std::__1::__bind<void (ax::AudioEngine::AudioEngineThreadPool::*)(), ax::AudioEngine::AudioEngineThreadPool*>::operator()[abi:v160006]<>() bind.h:295 (cpp-tests:arm64+0x101ead99c)
    #27 decltype(std::declval<std::__1::__bind<void (ax::AudioEngine::AudioEngineThreadPool::*)(), ax::AudioEngine::AudioEngineThreadPool*>>()()) std::__1::__invoke[abi:v160006]<std::__1::__bind<void (ax::AudioEngine::AudioEngineThreadPool::*)(), ax::AudioEngine::AudioEngineThreadPool*>>(std::__1::__bind<void (ax::AudioEngine::AudioEngineThreadPool::*)(), ax::AudioEngine::AudioEngineThreadPool*>&&) invoke.h:394 (cpp-tests:arm64+0x101ead8f0)
    #28 void std::__1::__thread_execute[abi:v160006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, std::__1::__bind<void (ax::AudioEngine::AudioEngineThreadPool::*)(), ax::AudioEngine::AudioEngineThreadPool*>>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, std::__1::__bind<void (ax::AudioEngine::AudioEngineThreadPool::*)(), ax::AudioEngine::AudioEngineThreadPool*>>&, std::__1::__tuple_indices<>) thread:288 (cpp-tests:arm64+0x101ead89c)
    #29 void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, std::__1::__bind<void (ax::AudioEngine::AudioEngineThreadPool::*)(), ax::AudioEngine::AudioEngineThreadPool*>>>(void*) thread:299 (cpp-tests:arm64+0x101ead330)

  Thread T18 (tid=2140243, running) created by main thread at:
    #0 pthread_create <null>:264457860 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x3062c)
    #1 std::__1::__libcpp_thread_create[abi:v160006](_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:378 (cpp-tests:arm64+0x100418f88)
    #2 std::__1::thread::thread<ax::AsyncTaskPool::ThreadTasks::ThreadTasks()::'lambda'(), void>(ax::AsyncTaskPool::ThreadTasks::ThreadTasks()::'lambda'()&&) thread:315 (cpp-tests:arm64+0x101ee836c)
    #3 std::__1::thread::thread<ax::AsyncTaskPool::ThreadTasks::ThreadTasks()::'lambda'(), void>(ax::AsyncTaskPool::ThreadTasks::ThreadTasks()::'lambda'()&&) thread:307 (cpp-tests:arm64+0x101ee7d38)
    #4 ax::AsyncTaskPool::ThreadTasks::ThreadTasks() AsyncTaskPool.h:120 (cpp-tests:arm64+0x101ee7c2c)
    #5 ax::AsyncTaskPool::ThreadTasks::ThreadTasks() AsyncTaskPool.h:119 (cpp-tests:arm64+0x101ee7a9c)
    #6 ax::AsyncTaskPool::AsyncTaskPool() AsyncTaskPool.cpp:48 (cpp-tests:arm64+0x101ee79bc)
    #7 ax::AsyncTaskPool::AsyncTaskPool() AsyncTaskPool.cpp:48 (cpp-tests:arm64+0x101ee7874)
    #8 ax::AsyncTaskPool::getInstance() AsyncTaskPool.cpp:37 (cpp-tests:arm64+0x101ee7794)
    #9 void ax::FileUtils::performOperationOffthread<ax::FileUtils::isFileExist(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::function<void (bool)>) const::$_6, std::__1::function<void (bool)>>(ax::FileUtils::isFileExist(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::function<void (bool)>) const::$_6&&, std::__1::function<void (bool)>&&) FileUtils.h:902 (cpp-tests:arm64+0x10226e054)
    #10 ax::FileUtils::isFileExist(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::function<void (bool)>) const FileUtils.cpp:919 (cpp-tests:arm64+0x10226df64)
    #11 TestIsFileExistAsync::onEnter() FileUtilsTest.cpp:916 (cpp-tests:arm64+0x1004cb898)
    #12 ax::Director::setNextScene() Director.cpp:1158 (cpp-tests:arm64+0x101f53170)
    #13 ax::Director::drawScene() Director.cpp:302 (cpp-tests:arm64+0x101f528bc)
    #14 ax::Director::mainLoop() Director.cpp:1547 (cpp-tests:arm64+0x101f59da0)
    #15 ax::Application::run() Application-mac.mm:76 (cpp-tests:arm64+0x10236560c)
    #16 main main.cpp:32 (cpp-tests:arm64+0x1019a6ce4)

SUMMARY: ThreadSanitizer: data race vector:552 in std::__1::vector<std::__1::function<void ()>, std::__1::allocator<std::__1::function<void ()>>>::empty[abi:v160006]() const
==================

@crazyhappygame
Contributor Author

Unsynchronized access to `_actionsToPerform.empty()` is a data race.
Below are a few possible broken behaviours:

  1. _actionsToPerform.empty() will always be seen as empty, even if elements were added to _actionsToPerform from a separate thread.
  2. _actionsToPerform.empty() will be seen as empty for the next N iterations, even if elements were added to _actionsToPerform from a separate thread.
    But that's "the best" that could happen. Generally, "data race" == undefined behaviour in C++. Anything could happen ...

It is not the same, but it is somewhat similar. Please check why the Double-Checked Locking Pattern is broken:
https://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/double-checked-locking/
https://en.wikipedia.org/wiki/Double-checked_locking
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#cp110-do-not-write-your-own-double-checked-locking-for-initialization
https://www.youtube.com/watch?v=lVBvHbJsg5Y

@rh101
Contributor

rh101 commented Mar 17, 2024

Firstly, ThreadSanitizer can and does generate false positives.

std::vector::empty() does not affect the elements in the list, as it's simply reading an internal size value.

It does require a lock when adding or removing a new element to the list, as you pointed out.

If the writing thread happens to be updating the internal size variable that is read by empty(), that should still not result in a data race condition, as it doesn't matter in which order it happens. For example:

Scenario 1:
Thread1 - lock
Thread1 - emplace
Thread1 - size updated
Thread2 - empty() called, size read

or

Scenario 2:
Thread1 - lock
Thread1 - emplace
Thread2 - empty() called, size read
Thread1 - size updated

In the specific use case here, it won't matter what the result of empty() is, whether it's true or false, since all that ever happens if it's false is that it locks and moves the data.

If it is Scenario 2, where it happens to check empty() in the middle of the update, that's fine, because on the next scheduler loop it'll process the list anyway.

Again, we're only referring to this use case.

@crazyhappygame
Contributor Author

@rh101 Thank you for answering. I really think that we should be on the same page here.
So far I think we fully agree on the following:

ThreadSanitizer can and does generate false positives

But I don't think that is the case here. I think ThreadSanitizer found a real data race.

To move forward, I think we should agree on the following points:

  1. Is this case a data race?
  2. Is it acceptable to have a data race in C++ code?
  3. How do we fix this data race?

Can we start with point 1: is this case a data race?

To make the discussion more effective, I propose we continue it with the simplified example below.

#include <iostream>
#include <mutex>
#include <thread>

std::mutex mutex;
bool b = false;
int main(){
    std::cout << b << "\n";
    std::thread t([](){
        std::lock_guard<std::mutex> lock(mutex);
        // write
        b=true;
        });
    // read
    std::cout << b << "\n";
    t.join();
}

Question: does the code above have a data race?

  1. I think that this code has a data race.
  2. ThreadSanitizer confirms that this code has a data race.
    CMakeLists.txt
add_compile_options(-fsanitize=thread)
add_link_options(-fsanitize=thread)
add_executable(${PROJECT_NAME} main.cpp)

output

==================
WARNING: ThreadSanitizer: data race (pid=73566)
  Write of size 1 at 0x000100154040 by thread T1 (mutexes: write M0):
    #0 main::$_0::operator()() const main.cpp:10 (thread:arm64+0x10000585c)
    #1 decltype(std::declval<main::$_0>()()) std::__1::__invoke[abi:v160006]<main::$_0>(main::$_0&&) invoke.h:394 (thread:arm64+0x1000057a0)
    #2 void std::__1::__thread_execute[abi:v160006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, main::$_0>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, main::$_0>&, std::__1::__tuple_indices<>) thread:288 (thread:arm64+0x1000056cc)
    #3 void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, main::$_0>>(void*) thread:299 (thread:arm64+0x100004ac8)

  Previous read of size 1 at 0x000100154040 by main thread:
    #0 main main.cpp:12 (thread:arm64+0x1000031fc)

  Location is global 'b' at 0x000100154040 (thread+0x10000c040)

  Mutex M0 (0x000100154000) created at:
    #0 pthread_mutex_lock <null>:9686660 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x3cf8c)
    #1 std::__1::mutex::lock() <null>:9686660 (libc++.1.dylib:arm64e+0x16710)
    #2 std::__1::lock_guard<std::__1::mutex>::lock_guard[abi:v160006](std::__1::mutex&) __mutex_base:94 (thread:arm64+0x1000058d4)
    #3 main::$_0::operator()() const main.cpp:9 (thread:arm64+0x100005844)
    #4 decltype(std::declval<main::$_0>()()) std::__1::__invoke[abi:v160006]<main::$_0>(main::$_0&&) invoke.h:394 (thread:arm64+0x1000057a0)
    #5 void std::__1::__thread_execute[abi:v160006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, main::$_0>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, main::$_0>&, std::__1::__tuple_indices<>) thread:288 (thread:arm64+0x1000056cc)
    #6 void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, main::$_0>>(void*) thread:299 (thread:arm64+0x100004ac8)

  Thread T1 (tid=3292718, running) created by main thread at:
    #0 pthread_create <null>:9686660 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x3062c)
    #1 std::__1::__libcpp_thread_create[abi:v160006](_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:378 (thread:arm64+0x100004a30)
    #2 std::__1::thread::thread<main::$_0, void>(main::$_0&&) thread:315 (thread:arm64+0x1000047d4)
    #3 std::__1::thread::thread<main::$_0, void>(main::$_0&&) thread:307 (thread:arm64+0x100003370)
    #4 main main.cpp:8 (thread:arm64+0x1000031e4)

SUMMARY: ThreadSanitizer: data race main.cpp:10 in main::$_0::operator()() const
==================

@halx99
Collaborator

halx99 commented Mar 17, 2024

A data race does not always trigger a problem. When the read thread reads memory that is never invalidated by the write thread, there is no problem. In the scheduler case, the scheduler checks every frame, so it never misses actionsToPerform.

@halx99
Collaborator

halx99 commented Mar 17, 2024

The empty() check reads a pointer value or a member of the std::vector itself, which is always valid before the vector is destructed.

@crazyhappygame
Contributor Author

@halx99 Thank you for joining the discussion.
I think we are already moving the discussion to point 2:
"data race not always trigger problem" ~ "2. Is it acceptable to have a data race in C++ code?"

Before we move to point 2:
@halx99, @rh101, could you confirm that we all agree this is a "data race"?

@rh101
Contributor

rh101 commented Mar 17, 2024

@halx99 , @rh101 Could you confirm that we all agree this is "data race"?

In this implementation, it does not matter when the size of the vector is checked (via empty()), as the resulting behaviour will still be valid, unless I'm totally mistaken here.

I get that the thread sanitizer is complaining, but it doesn't necessarily mean the implementation is incorrect.

Will this data race cause a crash, or have you actually experienced a crash due to this?

@crazyhappygame
Contributor Author

@rh101 I will assume that we all agree that we have a "data race" :)

Let's move discussion to point 2.

Could you check the links from #1753 (comment)
or
https://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/

I would like to be sure that we all understand that the code below is broken (Double-Checked Locking Pattern (DCLP) - broken implementation), and what side effects we can observe here (memory leak, not fully initialized object, wrong memory access, anything else).

#include <iostream>
#include <mutex>
#include <thread>

class Singleton {
public:
    static Singleton* getInstance() {
        if (instance == nullptr) { // First check (problematic without memory barriers)
            std::lock_guard<std::mutex> lock(mutex);
            if (instance == nullptr) { // Second check
                instance = new Singleton(); // Potential reordering and visibility issues
            }
        }
        return instance;
    }

private:
    Singleton() {}
    static Singleton* instance;
    static std::mutex mutex;
};

// Static member initialization
Singleton* Singleton::instance = nullptr;
std::mutex Singleton::mutex;

void useSingleton() {
    Singleton* singletonInstance = Singleton::getInstance();
    std::cout << "Using singleton instance: " << singletonInstance << std::endl;
}

int main() {
    std::thread t1(useSingleton);
    std::thread t2(useSingleton);

    t1.join();
    t2.join();

    return 0;
}

Question:
Do we all agree that the above implementation of the Double-Checked Locking Pattern (DCLP) is broken? It can lead to a memory leak, access to a not fully initialized object, wrong memory access, etc.

@halx99
Collaborator

halx99 commented Mar 17, 2024

Agree, but it's a very specific case for singletons, not related to the scheduler actionsToPerform case.

@halx99 halx99 added the question Further information is requested label Mar 17, 2024
@rh101
Contributor

rh101 commented Mar 17, 2024

Just an FYI, this section of code has not changed since Cocos2d-x, as you can see here: https://github.com/cocos2d/cocos2d-x/blob/76903dee64046c7bfdba50790be283484b4be271/cocos/base/CCScheduler.cpp#L944

@crazyhappygame
Contributor Author

agree, but it's very spec case for singleton not related to schedule actionsToPerform case.

There are things specific to singletons (e.g. memory allocation, memory leak), but there is much in common: data race, undefined behaviour.

@crazyhappygame
Contributor Author

Just an FYI, this section of code has not changed since Cocos2d-x, as you can see here: https://github.com/cocos2d/cocos2d-x/blob/76903dee64046c7bfdba50790be283484b4be271/cocos/base/CCScheduler.cpp#L944

I checked that as well. 11 years old :)
What has changed over time:

  1. The number of cores in devices, e.g. the first Android device had only one core - now many cores (mine has 8).
  2. The number of threads in the engine - axmol uses more threads than cocos2d-x did 11 years ago.

@crazyhappygame
Contributor Author

crazyhappygame commented Mar 17, 2024

OK. We agreed that we have a "data race".
Let's try to agree on whether it is acceptable to have a "data race" in the code.
2. Is it acceptable to have a data race in C++ code?
Please check: https://en.cppreference.com/w/cpp/language/ub
For C++, "data race" => "undefined behavior" => "there are no restrictions on the behavior of the program", "the compiled program is not required to do anything meaningful".

I think it is not acceptable to have "undefined behavior" in the app.
Trying to reason about how code with "undefined behavior" really works is useless. It may vary depending on compiler, system, hardware, etc.

I think that we should fix all "data races" in the code. Let me create another fix where the lock is not always taken.

@halx99
Collaborator

halx99 commented Mar 17, 2024

As @rh101 and I said: there is no problem with the double check for the scheduler actionsToPerform vector, so nothing needs to be fixed. I can confirm that no standard C++ compiler should cause erroneous behavior on it.

@halx99
Collaborator

halx99 commented Mar 17, 2024

The sanitizer is just a tool for suggestions; we don't always need to follow it.

@halx99
Collaborator

halx99 commented Mar 17, 2024

A lock is an expensive resource; that's why the original cocos2d-x developer used a double check to avoid locking every frame, I think.

@crazyhappygame
Contributor Author

@halx99

  1. The Double-Checked Locking Pattern (DCLP) is broken.
  2. Agreed, a lock could be expensive. Let me create another fix where the lock is not always taken.
  3. (It is useless to talk about "undefined behavior", but) will you agree with me that one of the behaviours below is possible?
    3.1. The visibility of _actionsToPerform can be delayed: _actionsToPerform.empty() can be true for multiple iterations even if _actionsToPerform.emplace was in fact called - due to the lack of synchronization (threads running on different cores, different caches, etc.).
    3.2. _actionsToPerform.empty() will never be false: the compiler/hardware may generate code that never makes changes done in one thread visible in another thread.

@crazyhappygame crazyhappygame mentioned this pull request Mar 17, 2024
@crazyhappygame
Contributor Author

@halx99 Please check the alternative fix: #1758. That fix uses an atomic and should have performance similar to before.

@crazyhappygame
Contributor Author

crazyhappygame commented Mar 18, 2024

@halx99 @rh101

Do we agree that we should fix this data race?
Can we move the discussion to point 3?
3. How to fix this data race

Why is it Unsafe?
Concurrent Modifications and Reads: If a thread is modifying the vector (e.g., adding or removing elements), the internal state of the vector, including its size, may be in an inconsistent state during the modification. Accessing .size() during this time could lead to incorrect values or even corruption.

Memory Consistency: Even if a thread has just finished adding an element to the vector and another thread calls .size(), without proper synchronization, there is no guarantee that the second thread will see the most recent size due to caching and memory reordering optimizations performed by the compiler or CPU.

@rh101
Contributor

rh101 commented Mar 18, 2024

Why is it Unsafe?
Concurrent Modifications and Reads: If a thread is modifying the vector (e.g., adding or removing elements), the internal state of the vector, including its size, may be in an inconsistent state during the modification. Accessing .size() during this time could lead to incorrect values or even corruption.

What do you mean by "Accessing .size()"? It's only being read, so how would that cause corruption?

Memory Consistency: Even if a thread has just finished adding an element to the vector and another thread calls .size(), without proper synchronization, there is no guarantee that the second thread will see the most recent size due to caching and memory reordering optimizations performed by the compiler or CPU.

This does not matter, as has been stated earlier. If the current scheduler update happens to get empty() == true, even though it may not be seeing the most up-to-date state, that is still fine, since on the next update it will detect that the vector is no longer empty. This is expected behavior as far as I can tell.

I can understand you wanting to blame this section of code for the issues you're seeing, but there is no reproducible test case to prove that this change is required, or that it will even fix the issue you're having.

Are you able to make the changes to your own Axmol source and release your projects with those changes? If the crashes disappear, that would be great, since it may help prove that something does need to be done about this section of the scheduler code.

@crazyhappygame
Contributor Author

crazyhappygame commented Mar 19, 2024

@rh101

  1. Please send links or any other proof that reading data in one thread while modifying it in another is completely safe, or any variation of that claim, e.g. that a data race on a read is fine and nothing bad can happen, or that std::vector::empty calls need no synchronization across threads.
  2. Please check the links I posted in this thread, or at least please check: https://en.cppreference.com/w/cpp/language/ub
    For C++, "data race" => "undefined behavior" => "there are no restrictions on the behavior of the program", "the compiled program is not required to do anything meaningful."

Once again, if we are in undefined-behavior territory, anything could happen.
What could happen here, every 1,000,000 scheduler iterations:

  1. The scheduler stops seeing updates to the vector at all, and after another 1,000,000 iterations the app runs out of memory.
  2. std::vector::empty reads inaccessible memory, causing a crash.
  3. The compiler may generate code, and the hardware may reorder instructions, in such a way that reading an element from the vector accesses garbage.

@rh101
Contributor

rh101 commented Mar 19, 2024

I am aware of what undefined behavior can lead to, but we would need to consider the specifics of this implementation and the effects it will have.

Once again, if we are in undefined-behavior territory, anything could happen. What could happen here, every 1,000,000 scheduler iterations:

1. The scheduler stops seeing updates to the vector at all, and after another 1,000,000 iterations the app runs out of memory.

Why would it stop seeing all updates to the vector? Please explain how that would happen, because reviewing the implementation, I can't see how it would lead to that.

2. std::vector::empty reads inaccessible memory, causing a crash.

If another thread happens to be writing to memory that is being read by std::vector::empty(), then, I assume, empty() may end up reading partially complete writes to a memory location. It should never be accessing a memory pointer that can change.

In the case of the implementation on Windows, empty() checks whether the pointer to the beginning of the array is the same as the pointer to the end of the sequence, and if they match, the vector is empty. If a write happens to be updating the end-of-sequence pointer, a call to empty() would still find the start and end pointers unequal (even if one of them is partially written), and the result is empty() == false. That is a valid result.

So, my question is: what could possibly cause a non-accessible-memory crash in the use of std::vector::empty()?

3. The compiler may generate code, and the hardware may reorder instructions, in such a way that reading an element from the vector accesses garbage.

That has nothing to do with this implementation. std::vector::empty() does not access vector elements at all, so I'm not sure how this would be an issue in this case. Elements are only accessed under a lock in this implementation.

@crazyhappygame
Contributor Author

@rh101 Let's focus on point 1, visibility.
Modern CPUs utilize caches at various levels (L1, L2, L3) to improve performance. Subsequent reads of the same location can be served directly from the cache, which is much faster than accessing main memory (RAM). Changes made by one thread (and hence one CPU core) to a shared variable may remain in its local cache and not be visible to other threads running on different cores. This is because, for performance reasons, writes by one core are not propagated to main memory or to the caches of other cores without proper synchronization.

For more details, please see e.g. Herb Sutter's "atomic<> Weapons" talk:
https://youtu.be/A8eCGOqgvH4?si=_BCd4gQXmJ85I1ML

@rh101
Contributor

rh101 commented Mar 20, 2024

Changes made by one thread (and hence one CPU core) to a shared variable may remain in its local cache and not be visible to other threads running on different cores. This is because, for performance reasons, writes by one core are not propagated to main memory or to the caches of other cores without proper synchronization.

I don't know enough about this to comment on it. In the case of the std::vector::empty() implementation, it would be the start-of-array and end-of-sequence pointers that are accessed, and I can only speculate how caching works for them.

There is no point going back and forth about this. I'm not trying to dismiss your concerns, and I'm not personally against any changes, but what concerns me is that there is no proof this is required, no test case to show the problem, and the fact that the changes proposed so far will have a performance impact. This kind of change can't just be made based on a guess that this is an issue; rather, there must be a way to provide proof that it is.

@mariusz102102

Performance impact: we should always measure.
Please check the version with the atomic:
#1758
An atomic read on x86 is as fast as a non-atomic read.
An atomic read on ARM is almost as fast as a non-atomic read, and faster than a non-atomic write.

Labels: invalid (This doesn't seem right), question (Further information is requested)