-
-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix data race in Scheduler.cpp #1753
Conversation
@halx99 could you merge it? |
Hold up, there is no detail regarding change made, so can you please explain the modification? From what I can see, this:
is changed to this:
This is now forcing a lock on each |
agree, there no difference with original implemention |
@halx99 @rh101
Read without lock
"data race" == undefined behaviour in C++. Please check thread sanitizer issue from the below error from #1752.
|
Not synchronizaed access to ````_actionsToPerform.empty()``` is data race.
It is not the same but a bit similar. Please check why Double-Check Locking Pattern is broken: |
Firstly, ThreadSantizer can and does generate false positives.
It does require a lock when adding or removing a new element to the list, as you pointed out. If it happens to be updating the internal size variable that is read by Scenario 1: or Scenario 2: In the specific use case here, it won't matter what the result of empty() is, whether it's true or false, since all that ever happens if it's true is that it locks, and moves the data. If it is Scenario 2, where it happens to check empty() in the middle of the update, that's fine, because on the next scheduler loop it'll process the list anyway. Again, we're only referring to this use case. |
@rh101 Thank you for anwering. I really think that we should be on the same page here
But I think it is not a case in this case. I think that thread sanitizer found real data race here To move forward here I think what we should agree on the following points
Can we start with point 1 Is this case is data race? To make discussion more effective I propose is to continue discussion on below simplified example.
Question: is code above has data race?
output
|
data race not always trigger problem. when read thread read memory never invalid by write thread, there no problem, in schedule case, schedule will check every frame, so never miss actionsTo perform. |
the empty check pointer value or member of std::vector it self is always vaild before vector destructed. |
In this implementation, it does not matter when the size of the vector is checked (via I get that the thread sanitizer is complaining, but it doesn't necessarily mean the implementation is incorrect. Will this data race cause a crash, or have you actually experienced a crash due to this? |
@rh101 @rh101 I will assume that we all agree that we have "data race" :) Let's move discussion to point 2. Could check links from #1753 (comment) I would like to be sure that we all understand code below is brocken (Double-Check Locking Pattern (DCLP) - broken implementation). And what side effect we can observe here (memory leak, not fully initialized object, wrong memory access, anything else)
Question: |
agree, but it's very spec case for singleton not related to schedule actionsToPerform case. |
Just an FYI, this section of code has not changed since Cocos2d-x, as you can see here: https://github.com/cocos2d/cocos2d-x/blob/76903dee64046c7bfdba50790be283484b4be271/cocos/base/CCScheduler.cpp#L944 |
There are specific things for singleton (e.g. memory allocation, memory. leak) but there is many in common data race, undefined behaviour |
I checked that as well. 11 years old :)
|
OK. We agreed that we have "data race". I think it is not acceptable to have in the app "undefined behavior". I think that we should fix all "data race" in the code. Let me create another fix where lock is not always called |
as I and @rh101 said: there no problem double check for schedule actionsToPerform vector, so nothing need to fix. I can confirm any standard c++ compilers should't cause error behavior on it. |
santizer just a tool to for suggestions, we don't need always follow it. |
lock is expensive resources, that's why coocos2dx original developer use double check to avoid lock every frame i think. |
|
Do we agree that we should fix this data race? Why is it Unsafe? Memory Consistency: Even if a thread has just finished adding an element to the vector and another thread calls .size(), without proper synchronization, there is no guarantee that the second thread will see the most recent size due to caching and memory reordering optimizations performed by the compiler or CPU. |
What do you mean by "Accessing .size()". It's being read, so would that cause corruption?
This does not matter, as has been stated earlier. If during the current scheduler update it happens to get I can understand you wanting to blame this section of code for the issues you're seeing, but, there is no reproducible test case to prove that this change is required, or that it will even fix the issue that you're having. Are you able to make the changes to your own Axmol source and release your projects with those changes? If the crashes disappear, then that would be great, since it may help to prove that something does need to be done about this section of the scheduler code. |
Once again if we are in undefined behavior zone anything could happen….
|
I am aware of what undefined behavior can lead to, but we would need to consider the case of this implementation and the effects it will have.
Why would it stop seeing all updates to the vector? Please explain how that would happen, because reviewing the implementation, I can't see how it would lead to that.
If another thread happens to be writing to memory that is being read by In the case of the implementation on Windows, So, my question is, what could possible cause a non-accessible memory crash in the use of
That has nothing to do with this implementation. |
@rh101 lets focus on point 1 visibility For more details please check e.g Herb Suttrr atomic weapons |
I don't know enough about this to comment on it. In the case of the There is no point going back and forth about this. I'm not trying to dismiss your concerns, or that I'm personally against any changes, but what concerns me is that there is no proof this is required, no test case to show the problem, and the fact that the changes proposed so far will have a performance impact. This kind of change can't just made based on a guess that this is an issue, but rather there must be a way to provide proof that it is. |
Performances impact - we should always measure. |
Describe your changes
Fix one of the many thread races found with thread sanitizer
Issue ticket number and link
: #1752
Checklist before requesting a review
For each PR
Add Copyright if it missed:
-
"Copyright (c) 2019-present Axmol Engine contributors (see AUTHORS.md)."
I have performed a self-review of my code.
Optional:
For core/new feature PR