-
-
Notifications
You must be signed in to change notification settings - Fork 21.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Ref assignment atomic #46302
Make Ref assignment atomic #46302
Conversation
Question for @RandomShaper since I think you were working on some of the other atomics in the engine: Should we use a different memory order than the default? I don't understand the instructions that this code gets compiled down to, but it seems like the more relaxed memory orders might be faster. https://www.cplusplus.com/reference/atomic/memory_order/ Obviously I haven't profiled it, but I don't personally care what order assignments happen. If two threads write to the same Ref without a lock, the ordering should be undefined. I don't see any benefit to a stricter guarantee than promising that we'll reference count correctly and that the value of |
Related: #35046 |
I think you don't need to make the local pointers atomic. Just by having an atomic member pointer, you can get and set its value into (and from) the local pointers safely. Regarding memory order, I'd need more time to reason about all the possibilities, but I'm quite confident that relaxed is fine, given the kind of guarantee you want to add with this PR, which it not full thread safety (that, as you said, would likely require some locking mechanism, or maybe one super clever lock-less algorithm). Instead of adding multiple review comments, I've applied my changes locally (on top of yours) and created this patch file: |
8c58381
to
bdb61bb
Compare
Patch looks good. CI is breaking but I think it's ready for another look :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. However, before merging maybe it'd be good to add a test that exercises this. Or maybe it can be done later (I may be able to add the test myself, but not any sooner than the weekend).
I don't have much experience with tests, and I think they're failing on master right now so I'm not sure how to work around that. |
I'll try do add a test myself today. CI seems fine. I guess this can be merged and, in case issues arise or the test fails, you can just make a fixup PR. Working on master has advantages... 😀 |
@RandomShaper mentions it is possible to ensure that barrier won't happen at compile time, so that might work. I was not aware of this C++11 feature. |
@ellenhp, after discussing this with @reduz on IRC, we've agreed that, in order to know if some future CPU or C++ library needs to lock to achieve atomicity (so we know we must use a different approach if that happens), adding this inside the body of the class is desired: That guarantee, together with the relaxed ordering, gives us enough peace of mind. |
I'll make this change later tonight and confirm it's working by going through the generated assembly for some toy examples. I do think that a read-modify-write operation like |
bdb61bb
to
8c53652
Compare
It looks like there is a memory barrier on x86_64 (thank you @fire for dumping this assembly for me 👍)
I'm not the best at looking at assembly, but I think
So, bad news. But again, we're already doing this for reference counts and I'm not aware of any performance issues with that. I can profile this too if needed :) Just let me know. |
When I said "no barriers" I meant no compiler barriers and no enforcement of CPU barriers. However, that's a minimum requirement from the programmer and the CPU still has to do what it has to do; i.e. do the closest equal-or-safer thing it can do to at least meet the expectations (like if you explicitly issue some kind of barrier not considered by some CPU instruction set, it may have to use a full barrier to at least provide the expected guarantees). In our case, yes, In conclusion, the barrier is a must on x86 (64) and I guess we can live with it. I don't think it will hinder performance so much that it becomes a problem and, anyway, this is in harmony to our current leaning towards a thread safer Godot. P. S.: At least we can ensure the operation is lock-free, which would be worse than the memory barrier. |
Agreed. I just wanted to bring this up because it theoretically could impact performance. I don't think it will in a serious way but it would be dishonest to not mention it. :) |
We should most likely measure this somehow before merging. IMO this protects against the same reference being accesed from different threads, which is not exactly what the refcount was meant for, and not a common case along the engine. Maybe this could be special case where it is an actual problema. |
This PR ensures that references are correctly counted even when two threads assign to the same Ref<> variable at the same time. There's a nonzero performance cost to using atomics here, but we're already using atomics in
SafeRefCount
so I think it's probably fine. I need correct behavior in Ref<> for the implementation godotengine/godot-proposals#2299 See godotengine/godot-proposals#2330 for justification for this change.I can't seem to get ASAN to work on macOS so I'd really appreciate some more thorough testing. I'll update this if I find a way to test for leaks and memory errors.
Please review this as if I'm just some random person on the internet who barely knows C++ modifying one of the most fundamental data structures in the entire engine, because that is 100% accurate. 🤣
Fixes: godotengine/godot-proposals#2330