-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Distributed] Ref counter thread safety #42167
[Distributed] Ref counter thread safety #42167
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In isolation this changes look good, but as you mentioned might not suffice.
CI appears happy modulo FreeBSD; FreeBSD has been failing regularly lately so chances are it's negligible. (I wasn't sure what to make of the logs from a brief glance, but restarted for more data.) |
Remaining bugs from the original issue (JuliaParallel/Dagger.jl#267) are addressed here JuliaParallel/Dagger.jl#281 and were happening due to Dagger code |
This should be safe and trivial to backport; maybe it can make it into 1.7? |
It's preferable with commit messages that are a bit more descriptive than "fix". |
@KristofferC Should I force push a commit name change on this branch now that it's merged? Also sorry for the inconvenience |
No, just something to keep in mind for the future. |
It is also more that vchuravy should have squashed this when merging and ensured the correctness of the message. |
Mea culpa. |
In various Dagger use cases, especially in an env with threads, issues have been observed where Future's were being set more than once.
Stacktraces, MWE and some investigation here JuliaParallel/Dagger.jl#267
So overall the suspicion was that Futures are being created with the same ids and are effectively being set twice when tasks are completed.
With this fix applied this is no longer happening as the refcounter is now threadsafe
However, we are still observing hangs and some other Dagger related issues in the same examples that were causing this future set twice issue, so we still need to investigate what is causing these.
Example stacktrace
@jpsamaroo