Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Distributed] Ref counter thread safety #42167

Merged
merged 1 commit into from
Sep 12, 2021
Merged

[Distributed] Ref counter thread safety #42167

merged 1 commit into from
Sep 12, 2021

Conversation

krynju
Copy link
Contributor

@krynju krynju commented Sep 8, 2021

In various Dagger use cases, especially in an env with threads, issues have been observed where Future's were being set more than once.
Stacktraces, MWE and some investigation here JuliaParallel/Dagger.jl#267

So overall the suspicion was that Futures are being created with the same ids and are effectively being set twice when tasks are completed.
With this fix applied this is no longer happening as the refcounter is now threadsafe

However, we are still observing hangs and some other Dagger related issues in the same examples that were causing this future set twice issue, so we still need to investigate what is causing these.

Example stacktrace

Error in eager listener:
Future can be set only once
Stacktrace:
 [1] error(s::String)
   @ Base .\error.jl:33
 [2] put_future(rid::Distributed.RRID, v::MemPool.DRef, caller::Int64)
   @ Distributed C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\Distributed\src\remotecall.jl:590
 [3] call_on_owner(::Function, ::Distributed.Future, ::MemPool.DRef, ::Vararg{Any})
   @ Distributed C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\Distributed\src\remotecall.jl:514
 [4] put!(rr::Distributed.Future, v::MemPool.DRef)
   @ Distributed C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\Distributed\src\remotecall.jl:584
 [5] eager_thunk()
   @ Dagger.Sch c:\Users\krynjupc\.julia\dev\Dagger\src\sch\eager.jl:84
 [6] macro expansion
   @ c:\Users\krynjupc\.julia\dev\Dagger\src\processor.jl:154 [inlined]
 [7] (::Dagger.var"#53#54"{typeof(Dagger.Sch.eager_thunk), Tuple{}, NamedTuple{(:sch_uid, :sch_handle, :processor, :utilization), Tuple{UInt64, Dagger.Sch.SchedulerHandle, Dagger.ThreadProc, UInt64}}})()
   @ Dagger .\threadingconstructs.jl:178

@jpsamaroo

@Sacha0 Sacha0 added the bugfix This change fixes an existing bug label Sep 8, 2021
Copy link
Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In isolation this changes look good, but as you mentioned might not suffice.

@Sacha0
Copy link
Member

Sacha0 commented Sep 9, 2021

CI appears happy modulo FreeBSD; FreeBSD has been failing regularly lately so chances are it's negligible. (I wasn't sure what to make of the logs from a brief glance, but restarted for more data.)

@krynju
Copy link
Contributor Author

krynju commented Sep 12, 2021

Remaining bugs from the original issue (JuliaParallel/Dagger.jl#267) are addressed here JuliaParallel/Dagger.jl#281 and were happening due to Dagger code

@jpsamaroo
Copy link
Member

This should be safe and trivial to backport; maybe it can make it into 1.7?

@vchuravy vchuravy merged commit 4805d54 into JuliaLang:master Sep 12, 2021
@KristofferC
Copy link
Member

It's preferable with commit messages that are a bit more descriptive than "fix".

@krynju
Copy link
Contributor Author

krynju commented Sep 15, 2021

@KristofferC Should I force push a commit name change on this branch now that it's merged?

Also sorry for the inconvenience

@KristofferC
Copy link
Member

Should I force push a commit name change on this branch now that it's merged?

No, just something to keep in mind for the future.

@vtjnash
Copy link
Member

vtjnash commented Sep 15, 2021

It is also more that vchuravy should have squashed this when merging and ensured the correctness of the message.

@vchuravy
Copy link
Member

Mea culpa.

@JeffBezanson JeffBezanson added the multithreading Base.Threads and related functionality label Nov 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug multithreading Base.Threads and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants