-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify the behavior of @threads for
#44168
Conversation
Co-authored-by: Ian Butterworth <[email protected]>
base/threadingconstructs.jl
Outdated
|
||
See also: [`@spawn`](@ref Threads.@spawn), | ||
`pmap` in [`Distributed`](@ref man-distributed), and | ||
`BLAS.set_num_threads` in [`LinearAlgebra`](@ref man-linalg). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure the reference to set_num_threads is helpful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought it was weird but I assumed someone wanted to add it. But looking at "git blame", it was added in #38606 which was a rather very broad "catch-all" patch. So I think dropping here makes sense.
@mcabbott Do you have some particular reasons to keep it?
One way to make it relevant is to link to @carstenbauer's https://carstenbauer.github.io/ThreadPinning.jl/dev/explanations/blas/ or add something similar to the manual.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong feelings. The intention was to point out that what's inside your loop may already be multi-threaded, and that you may want to disable that. But perhaps just a link doesn't help much unless you already know? Something like a manual section might be clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tkf I think that something along the lines of what I wrote there should, in some form, go into the manual. AFAIK, this is also @ViralBShah's opinion (see here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I opened issue #44201 for tracking this. We can add a link once we add a manual section.
Reading this, I am again wondering whether @tkf your So thinking ahead:
|
I prefer to focus on documentation in this PR, although I tried to structure it such that adding a new scheduler is easy. I'm not planning to do it myself, but I think we can make |
iteration space. Thus, `@threads :dynamic for x in xs; f(x); end` is typically more | ||
efficient than `@sync for x in xs; @spawn f(x); end` if `length(xs)` is significantly | ||
larger than the number of the worker threads and the run-time of `f(x)` is relatively | ||
smaller than the cost of spawning and synchronizaing a task (typically less than 10 | ||
microseconds). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"typically less than 10 microseconds" is from
julia> Threads.nthreads()
2
julia> @benchmark wait(Threads.@spawn nothing)
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
Range (min … max): 1.455 μs … 53.389 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 9.047 μs ┊ GC (median): 0.00%
Time (mean ± σ): 9.004 μs ± 1.242 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
██▁ ▁▄▄▂
▂▁▁▂▁▁▁▁▂▁▂▁▁▁▂▁▂▂▁▁▁▂▂▂▂▂▂▂▂▂▃▃▄▃▆▆▄▆███▆▅▆█████▆▅▄▄▅▅▅▄▃ ▃
1.46 μs Histogram: frequency by time 11.5 μs <
Memory estimate: 480 bytes, allocs estimate: 5.
I'm looking at the median (and the mean, which is not very different from the median) as a "typical" case. It's more useful than min since the multi-queue is a random and concurrent algorithm.
This is on Linux. Can someone check it with other OSes? Note that it's important to use nthreads() > 1
since nthreads() == 1
is special-cased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the second part of the statement be reversed?
and the run-time of
f(x)
is relativelysmallerlarger than the cost of spawning and synchronizing a task (typically less than 10 microseconds).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it's correct as it is. If the cost of f(x)
is larger, then the cost of @spawn
would be negligible, so you won't see @threads :dynamic for
being more efficient.
Bump, can this be merged? |
LGTM |
* Clarify the behavior of `@threads for` Co-authored-by: Ian Butterworth <[email protected]> (cherry picked from commit 2f67b51)
* Clarify the behavior of `@threads for` Co-authored-by: Ian Butterworth <[email protected]> (cherry picked from commit 2f67b51)
The introduction of the new scheduler for
@threads
#43919 triggered some discussions::dynamic
scheduling option forThreads.@threads
#43919 (comment)which made me realize that the current docstring does not explain what it does clearly and accurately. So, I tried to clarify the docstring and also define the properties of the
:dynamic
scheduler a bit more to help users understand when to use it. Along the way, I also re-organized the docstring to hopefully make it easy to navigate.ping @carstenbauer @Moelf