From 2f67b51c70280cf7f0f2da5de2e7769da0d49869 Mon Sep 17 00:00:00 2001
From: Takafumi Arakaki
Date: Thu, 3 Mar 2022 08:45:49 -0800
Subject: [PATCH] Clarify the behavior of `@threads for` (#44168)

* Clarify the behavior of `@threads for`

Co-authored-by: Ian Butterworth
---
 base/threadingconstructs.jl | 92 +++++++++++++++++++++++++------------
 1 file changed, 62 insertions(+), 30 deletions(-)

diff --git a/base/threadingconstructs.jl b/base/threadingconstructs.jl
index 9ed416caec2a6..a3413701fb7de 100644
--- a/base/threadingconstructs.jl
+++ b/base/threadingconstructs.jl
@@ -99,46 +99,82 @@ end
 """
     Threads.@threads [schedule] for ... end
 
-A macro to parallelize a `for` loop to run with multiple threads. Splits the iteration
-space among multiple tasks and runs those tasks on threads according to a scheduling
-policy.
-A barrier is placed at the end of the loop which waits for all tasks to finish
-execution.
-
-The `schedule` argument can be used to request a particular scheduling policy.
-
-Except for `:static` scheduling, how the iterations are assigned to tasks, and how the tasks
-are assigned to the worker threads is undefined. The exact assignments can be different
-for each execution. The scheduling option is a hint. The loop body code (including any code
-transitively called from it) must not make assumptions about the distribution of iterations
-to tasks or the worker thread in which they are executed. The loop body for each iteration
-must be able to make forward progress independent of other iterations and be free from data
-races. As such, synchronizations across iterations may deadlock.
+A macro to execute a `for` loop in parallel. The iteration space is distributed to
+coarse-grained tasks. This policy can be specified by the `schedule` argument. The
+execution of the loop waits for the evaluation of all iterations.
+
+See also: [`@spawn`](@ref Threads.@spawn) and
+`pmap` in [`Distributed`](@ref man-distributed).
+
+# Extended help
+
+## Semantics
+
+Unless stronger guarantees are specified by the scheduling option, the loop executed by
+the `@threads` macro has the following semantics.
+
+The `@threads` macro executes the loop body in an unspecified order and potentially
+concurrently. It does not specify the exact assignments of the tasks and the worker threads.
+The assignments can be different for each execution. The loop body code (including any code
+transitively called from it) must not make any assumptions about the distribution of
+iterations to tasks or the worker thread in which they are executed. The loop body for each
+iteration must be able to make forward progress independent of other iterations and be free
+from data races. As such, invalid synchronizations across iterations may deadlock while
+unsynchronized memory accesses may result in undefined behavior.
 
 For example, the above conditions imply that:
 
 - The lock taken in an iteration *must* be released within the same iteration.
 - Communicating between iterations using blocking primitives like `Channel`s is incorrect.
-- Write only to locations not shared across iterations (unless a lock or atomic operation is used).
+- Write only to locations not shared across iterations (unless a lock or atomic operation is
+  used).
+- The value of [`threadid()`](@ref Threads.threadid) may change even within a single
+  iteration.
 
-Schedule options are:
-- `:dynamic` (default) will schedule iterations dynamically to available worker threads,
-  assuming that the workload for each iteration is uniform.
-- `:static` creates one task per thread and divides the iterations equally among
-  them, assigning each task specifically to each thread.
-  Specifying `:static` is an error if used from inside another `@threads` loop
-  or from a thread other than 1.
+## Schedulers
 
-Without the scheduler argument, the exact scheduling is unspecified and varies across Julia releases.
+Without the scheduler argument, the exact scheduling is unspecified and varies across Julia
+releases. Currently, `:dynamic` is used when the scheduler is not specified.
 
 !!! compat "Julia 1.5"
     The `schedule` argument is available as of Julia 1.5.
 
+### `:dynamic` (default)
+
+`:dynamic` scheduler executes iterations dynamically to available worker threads. Current
+implementation assumes that the workload for each iteration is uniform. However, this
+assumption may be removed in the future.
+
+This scheduling option is merely a hint to the underlying execution mechanism. However, a
+few properties can be expected. The number of `Task`s used by `:dynamic` scheduler is
+bounded by a small constant multiple of the number of available worker threads
+([`nthreads()`](@ref Threads.nthreads)). Each task processes contiguous regions of the
+iteration space. Thus, `@threads :dynamic for x in xs; f(x); end` is typically more
+efficient than `@sync for x in xs; @spawn f(x); end` if `length(xs)` is significantly
+larger than the number of the worker threads and the run-time of `f(x)` is relatively
+smaller than the cost of spawning and synchronizing a task (typically less than 10
+microseconds).
+
 !!! compat "Julia 1.8"
     The `:dynamic` option for the `schedule` argument is available and the default as of Julia 1.8.
 
-For example, an illustration of the different scheduling strategies where `busywait`
-is a non-yielding timed loop that runs for a number of seconds.
+### `:static`
+
+`:static` scheduler creates one task per thread and divides the iterations equally among
+them, assigning each task specifically to each thread. In particular, the value of
+[`threadid()`](@ref Threads.threadid) is guaranteed to be constant within one iteration.
+Specifying `:static` is an error if used from inside another `@threads` loop or from a
+thread other than 1.
+
+!!! note
+    `:static` scheduling exists for supporting transition of code written before Julia 1.3.
+    In newly written library functions, `:static` scheduling is discouraged because the
+    functions using this option cannot be called from arbitrary worker threads.
+
+## Example
+
+To illustrate the different scheduling strategies, consider the following function
+`busywait` containing a non-yielding timed loop that runs for a given number of seconds.
 
 ```julia-repl
 julia> function busywait(seconds)
@@ -166,10 +202,6 @@ julia> @time begin
 
 The `:dynamic` example takes 2 seconds since one of the non-occupied threads is able
 to run two of the 1-second iterations to complete the for loop.
-
-See also: [`@spawn`](@ref Threads.@spawn), [`nthreads()`](@ref Threads.nthreads),
-[`threadid()`](@ref Threads.threadid), `pmap` in [`Distributed`](@ref man-distributed),
-`BLAS.set_num_threads` in [`LinearAlgebra`](@ref man-linalg).
 """
 macro threads(args...)
     na = length(args)
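
The rule in the revised docstring, "write only to locations not shared across iterations (unless a lock or atomic operation is used)", can be illustrated with a minimal sketch. The helper names `sum_squares_atomic` and `squares_per_slot` are hypothetical and are not part of this patch; the sketch only assumes the documented `Threads.@threads`, `Threads.Atomic`, and `Threads.atomic_add!` APIs.

```julia
using Base.Threads: @threads, Atomic, atomic_add!

# Hypothetical helper: all iterations update one shared counter,
# so every update goes through an atomic operation.
function sum_squares_atomic(xs)
    total = Atomic{Int}(0)
    @threads for x in xs
        atomic_add!(total, x * x)
    end
    return total[]
end

# Hypothetical helper: iteration `i` writes only to `out[i]`,
# so no lock or atomic is needed.
function squares_per_slot(xs)
    out = similar(xs)
    @threads for i in eachindex(xs)
        out[i] = xs[i]^2
    end
    return out
end

# Both helpers agree with the sequential result.
sum_squares_atomic(1:100) == sum(abs2, 1:100)
squares_per_slot([1, 2, 3]) == [1, 4, 9]
```

Neither helper depends on `threadid()` or on which task runs which iteration, which is what lets them work under the default scheduler as well as under `:static`.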