From b9b362a652aba394b029e5cc6ff217f4f2bf103e Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Mon, 5 Jun 2023 12:22:27 -0600 Subject: [PATCH 01/25] Create PSA-dont-use-threadid.md --- blog/2023/06/PSA-dont-use-threadid.md | 186 ++++++++++++++++++++++++++ 1 file changed, 186 insertions(+) create mode 100644 blog/2023/06/PSA-dont-use-threadid.md diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md new file mode 100644 index 0000000000..49b06be864 --- /dev/null +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -0,0 +1,186 @@ +## TODO's before publication +- [ ] Update master & 1.9 docs to guide against misuse of `threadid()` e.g. + - [ ] `add docs on task-specific buffering using @threads` [#48542](https://github.com/JuliaLang/julia/pull/48542) + - [ ] `add docs on task migration` [#50047](https://github.com/JuliaLang/julia/pull/50047) + - [ ] Add `!!1compat` notes to threadid/nthreads/maxthreadid + - [ ] Add note to recommend against `@async` +- [ ] Make sure that ^ 1.9 version is released so docs are discoverable +- [ ] Get signoff by core multithreading developers +- [ ] More detail in the Packages section, perhaps with examples (or remove it) +- [ ] Bikeshead the title. + +# PSA: Stop using `states[threadid()]` +Alt titles: +- PSA: Multithreading with `states[threadid()]` is unsafe +- PSA: Multithreading with `buffers[threadid()]` is unsafe +- PSA: Don't assume `threadid()` is stable within a task + +__ +#### (or really any uses of `threadid()`) + + +Partially due to the evolving history of our parallel and concurrent interfaces[^historynote], some Julia programmers have been writing *incorrect* parallel code that contains the possibility of race conditions which will give subtly wrong results. It's important for the stability and correctness of the ecosystem that these usages are identified and fixed. 
+ +The general template that this incorrect code follows is something like the following: + +```julia +using Base.Threads: nthreads, @threads, threadid + +states = [some_initial_value for _ in 1:nthreads()] +@threads for x in some_data + tid = threadid() + old_val = states[tid] + new_val = some_operator(old_val, f(x)) + states[tid] = new_val +end +do_something(states) +``` + +The above code is **incorrect** because the tasks spawned by `@threads` are allowed to yield to other tasks during their execution. + +This means that between reading `old_val` and storing `new_val` in the storage, the task could be paused and a new task running on the same thread with the same `threadid()` could concurrently write to `states[tid]`, causing a race condition and thus work being lost. + +This is not actually a problem with multithreading specifically, but really a concurrency problem, and it can be demonstrated even with a single thread. For example: +``` +$ julia --threads=1 +``` +```julia +julia> f(i) = (sleep(0.001); i); + +julia> let state = [0], N=100 + @sync for i ∈ 1:N + Threads.@spawn begin + tid = Threads.threadid() + old_var = state[tid] + new_var = old_var + f(i) + state[tid] = new_var + end + end + sum(state), sum(1:N) + end +(100, 5050) +``` +In the above snippet, we purposefully over-subscribed the CPU with `100` separate tasks in order to make the bug more likely to manifest, but the problem **can** arise even without spawning very many tasks. + +Any usage of `threadid()` in package or user code should be seen as a warning sign that the code is relying on implementation details, and is open to concurrency bugs. + +# Fixing buggy code which uses this pattern + +## Quickfix: Replace `@threads` with `@threads :static` + +The simplest (though not recommended longterm) quickfix if you happen to use `@threads` is to replace usages of `@threads for ...` with `@threads :static for ...`. 
The reason for this is that the `:static` scheduler for `@threads` does not allow the asynchronous task migration and yielding that cause the bug in the first place.
+
+However, this is not a good long term solution because
+1) It relies on guard rails to make otherwise-incorrect code correct
+2) `@threads :static` is not cooperative or composable, which means that if code inside your `@threads :static` loop also does multithreading, your code could be reduced to worse than single-threaded speeds, or even deadlock.
+
+## Better fix: Work directly with tasks
+
+If you want a recipe that can replace the above buggy one with something that can be written using only the `Base.Threads` module, we recommend moving away from `@threads`, and instead working directly with `@spawn` to create and manage tasks. The reason is that `@threads` does not have any builtin mechanisms for managing and merging the results of work from different threads, whereas tasks can manage and return their own state in a safe way.
+
+Tasks creating and returning their own state is inherently safer than the spawner of paralell tasks setting up state for spawned tasks to read from and write to.
+
+Code which replaces the incorrect code pattern shown above can look like this:
+```julia
+using Base.Threads: nthreads, @threads, @spawn
+using Base.Iterators: partition
+
+tasks_per_thread = 2 # customize this as needed. 
More tasks have more overhead, but better load balancing
+
+chunk_size = max(1, length(some_data) ÷ (tasks_per_thread * nthreads()))
+data_chunks = partition(some_data, chunk_size) # partition your data into chunks that individual tasks will deal with
+# See also ChunkSplitters.jl and SplittablesBase.jl for partitioning data
+
+tasks = map(data_chunks) do chunk
+    # Each chunk of your data gets its own spawned task that does its own local work and returns a result
+    @spawn begin
+        state = some_initial_value
+        for x in chunk
+            state = some_operator(state, f(x))
+        end
+        return state
+    end
+end
+states = fetch.(tasks) # get all the values returned by the individual tasks.
+# fetch is type unstable, so you may optionally want to assert a specific return value.
+
+do_something(states)
+```
+
+This is a fully general replacement for the old, buggy pattern. However, for many applications this should be simplified down to a parallel version of `mapreduce`:
+```julia
+using Base.Threads: nthreads, @spawn
+function pmapreduce(f, op, itr; init=some_initial_value, chunks_per_thread::Int = 2)
+    chunk_size = max(1, length(itr) ÷ (chunks_per_thread * nthreads()))
+    tasks = map(Iterators.partition(itr, chunk_size)) do chunk
+        @spawn mapreduce(f, op, chunk; init=init)
+    end
+    mapreduce(fetch, op, tasks; init=init)
+end
+```
+In `pmapreduce(f, op, itr)`, the function `f` is applied to each element of `itr`, and the results are then combined with an *associative*[^assoc] two-argument function `op`.
+
+The above `pmapreduce` can hopefully be added to base Julia at some point in the near future. In the meantime however it's somewhat simple to write your own as above. 
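As a usage sketch (not part of the original patch): the recipe above can be exercised end-to-end. The definition below restates the post's `pmapreduce` with the chunk count scaled by `nthreads()` — an assumption on our part, matching the earlier `tasks_per_thread * nthreads()` pattern — and `x -> x^2` is just a stand-in workload:

```julia
using Base.Threads: nthreads, @spawn

# Restated sketch of the recipe above; the chunk count is scaled by nthreads()
# (our assumption) so that each thread gets roughly chunks_per_thread chunks.
function pmapreduce(f, op, itr; init, chunks_per_thread::Int = 2)
    chunk_size = max(1, length(itr) ÷ (chunks_per_thread * nthreads()))
    tasks = map(Iterators.partition(itr, chunk_size)) do chunk
        @spawn mapreduce(f, op, chunk; init=init)  # each task reduces its own chunk
    end
    mapreduce(fetch, op, tasks; init=init)  # combine the per-task results
end

pmapreduce(x -> x^2, +, 1:1000; init=0) == sum(x -> x^2, 1:1000)  # true
```

Note that `init` is used once per chunk and once more in the final combine, which is why a neutral element such as `0` for `+` is the safe choice here.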
+
+## Another option: Use a package which handles this correctly
+
+We encourage people to check out various multithreading libraries like [Transducers.jl](https://github.com/JuliaFolds2/Transducers.jl) (or various frontends like [ThreadsX.jl](https://github.com/tkf/ThreadsX.jl), [FLoops.jl](https://github.com/JuliaFolds/FLoops.jl), and [Folds.jl](https://github.com/JuliaFolds/Folds.jl)), and [MultiThreadedCaches.jl](https://github.com/JuliaConcurrent/MultiThreadedCaches.jl).
+
+### Transducers.jl ecosystem
+Transducers.jl is fundamentally about expressing the traversing of data in a structured and principled way, often with the benefit that it makes parallel computing easier to reason about.
+
+The above `pmapreduce(f, op, itr)` could be expressed as
+```julia
+using Transducers
+itr |> Map(f) |> foldxt(op; init=some_initial_value)
+```
+or
+```julia
+using Transducers
+foldxt(op, Map(f), itr; init=some_initial_value)
+```
+The various frontends listed provide different APIs for Transducers.jl which some people may find easier to use. E.g.
+```julia
+using ThreadsX
+ThreadsX.mapreduce(f, op, itr; init=some_initial_value)
+```
+or
+```julia
+using FLoops
+@floop for x in itr
+    @reduce out = op(some_initial_value, f(x))
+end
+out
+```
+
+### MultiThreadedCaches.jl
+
+MultiThreadedCaches.jl on the other hand attempts to make the `states[threadid()]`-like pattern safer by using lock mechanisms to stop data races. We think this is not an ideal way to proceed, but it may make transitioning to safer code easier and require fewer rewrites, such as in cases where a package must manage state which may be concurrently written to by a user, and the package cannot control how the user structures their code.
+
+[^historynote]: ### Concurrency & Parallelism
+
+    In Julia, tasks are the primitive mechanism to express concurrency. Concurrency is the ability to execute more than one program or task simultaneously. 
+
+    Tasks are mapped onto any number of "worker-threads", which execute them in parallel. This is often called `M:N` threading or green threads. Even if Julia is started with only one worker-thread, the programmer can express concurrent operations.
+
+    In early versions of Julia, the `@async` macro was used to schedule concurrent tasks which were executed asynchronously on a single thread. Later, the `@threads` macro was developed for CPU-parallelism and allowed users to easily execute chunks of a loop in parallel, but at the time this was disjoint from the notions of tasks in the language. This led to various composability problems and motivated language changes.
+
+    The concurrency model in Julia has been evolving over minor versions. Julia v1.3 introduced the parallel scheduler for tasks and `Threads.@spawn`; v1.5 introduced `@threads :static` with the intention to change the default scheduling in future releases. Julia v1.7 enabled task migration, and Julia v1.8 changed the default for `@threads` to the dynamic scheduler.
+
+    When the parallel scheduler was introduced, we decided that there was too much code in the wild using `@async` and expecting specific semantics, so `Threads.@spawn` was made available to access the new semantics. `@async` has various problems and we discourage its use for new code.
+
+    #### Uses of `threadid`/`nthreads`/`maxthreadid`
+
+    Over time, several features have been added that make relying on `threadid`, `nthreads` and even `maxthreadid` perilous:
+
+    1. Task migration: A task can observe multiple `threadid`s during its execution.
+    2. Interactive priority: `nthreads()` will report the number of non-interactive worker-threads, thus undercounting the number of active threads.
+    3. Thread adoption (v1.9): Foreign threads can now be adopted (and latter removed) at any time during the execution of the program.
+    4. 
GC threads: The runtime can use additional threads to accelerate work like executing the Garbage Collector. + + Any code that relies on a specific `threadid` staying constant, or on a constant number of threads during execution, is bound to be incorrect. As a rule of thumb, programmers should at most be querying the number of threads to motivate heuristics like how to distribute parallel work, but programs should generally **not** be written to depend on implementation details of threads for correctness. Rather, programmers should reason about *tasks*. + +[^assoc]: ## Associativity + [Associativity](https://en.wikipedia.org/wiki/Associative_property) is an important property for parallel reducing functions because it means that `op(a, op(b, c)) == op(op(a, b), c)`, and hence the result does not depend on the order in which the reduction is performed. + + Note that associativity is a not the same as commutivity which is the property that `op(a, b) = op(b, a)`. This is *not* required for parallel reducing functions. From e0de374832b6b8538e4b0b7a4fe50bc313290f37 Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Mon, 5 Jun 2023 12:28:10 -0600 Subject: [PATCH 02/25] Update PSA-dont-use-threadid.md --- blog/2023/06/PSA-dont-use-threadid.md | 1 - 1 file changed, 1 deletion(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index 49b06be864..6b23bad187 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -16,7 +16,6 @@ Alt titles: - PSA: Don't assume `threadid()` is stable within a task __ -#### (or really any uses of `threadid()`) Partially due to the evolving history of our parallel and concurrent interfaces[^historynote], some Julia programmers have been writing *incorrect* parallel code that contains the possibility of race conditions which will give subtly wrong results. It's important for the stability and correctness of the ecosystem that these usages are identified and fixed. 
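To make the failure mode described above concrete, here is a minimal, hedged sketch (not part of the original post) of one *correct* alternative to the `states[threadid()]` read-modify-write: guard a single shared accumulator with a `ReentrantLock`. This serializes the updates, so it illustrates correctness only; it is not the task-based fix the post recommends:

```julia
using Base.Threads: @spawn

# Illustrative sketch: a lock makes the read-modify-write atomic with respect
# to the other tasks, so no update can be lost to interleaving.
function locked_sum(f, N)
    lk = ReentrantLock()
    total = Ref(0)
    @sync for i in 1:N
        @spawn begin
            v = f(i)           # do the real work outside the lock
            lock(lk) do
                total[] += v   # the critical section is now race-free
            end
        end
    end
    total[]
end

locked_sum(identity, 100) == sum(1:100)  # true, on any number of threads
```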
From 9f220116b9f7ec7a5d5753d9f73b413a8111235d Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Mon, 5 Jun 2023 12:29:59 -0600 Subject: [PATCH 03/25] Update PSA-dont-use-threadid.md --- blog/2023/06/PSA-dont-use-threadid.md | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index 6b23bad187..1e1cbdbe56 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -1,14 +1,3 @@ -## TODO's before publication -- [ ] Update master & 1.9 docs to guide against misuse of `threadid()` e.g. - - [ ] `add docs on task-specific buffering using @threads` [#48542](https://github.com/JuliaLang/julia/pull/48542) - - [ ] `add docs on task migration` [#50047](https://github.com/JuliaLang/julia/pull/50047) - - [ ] Add `!!1compat` notes to threadid/nthreads/maxthreadid - - [ ] Add note to recommend against `@async` -- [ ] Make sure that ^ 1.9 version is released so docs are discoverable -- [ ] Get signoff by core multithreading developers -- [ ] More detail in the Packages section, perhaps with examples (or remove it) -- [ ] Bikeshead the title. 
- # PSA: Stop using `states[threadid()]` Alt titles: - PSA: Multithreading with `states[threadid()]` is unsafe From 611ad1a5cae557aca5ae9ab76a2c3a60038d7f85 Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Mon, 5 Jun 2023 13:33:27 -0600 Subject: [PATCH 04/25] Update blog/2023/06/PSA-dont-use-threadid.md Co-authored-by: Valentin Churavy --- blog/2023/06/PSA-dont-use-threadid.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index 1e1cbdbe56..ad8422ca62 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -112,7 +112,7 @@ The above `pmampreduce` can hopefully be added to base Julia at some point in th ## Another option: Use a package which handles this correctly -We encourage people to check out various multithreading libraries like [Transducers.jl](https://github.com/JuliaFolds2/Transducers.jl) (or various frontends like [ThreadsX.jl](https://github.com/tkf/ThreadsX.jl), [FLoops.jl](https://github.com/JuliaFolds/FLoops.jl), and [Folds.jl](https://github.com/JuliaFolds/Folds.jl)), and [MultiThreadedCaches.jl](https://github.com/JuliaConcurrent/MultiThreadedCaches.jl). +We encourage people to check out various multithreading libraries like [Transducers.jl](https://github.com/JuliaFolds2/Transducers.jl) (or various frontends like [ThreadsX.jl](https://github.com/tkf/ThreadsX.jl), [FLoops.jl](https://github.com/JuliaFolds/FLoops.jl), and [Folds.jl](https://github.com/JuliaFolds/Folds.jl)), and [MultiThreadedCaches.jl](https://github.com/JuliaConcurrent/MultiThreadedCaches.jl). ### Transducers.jl ecosystem Transducers.jl is fundamentally about expressing the traversing of data in a structured and principled way, often with the benefit that it makes parallel computing easier to reason about. 
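For completeness, the `:static` quickfix described in the post can be sketched like this — a stopgap only, since the loop body must not itself spawn tasks, and the pattern remains discouraged:

```julia
using Base.Threads: nthreads, @threads, threadid

# With the :static scheduler there is exactly one task per thread and no task
# migration, so indexing by threadid() happens to be free of lost updates here.
states = zeros(Int, nthreads())
@threads :static for x in 1:1000
    states[threadid()] += x
end
sum(states) == sum(1:1000)  # true
```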
From 5e8d2912c180d5f7f70b987b657dac45fb7a9093 Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Mon, 5 Jun 2023 13:35:01 -0600 Subject: [PATCH 05/25] Update blog/2023/06/PSA-dont-use-threadid.md Co-authored-by: Valentin Churavy --- blog/2023/06/PSA-dont-use-threadid.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index ad8422ca62..b8612e7c1b 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -1,3 +1,11 @@ +@def rss_pubdate = Date(2023, 6, XX) +@def rss = """ TBD """ +@def published = "xx June 2023" +@def title = "PSA: Stop using `states[threadid()]`" +@def authors = """Mason Protter, Valentin Churavy, ...""" +@def mintoclevel=2 +@def maxtoclevel=3 + # PSA: Stop using `states[threadid()]` Alt titles: - PSA: Multithreading with `states[threadid()]` is unsafe From 5604d309ae5708ce162c65b75e8f58c29ee201a2 Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Mon, 5 Jun 2023 16:37:36 -0600 Subject: [PATCH 06/25] try fixing toplevel decls --- blog/2023/06/PSA-dont-use-threadid.md | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index b8612e7c1b..15fbf8cd8f 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -1,10 +1,4 @@ -@def rss_pubdate = Date(2023, 6, XX) -@def rss = """ TBD """ -@def published = "xx June 2023" -@def title = "PSA: Stop using `states[threadid()]`" -@def authors = """Mason Protter, Valentin Churavy, ...""" -@def mintoclevel=2 -@def maxtoclevel=3 ++++ mintoclevel = 2 maxtoclevel = 3 title = "PSA: Stop using `states[threadid()]`" authors = "Mason Protter, Valentin Churavy, Ian Butterworth, ..." 
published = "XX June 2023" rss_pubdate = Date(2023, 06, XX) rss = """PSA: Stop using `states[threadid()]`""" +++ # PSA: Stop using `states[threadid()]` Alt titles: From 1d514549a4a36a4b6cae53ca21a280111b3d8b25 Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Tue, 6 Jun 2023 11:43:40 -0600 Subject: [PATCH 07/25] Update blog/2023/06/PSA-dont-use-threadid.md Co-authored-by: Sukera <11753998+Seelengrab@users.noreply.github.com> --- blog/2023/06/PSA-dont-use-threadid.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index 15fbf8cd8f..24a9abe0d7 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -68,7 +68,7 @@ However, this is not a good long term solution because If you want a recipe that can replace the above buggy one with something that can be written using only the `Base.Threads` module, we recommend moving away from `@threads`, and instead working directly with `@spawn` to create and manage tasks. The reason is that `@threads` does not have any builtin mechanisms for managing and merging the results of work from different threads, whereas tasks can manage and return their own state in a safe way. -Tasks creating and returning their own state is inherently safer than the spawner of paralell tasks setting up state for spawned tasks to read from and write to. +Tasks creating and returning their own state is inherently safer than the spawner of parallel tasks setting up state for spawned tasks to read from and write to. 
Code which replaces the incorrect code pattern shown above can look like this: ```julia From 2b5353eb2e7118958f0ced963480ca5b2af32743 Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Tue, 6 Jun 2023 11:44:56 -0600 Subject: [PATCH 08/25] Update blog/2023/06/PSA-dont-use-threadid.md Co-authored-by: Sukera <11753998+Seelengrab@users.noreply.github.com> --- blog/2023/06/PSA-dont-use-threadid.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index 24a9abe0d7..a3d1ef84f8 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -165,7 +165,7 @@ MultiThreadedCaches.jl on the other hand attempts to make the `states[threadid() 1. Task migration: A task can observe multiple `threadid`s during its execution. 2. Interactive priority: `nthreads()` will report the number of non-interactive worker-threads, thus undercounting the number of active threads. - 3. Thread adoption (v1.9): Foreign threads can now be adopted (and latter removed) at any time during the execution of the program. + 3. Thread adoption (v1.9): Foreign threads can now be adopted (and later be removed) at any time during the execution of the program. 4. GC threads: The runtime can use additional threads to accelerate work like executing the Garbage Collector. Any code that relies on a specific `threadid` staying constant, or on a constant number of threads during execution, is bound to be incorrect. As a rule of thumb, programmers should at most be querying the number of threads to motivate heuristics like how to distribute parallel work, but programs should generally **not** be written to depend on implementation details of threads for correctness. Rather, programmers should reason about *tasks*. 
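The task-migration point above can be observed directly. A hedged sketch (whether a migration actually occurs depends on the scheduler and on how many threads Julia was started with):

```julia
using Base.Threads: @spawn, threadid

t = @spawn begin
    before = threadid()
    sleep(0.1)        # a yield point: the task may be resumed on another thread
    after = threadid()
    (before, after)   # on Julia >= 1.7 these are not guaranteed to be equal
end
fetch(t)  # a Tuple of thread ids; do not rely on them being equal
```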
From fa7f41025d51f58734ea028ff39207e9119c26db Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Tue, 6 Jun 2023 11:45:22 -0600 Subject: [PATCH 09/25] Update blog/2023/06/PSA-dont-use-threadid.md Co-authored-by: Sukera <11753998+Seelengrab@users.noreply.github.com> --- blog/2023/06/PSA-dont-use-threadid.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index a3d1ef84f8..2df197b4e6 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -168,7 +168,7 @@ MultiThreadedCaches.jl on the other hand attempts to make the `states[threadid() 3. Thread adoption (v1.9): Foreign threads can now be adopted (and later be removed) at any time during the execution of the program. 4. GC threads: The runtime can use additional threads to accelerate work like executing the Garbage Collector. - Any code that relies on a specific `threadid` staying constant, or on a constant number of threads during execution, is bound to be incorrect. As a rule of thumb, programmers should at most be querying the number of threads to motivate heuristics like how to distribute parallel work, but programs should generally **not** be written to depend on implementation details of threads for correctness. Rather, programmers should reason about *tasks*. + Any code that relies on a specific `threadid` staying constant, or on a constant number of threads during execution, is bound to be incorrect. As a rule of thumb, programmers should at most be querying the number of threads to motivate heuristics like how to distribute parallel work, but programs should generally **not** be written to depend on implementation details of threads for correctness. Rather, programmers should reason about *tasks*, i.e. pieces of work that may execute concurrently with other code, independently of the number of *threads* that are used for executing them. 
[^assoc]: ## Associativity [Associativity](https://en.wikipedia.org/wiki/Associative_property) is an important property for parallel reducing functions because it means that `op(a, op(b, c)) == op(op(a, b), c)`, and hence the result does not depend on the order in which the reduction is performed. From 50d8747260b8296d8248e80fc3cfd3a42262afbe Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Tue, 6 Jun 2023 11:45:50 -0600 Subject: [PATCH 10/25] Update blog/2023/06/PSA-dont-use-threadid.md Co-authored-by: Sukera <11753998+Seelengrab@users.noreply.github.com> --- blog/2023/06/PSA-dont-use-threadid.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index 2df197b4e6..68aa8f0b15 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -171,6 +171,6 @@ MultiThreadedCaches.jl on the other hand attempts to make the `states[threadid() Any code that relies on a specific `threadid` staying constant, or on a constant number of threads during execution, is bound to be incorrect. As a rule of thumb, programmers should at most be querying the number of threads to motivate heuristics like how to distribute parallel work, but programs should generally **not** be written to depend on implementation details of threads for correctness. Rather, programmers should reason about *tasks*, i.e. pieces of work that may execute concurrently with other code, independently of the number of *threads* that are used for executing them. [^assoc]: ## Associativity - [Associativity](https://en.wikipedia.org/wiki/Associative_property) is an important property for parallel reducing functions because it means that `op(a, op(b, c)) == op(op(a, b), c)`, and hence the result does not depend on the order in which the reduction is performed. 
+    [Associativity](https://en.wikipedia.org/wiki/Associative_property) is an important property for parallel reducing functions, because it means that `op(a, op(b, c)) == op(op(a, b), c)`, and hence the result does not depend on the order in which the reduction is performed.
 
-    Note that associativity is a not the same as commutivity which is the property that `op(a, b) = op(b, a)`. This is *not* required for parallel reducing functions.
+    Note that associativity is not the same as commutativity, which is the property that `op(a, b) == op(b, a)`. This is *not* required for parallel reducing functions.

From 0792a047f6cb2f2bcab89a91feab52891ee1c746 Mon Sep 17 00:00:00 2001
From: Mason Protter
Date: Tue, 6 Jun 2023 14:17:56 -0600
Subject: [PATCH 11/25] pmapreduce -> tmapreduce

---
 blog/2023/06/PSA-dont-use-threadid.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md
index 68aa8f0b15..f7e5becd45 100644
--- a/blog/2023/06/PSA-dont-use-threadid.md
+++ b/blog/2023/06/PSA-dont-use-threadid.md
@@ -100,7 +100,7 @@ do_something(states)
 
 This is a fully general replacement for the old, buggy pattern. However, for many applications this should be simplified down to a parallel version of `mapreduce`:
 ```julia
 using Threads: nthreads, @spawn
-function pmapreduce(f, op, itr; init=some_initial_value, chunks_per_thread::Int = 2)
+function tmapreduce(f, op, itr; init=some_initial_value, chunks_per_thread::Int = 2)
     chunk_size = max(1, length(itr) ÷ chunks_per_thread)
     tasks = map(Iterators.partition(itr, chunk_size)) do chunk
         @spawn mapreduce(f, op, chunk; init=init)
     end
     mapreduce(fetch, op, tasks; init=init)
 end
 ```
-In `pmapreduce(f, op, itr)`, the function `f` is applied to each element of `itr`, and then an *associative*[^assoc] two-argument function `op`. 
+In `tmapreduce(f, op, itr)`, the function `f` is applied to each element of `itr`, and the results are then combined with an *associative*[^assoc] two-argument function `op`.
 
-The above `pmampreduce` can hopefully be added to base Julia at some point in the near future. In the meantime however it's somewhat simple to write your own as above.
+The above `tmapreduce` can hopefully be added to base Julia at some point in the near future. In the meantime however it's somewhat simple to write your own as above.
 
 ## Another option: Use a package which handles this correctly
@@ -119,7 +119,7 @@ We encourage people to check out various multithreading libraries like [Transduc
 ### Transducers.jl ecosystem
 Transducers.jl is fundamentally about expressing the traversing of data in a structured and principled way, often with the benefit that it makes parallel computing easier to reason about.
 
-The above `pmapreduce(f, op, itr)` could be expressed as
+The above `tmapreduce(f, op, itr)` could be expressed as
 ```julia
 using Transducers
 itr |> Map(f) |> foldxt(op; init=some_initial_value)

From d5fe14b5d954c3ef520e1afda825461179bfc99d Mon Sep 17 00:00:00 2001
From: Mason Protter
Date: Thu, 8 Jun 2023 10:02:53 -0600
Subject: [PATCH 12/25] Update blog/2023/06/PSA-dont-use-threadid.md

Co-authored-by: Klaus Crusius

---
 blog/2023/06/PSA-dont-use-threadid.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md
index f7e5becd45..fd5c79b41c 100644
--- a/blog/2023/06/PSA-dont-use-threadid.md
+++ b/blog/2023/06/PSA-dont-use-threadid.md
@@ -92,7 +92,7 @@ tasks = map(data_chunks) do chunk
     end
 end
 states = fetch.(tasks) # get all the values returned by the individual tasks.
-# fetch is type unstable, so you may optionally want to assert a specific return value.
+# fetch is type unstable, so you may optionally want to assert a specific return type. 
do_something(states) ``` From c69f3345848fc948db0fdad5e2b66e889d80f7f8 Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Fri, 9 Jun 2023 18:47:32 -0600 Subject: [PATCH 13/25] Update PSA-dont-use-threadid.md --- blog/2023/06/PSA-dont-use-threadid.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index fd5c79b41c..f97d6a965a 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -9,7 +9,7 @@ Alt titles: __ -Partially due to the evolving history of our parallel and concurrent interfaces[^historynote], some Julia programmers have been writing *incorrect* parallel code that contains the possibility of race conditions which will give subtly wrong results. It's important for the stability and correctness of the ecosystem that these usages are identified and fixed. +Partially due to the evolving history of our parallel and concurrent interfaces[^historynote], some Julia programmers have been writing *incorrect* parallel code that contains the possibility of race conditions which can give wrong results. This pattern has even been erroneously recommended in [previous official julia blogposts](https://github.com/JuliaLang/www.julialang.org/blob/main/blog/2019/07/multithreading.md#thread-local-state). It is important for the stability and correctness of the ecosystem that these usages are identified and fixed. The general template that this incorrect code follows is something like the following: @@ -26,9 +26,7 @@ end do_something(states) ``` -The above code is **incorrect** because the tasks spawned by `@threads` are allowed to yield to other tasks during their execution. 
- -This means that between reading `old_val` and storing `new_val` in the storage, the task could be paused and a new task running on the same thread with the same `threadid()` could concurrently write to `states[tid]`, causing a race condition and thus work being lost. +The above code is **incorrect** because the tasks spawned by `@threads` are allowed to yield to other tasks during their execution. This means that between reading `old_val` and storing `new_val` in the storage, the task could be paused and a new task running on the same thread with the same `threadid()` could concurrently write to `states[tid]`, causing a race condition and thus work being lost. This is not actually a problem with multithreading specifically, but really a concurrency problem, and it can be demonstrated even with a single thread. For example: ``` From 65e25e87d01111cc45a477c8072c9f5dc5878a39 Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Sun, 11 Jun 2023 14:05:53 -0600 Subject: [PATCH 14/25] change proposed title --- blog/2023/06/PSA-dont-use-threadid.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index f97d6a965a..fd143ff118 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -1,10 +1,11 @@ +++ mintoclevel = 2 maxtoclevel = 3 title = "PSA: Stop using `states[threadid()]`" authors = "Mason Protter, Valentin Churavy, Ian Butterworth, ..." 
published = "XX June 2023" rss_pubdate = Date(2023, 06, XX) rss = """PSA: Stop using `states[threadid()]`""" +++ -# PSA: Stop using `states[threadid()]` +# PSA: Thread-local state is no longer recommended; Common misconceptions about threadid() and nthreads() Alt titles: - PSA: Multithreading with `states[threadid()]` is unsafe - PSA: Multithreading with `buffers[threadid()]` is unsafe - PSA: Don't assume `threadid()` is stable within a task +- PSA: Stop using `states[threadid()]` __ From 42b1fa527a761b7157ae0a0afb543e6c3da214c6 Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Tue, 27 Jun 2023 10:35:35 -0600 Subject: [PATCH 15/25] Update PSA-dont-use-threadid.md --- blog/2023/06/PSA-dont-use-threadid.md | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index fd143ff118..aa6389b267 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -1,13 +1,6 @@ -+++ mintoclevel = 2 maxtoclevel = 3 title = "PSA: Stop using `states[threadid()]`" authors = "Mason Protter, Valentin Churavy, Ian Butterworth, ..." published = "XX June 2023" rss_pubdate = Date(2023, 06, XX) rss = """PSA: Stop using `states[threadid()]`""" +++ ++++ mintoclevel = 2 maxtoclevel = 3 title = "PSA: Thread-local state is no longer recommended; Common misconceptions about threadid() and nthreads()" authors = "Mason Protter, Valentin Churavy, Ian Butterworth, ..." 
published = "XX June 2023" rss_pubdate = Date(2023, 06, XX) rss = """PSA: Thread-local state is no longer recommended; Common misconceptions about threadid() and nthreads()""" +++ # PSA: Thread-local state is no longer recommended; Common misconceptions about threadid() and nthreads() -Alt titles: -- PSA: Multithreading with `states[threadid()]` is unsafe -- PSA: Multithreading with `buffers[threadid()]` is unsafe -- PSA: Don't assume `threadid()` is stable within a task -- PSA: Stop using `states[threadid()]` - -__ Partially due to the evolving history of our parallel and concurrent interfaces[^historynote], some Julia programmers have been writing *incorrect* parallel code that contains the possibility of race conditions which can give wrong results. This pattern has even been erroneously recommended in [previous official julia blogposts](https://github.com/JuliaLang/www.julialang.org/blob/main/blog/2019/07/multithreading.md#thread-local-state). It is important for the stability and correctness of the ecosystem that these usages are identified and fixed. From cb233e546851c1fb1e2034b1ca324ee7180f14fa Mon Sep 17 00:00:00 2001 From: Ian Butterworth <1694067+IanButterworth@users.noreply.github.com> Date: Thu, 29 Jun 2023 15:28:11 -0400 Subject: [PATCH 16/25] fix formatting Co-Authored-By: Mason Protter Co-Authored-By: Thibaut Lienart --- blog/2023/06/PSA-dont-use-threadid.md | 120 ++++++++++++++++++-------- 1 file changed, 86 insertions(+), 34 deletions(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index aa6389b267..a5cbe0d0fa 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -1,7 +1,21 @@ -+++ mintoclevel = 2 maxtoclevel = 3 title = "PSA: Thread-local state is no longer recommended; Common misconceptions about threadid() and nthreads()" authors = "Mason Protter, Valentin Churavy, Ian Butterworth, ..." 
published = "XX June 2023" rss_pubdate = Date(2023, 06, XX) rss = """PSA: Thread-local state is no longer recommended; Common misconceptions about threadid() and nthreads()""" +++ ++++ +mintoclevel = 2 +maxtoclevel = 3 +title = "PSA: Thread-local state is no longer recommended" +authors = "Mason Protter, Valentin Churavy, Ian Butterworth, ..." +published = "30 June 2023" +rss_pubdate = Date(2023, 06, 30) +rss = """PSA: Thread-local state is no longer recommended; Common misconceptions about threadid() and nthreads()""" ++++ -# PSA: Thread-local state is no longer recommended; Common misconceptions about threadid() and nthreads() +*Common misconceptions about threadid() and nthreads()* + +\toc + + + +## The buggy pattern Partially due to the evolving history of our parallel and concurrent interfaces[^historynote], some Julia programmers have been writing *incorrect* parallel code that contains the possibility of race conditions which can give wrong results. This pattern has even been erroneously recommended in [previous official julia blogposts](https://github.com/JuliaLang/www.julialang.org/blob/main/blog/2019/07/multithreading.md#thread-local-state). It is important for the stability and correctness of the ecosystem that these usages are identified and fixed. @@ -22,15 +36,17 @@ do_something(states) The above code is **incorrect** because the tasks spawned by `@threads` are allowed to yield to other tasks during their execution. This means that between reading `old_val` and storing `new_val` in the storage, the task could be paused and a new task running on the same thread with the same `threadid()` could concurrently write to `states[tid]`, causing a race condition and thus work being lost. -This is not actually a problem with multithreading specifically, but really a concurrency problem, and it can be demonstrated even with a single thread. 
For example: +This is not actually a problem with multithreading specifically, but really a concurrency problem, and it can be demonstrated even with a single thread. For example: + ``` $ julia --threads=1 ``` + ```julia julia> f(i) = (sleep(0.001); i); julia> let state = [0], N=100 - @sync for i ∈ 1:N + @sync for i ∈ 1:N Threads.@spawn begin tid = Threads.threadid() old_var = state[tid] @@ -42,13 +58,14 @@ julia> let state = [0], N=100 end (100, 5050) ``` + In the above snippet, we purposefully over-subscribed the CPU with `100` separate tasks in order to make the bug more likely to manifest, but the problem **can** arise even without spawning very many tasks. Any usage of `threadid()` in package or user code should be seen as a warning sign that the code is relying on implementation details, and is open to concurrency bugs. -# Fixing buggy code which uses this pattern +## Fixing buggy code which uses this pattern -## Quickfix: Replace `@threads` with `@threads :static` +### Quickfix: Replace `@threads` with `@threads :static` The simplest (though not recommended long-term) quickfix if you happen to use `@threads` is to replace usages of `@threads for ...` with `@threads :static for ...`. The reason for this is that the `:static` scheduler for `@threads` does not allow the asynchronous task migration and yielding that causes the bug in the first place. However, this is not a good long-term solution because 1) it relies on guard rails to make otherwise-incorrect code correct, and 2) `@threads :static` is not cooperative or composable, which means that if code inside your `@threads :static` loop also does multithreading, your code could be reduced to worse than single-threaded speeds, or even deadlock.
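To make the quickfix concrete, here is a minimal sketch (the data and the trivial accumulation are made up for illustration, and it assumes Julia was started without interactive threads so that `threadid()` ranges over `1:nthreads()`). It is safe only because `:static` pins exactly one loop-task to each thread, so no other loop-task can interleave its reads and writes of `states[threadid()]`:

```julia
using Base.Threads: nthreads, @threads, threadid

states = [0 for _ in 1:nthreads()]
@threads :static for x in 1:1000
    # Safe only under :static: the task cannot migrate, and no other
    # loop-task shares this thread, so each slot has a single writer.
    states[threadid()] += x
end
sum(states)  # 500500, i.e. sum(1:1000)
```

Under the default dynamic scheduler the same loop would be exactly the racy pattern described above.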
-## Better fix: Work directly with tasks +### Better fix: Work directly with tasks If you want a recipe that can replace the above buggy one with something that can be written using only the `Base.Threads` module, we recommend moving away from `@threads`, and instead working directly with `@spawn` to create and manage tasks. The reason is that `@threads` does not have any built-in mechanisms for managing and merging the results of work from different threads, whereas tasks can manage and return their own state in a safe way. -Tasks creating and returning their own state is inherently safer than the spawner of parallel tasks setting up state for spawned tasks to read from and write to. +Tasks creating and returning their own state is inherently safer than the spawner of parallel tasks setting up state for spawned tasks to read from and write to. Code which replaces the incorrect code pattern shown above can look like this: + ```julia using Base.Threads: nthreads, @threads, @spawn using Base.Iterators: partition @@ -90,43 +108,52 @@ do_something(states) ``` This is a fully general replacement for the old, buggy pattern. However, for many applications this should be simplified down to a parallel version of `mapreduce`: + ```julia using Base.Threads: nthreads, @spawn -function tmapreduce(f, op, itr; init=some_initial_value, chunks_per_thread::Int = 2) +function tmapreduce(f, op, itr; chunks_per_thread::Int = 2, kwargs...) chunk_size = max(1, length(itr) ÷ (chunks_per_thread * nthreads())) tasks = map(Iterators.partition(itr, chunk_size)) do chunk - @spawn mapreduce(f, op, chunk; init=init) + @spawn mapreduce(f, op, chunk; kwargs...) end - mapreduce(fetch, op, tasks; init=init) + mapreduce(fetch, op, tasks; kwargs...) end ``` + In `tmapreduce(f, op, itr)`, the function `f` is applied to each element of `itr`, and the results are then combined with an *associative*[^assoc] two-argument function `op`. The above `tmampreduce` can hopefully be added to base Julia at some point in the near future.
In the meantime however it's somewhat simple to write your own as above. -## Another option: Use a package which handles this correctly +### Another option: Use a package which handles this correctly We encourage people to check out various multithreading libraries like [Transducers.jl](https://github.com/JuliaFolds2/Transducers.jl) (or various frontends like [ThreadsX.jl](https://github.com/tkf/ThreadsX.jl), [FLoops.jl](https://github.com/JuliaFolds/FLoops.jl), and [Folds.jl](https://github.com/JuliaFolds/Folds.jl)), and [MultiThreadedCaches.jl](https://github.com/JuliaConcurrent/MultiThreadedCaches.jl). -### Transducers.jl ecosystem -Transducers.jl is fundamentally about expressing the traversing of data in a structured and principled way, often with the benefit that it makes parallel computing easier to reason about. +#### Transducers.jl ecosystem +Transducers.jl is fundamentally about expressing the traversing of data in a structured and principled way, often with the benefit that it makes parallel computing easier to reason about. The above `tmapreduce(f, op, itr)` could be expressed as + ```julia using Transducers itr |> Map(f) |> foldxt(op; init=some_initial_value) ``` + or + ```julia using Transducers foldxt(op, Map(f), itr; init=some_initial_value) ``` -The various frontends listed provide different APIs for Transducers.jl which some people may find easier to use. E.g. + +The various frontends listed provide different APIs for Transducers.jl which some people may find easier to use. E.g. + ```julia using ThreadsX ThreadsX.mapreduce(f, op, itr; init=some_initial_value) ``` + or + ```julia using FLoops @floop for x in itr @@ -135,34 +162,59 @@ end out ``` -### MultiThreadedCaches.jl +#### MultiThreadedCaches.jl MultiThreadedCaches.jl on the other hand attempts to make the `states[threadid()]`-like pattern safer by using lock mechanisms to stop data races.
We think this is not an ideal way to proceed, but it may make transitioning to safer code easier and require fewer rewrites, such as in cases where a package must manage state which may be concurrently written to by a user, and the package cannot control how the user structures their code. -[^historynote]: ### Concurrency & Parallelism +~~~ + +~~~ + +[^historynote]: ~~~

~~~Concurrency & Parallelism~~~

~~~ +@@long-footnote + In Julia, tasks are the primitive mechanism to express concurrency. Concurrency is the ability to execute more than one program or task simultaneously. + + Tasks will be mapped onto any number of "worker-threads" that will lead them to be executed in parallel. This is often called `M:N` threading or green threads. Even if Julia is started with only one worker-thread, the programmer can express concurrent operations. + + In early versions of Julia, the `@async` macro was used to schedule concurrent tasks which were executed asynchronously on a single thread. Later, the `@threads` macro was developed for CPU-parallelism and allowed users to easily execute chunks of a loop in parallel, but at the time this was disjoint from the notions of tasks in the language. This led to various composability problems and motivated language changes. - In Julia, tasks are the primitive mechanism to express concurrency. Concurrency is the ability to execute more than one program or task simultaneously. + The concurrency model in Julia has been evolving over minor versions. Julia v1.3 introduced the parallel scheduler for tasks and `Threads.@spawn`; v1.5 introduced `@threads :static` with the intention to change the default scheduling in future releases. Julia v1.7 enabled task migration, and Julia v1.8 changed the default for `@threads` to the dynamic scheduler. - Tasks will be mapped onto any number of "worker-threads" that will lead them to be executed in parallel. This is often called `M:N` threading or green threads. Even if Julia is started with only one worker-thread, the programmer can express concurrent operations. + When the parallel scheduler was introduced, we decided that there was too much code in the wild using `@async` and expecting specific semantics, so `Threads.@spawn` was made available to access the new semantics. `@async` has various problems and we discourage its use for new code.
- In early versions of Julia, the `@async` macro was used to schedule concurrent tasks which were executed asynchronously on a single thread. Later, the `@threads` macro was developed for CPU-parallelism and allowed users to easily execute chunks of a loop in parallel, but at the time this was disjoint from the notions of tasks in the language. This lead to various composability problems and motivated language changes. + ~~~

~~~ + Uses of `threadid`/`nthreads`/`maxthreadid` + ~~~

~~~ - The concurrency model in Julia has been evolving over minor versions. Julia v1.3 introduced the parallel scheduler for tasks and `Threads.@spawn`; v1.5 introduced `@threads :static` with the intention to change the default scheduling in future releases. Julia v1.7 enabled task migration, and Julia v1.8 changed the default for `@threads` to the dynamic scheduler. + Over time, several features have been added that make relying on `threadid`, `nthreads` and even `maxthreadid` perilous: - When the parallel scheduler was introduced, we decided that there was too much code in the wild using `@async` and expecting specific semantics, so `Threads.@spawn` was made available to access the new semantics. `@async` has various problems and we discourage its use for new code. + 1. Task migration: A task can observe multiple `threadid`s during its execution. + 2. Interactive priority: `nthreads()` will report the number of non-interactive worker-threads, thus undercounting the number of active threads. + 3. Thread adoption (v1.9): Foreign threads can now be adopted (and later be removed) at any time during the execution of the program. + 4. GC threads: The runtime can use additional threads to accelerate work like executing the Garbage Collector. - #### Uses of `threadid`/`nthreads`/`maxthreadid` + Any code that relies on a specific `threadid` staying constant, or on a constant number of threads during execution, is bound to be incorrect. As a rule of thumb, programmers should at most be querying the number of threads to motivate heuristics like how to distribute parallel work, but programs should generally **not** be written to depend on implementation details of threads for correctness. Rather, programmers should reason about *tasks*, i.e. pieces of work that may execute concurrently with other code, independently of the number of *threads* that are used for executing them. 
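The first point can be observed directly. In this sketch (the sleep duration is arbitrary, and with few worker-threads the two ids may well coincide), a single task records its `threadid()` on both sides of a yield point:

```julia
using Base.Threads: @spawn, threadid

t = @spawn begin
    before = threadid()
    sleep(0.01)      # a yield point; the task may resume on a different thread
    after = threadid()
    (before, after)  # not guaranteed to be equal now that tasks can migrate
end
before, after = fetch(t)
```

Correct code must therefore treat the pair `(before, after)` as unpredictable rather than assuming `before == after`.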
+@@ - Over time, several features have been added that make relying on `threadid`, `nthreads` and even `maxthreadid` perilous: +\\ - 1. Task migration: A task can observe multiple `threadid`s during its execution. - 2. Interactive priority: `nthreads()` will report the number of non-interactive worker-threads, thus undercounting the number of active threads. - 3. Thread adoption (v1.9): Foreign threads can now be adopted (and later be removed) at any time during the execution of the program. - 4. GC threads: The runtime can use additional threads to accelerate work like executing the Garbage Collector. +[^assoc]: ~~~

~~~Associativity~~~

~~~ +@@long-footnote + [Associativity](https://en.wikipedia.org/wiki/Associative_property) is an important property for parallel reducing functions, because it means that `op(a, op(b, c)) == op(op(a, b), c)`, and hence the result does not depend on the order in which the reduction is performed. - Any code that relies on a specific `threadid` staying constant, or on a constant number of threads during execution, is bound to be incorrect. As a rule of thumb, programmers should at most be querying the number of threads to motivate heuristics like how to distribute parallel work, but programs should generally **not** be written to depend on implementation details of threads for correctness. Rather, programmers should reason about *tasks*, i.e. pieces of work that may execute concurrently with other code, independently of the number of *threads* that are used for executing them. - -[^assoc]: ## Associativity - [Associativity](https://en.wikipedia.org/wiki/Associative_property) is an important property for parallel reducing functions, because it means that `op(a, op(b, c)) == op(op(a, b), c)`, and hence the result does not depend on the order in which the reduction is performed. - - Note that associativity is not the same as commutivity, which is the property that `op(a, b) == op(b, a)`. This is *not* required for parallel reducing functions. + Note that associativity is not the same as commutativity, which is the property that `op(a, b) == op(b, a)`. This is *not* required for parallel reducing functions.
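As a small illustration of the distinction, string concatenation (`*` in Julia) is associative but not commutative, and an associative `op` is all that a parallel reduction needs:

```julia
# Grouping doesn't matter (associative)...
@assert ("a" * "b") * "c" == "a" * ("b" * "c")
# ...but order does (not commutative).
@assert "a" * "b" != "b" * "a"
# mapreduce with an associative op is therefore safe to parallelize:
mapreduce(string, *, 1:5)  # "12345"
```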
+@@ From 26b4020c69928d431e45f69c3e485ac95277f428 Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Fri, 30 Jun 2023 09:38:42 -0600 Subject: [PATCH 17/25] Update blog/2023/06/PSA-dont-use-threadid.md Co-authored-by: Tim Holy --- blog/2023/06/PSA-dont-use-threadid.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index a5cbe0d0fa..8711df1eb2 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -122,7 +122,7 @@ end In `tmapreduce(f, op, itr)`, the function `f` is applied to each element of `itr`, and then an *associative*[^assoc] two-argument function `op`. -The above `tmampreduce` can hopefully be added to base Julia at some point in the near future. In the meantime however it's somewhat simple to write your own as above. +The above `tmapreduce` can hopefully be added to base Julia at some point in the near future. In the meantime however it's somewhat simple to write your own as above. 
### Another option: Use a package which handles this correctly From d3b24f259444d8d0757f27d1701d3a419562f3e4 Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Fri, 30 Jun 2023 11:40:37 -0600 Subject: [PATCH 18/25] add comments from @timholy showing the steps to the race condition --- blog/2023/06/PSA-dont-use-threadid.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index 8711df1eb2..612f4afbaf 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -26,7 +26,7 @@ using Base.Threads: nthreads, @threads, threadid states = [some_initial_value for _ in 1:nthreads()] @threads for x in some_data - tid = threadid() + tid = threadid() old_val = states[tid] new_val = some_operator(old_val, f(x)) states[tid] = new_val @@ -48,10 +48,10 @@ julia> f(i) = (sleep(0.001); i); julia> let state = [0], N=100 @sync for i ∈ 1:N Threads.@spawn begin - tid = Threads.threadid() - old_var = state[tid] - new_var = old_var + f(i) - state[tid] = new_var + tid = Threads.threadid() # each task gets `tid = 1` + old_var = state[tid] # each task reads the current value, which for all is 0 (!) because... 
+ new_var = old_var + f(i) # ...the `sleep` in `f` causes all tasks to pause *simultaneously* here (all loop iterations start, but do not yet finish) + state[tid] = new_var # after being released from the `sleep`, each task sets `state[1]` to `i` end end sum(state), sum(1:N) From 9fef79ced2647ebcdb5e320264f5f2631cdf558e Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Fri, 30 Jun 2023 11:54:14 -0600 Subject: [PATCH 19/25] add footnote about trying to predict yielding --- blog/2023/06/PSA-dont-use-threadid.md | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index 612f4afbaf..7a4568131f 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -34,7 +34,7 @@ end do_something(states) ``` -The above code is **incorrect** because the tasks spawned by `@threads` are allowed to yield to other tasks during their execution. This means that between reading `old_val` and storing `new_val` in the storage, the task could be paused and a new task running on the same thread with the same `threadid()` could concurrently write to `states[tid]`, causing a race condition and thus work being lost. +The above code is **incorrect** because the tasks spawned by `@threads` are allowed to yield to other tasks during their execution[^yielding]. This means that between reading `old_val` and storing `new_val` in the storage, the task could be paused and a new task running on the same thread with the same `threadid()` could concurrently write to `states[tid]`, causing a race condition and thus work being lost. This is not actually a problem with multithreading specifically, but really a concurrency problem, and it can be demonstrated even with a single thread. For example: @@ -212,6 +212,17 @@ MultiThreadedCaches.jl on the other hand attempts to make the `states[threadid() \\ +[^yielding]: ~~~

~~~Don't try to reason about yielding~~~

~~~ +@@long-footnote + Many existing uses of thread local state happen to be relatively robust and give correct answers only because the functions they are calling during execution do not yield. One may then think "well, I can just avoid this problem by making sure my code doesn't yeield", but we think this is a bad and unsustainable idea, because whether or not a function call will yield is not stable, obvious, or easily inspectable. + + For instance, if a function `f` is updated to include a background `@debug` statement or other forms of non-user-visible IO, it may change from being non-yielding to yielding. If during a call to `f`, the compiler encounters a dynamic dispatch where new code must be JIT compiled, a yield-point may be encountered, and any number of other internal changes could happen to code which can cause it to yield. + + Furthermore, future versions of julia may eventually move away from a [cooperative task model](https://en.wikipedia.org/wiki/Cooperative_multitasking) to a [preemptive task model](https://en.wikipedia.org/wiki/Preemption_(computing), in which case yield points would not be the only way that race conditions like this could be encountered. +@@ + +\\ + [^assoc]: ~~~

~~~Associativity~~~

~~~ @@long-footnote [Associativity](https://en.wikipedia.org/wiki/Associative_property) is an important property for parallel reducing functions, because it means that `op(a, op(b, c)) == op(op(a, b), c)`, and hence the result does not depend on the order in which the reduction is performed. From 7f0bd4c2a733065922daa87e4789f7ae832379ce Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Fri, 30 Jun 2023 11:58:56 -0600 Subject: [PATCH 20/25] typo --- blog/2023/06/PSA-dont-use-threadid.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index 7a4568131f..dc18309d64 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -214,7 +214,7 @@ MultiThreadedCaches.jl on the other hand attempts to make the `states[threadid() [^yielding]: ~~~

~~~Don't try to reason about yielding~~~

~~~ @@long-footnote - Many existing uses of thread local state happen to be relatively robust and give correct answers only because the functions they are calling during execution do not yield. One may then think "well, I can just avoid this problem by making sure my code doesn't yeield", but we think this is a bad and unsustainable idea, because whether or not a function call will yield is not stable, obvious, or easily inspectable. + Many existing uses of thread local state happen to be relatively robust and give correct answers only because the functions they are calling during execution do not yield. One may then think "well, I can just avoid this problem by making sure my code doesn't yield", but we think this is a bad and unsustainable idea, because whether or not a function call will yield is not stable, obvious, or easily inspectable. For instance, if a function `f` is updated to include a background `@debug` statement or other forms of non-user-visible IO, it may change from being non-yielding to yielding. If during a call to `f`, the compiler encounters a dynamic dispatch where new code must be JIT compiled, a yield-point may be encountered, and any number of other internal changes could happen to code which can cause it to yield.
From 94ff21949508de16773bfa6ea9de144182867c24 Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Wed, 5 Jul 2023 12:44:37 -0600 Subject: [PATCH 21/25] Update PSA-dont-use-threadid.md --- blog/2023/06/PSA-dont-use-threadid.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index dc18309d64..929d397176 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -92,7 +92,7 @@ data_chunks = partition(some_data, chunk_size) # partition your data into chunks # See also ChunkSplitters.jl and SplittablesBase.jl for partitioning data tasks = map(data_chunks) do chunk - # Each chunk of your data gets its own spawned task that does its own local work and returns a result + # Each chunk of your data gets its own spawned task that does its own local, sequential work @spawn begin state = some_initial_value for x in chunk From 8a0e33529342b14e1cc7bd43bd81c52b56c25483 Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Wed, 5 Jul 2023 12:45:36 -0600 Subject: [PATCH 22/25] Update PSA-dont-use-threadid.md --- blog/2023/06/PSA-dont-use-threadid.md | 1 + 1 file changed, 1 insertion(+) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index 929d397176..0c9999102e 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -93,6 +93,7 @@ data_chunks = partition(some_data, chunk_size) # partition your data into chunks tasks = map(data_chunks) do chunk # Each chunk of your data gets its own spawned task that does its own local, sequential work + # and then returns the result @spawn begin state = some_initial_value for x in chunk From 93a97ab64bc8b7f3eaa275a2262ee46548e9a97d Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Wed, 5 Jul 2023 14:45:46 -0600 Subject: [PATCH 23/25] Update blog/2023/06/PSA-dont-use-threadid.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: 8bit Co-authored-by: Mosè Giordano --- blog/2023/06/PSA-dont-use-threadid.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index 0c9999102e..918abaf466 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -219,7 +219,7 @@ MultiThreadedCaches.jl on the other hand attempts to make the `states[threadid() For instance, if a function `f` is updated to include a background `@debug` statement or other forms of non-user-visible IO, it may change from being non-yielding to yielding. If during a call to `f`, the compiler encounters a dynamic dispatch where new code must be JIT compiled, a yield-point may be encountered, and any number of other internal changes could happen to code which can cause it to yield. - Furthermore, future versions of julia may eventually move away from a [cooperative task model](https://en.wikipedia.org/wiki/Cooperative_multitasking) to a [preemptive task model](https://en.wikipedia.org/wiki/Preemption_(computing), in which case yield points would not be the only way that race conditions like this could be encountered. + Furthermore, future versions of julia may eventually move away from a [cooperative task model](https://en.wikipedia.org/wiki/Cooperative_multitasking) to a [preemptive task model](https://en.wikipedia.org/wiki/Preemption_(computing)), in which case yield points would not be the only way that race conditions like this could be encountered.
@@ \\ From debca85a5ab30ffae7ab2de9dceb95bd5e54f583 Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Thu, 6 Jul 2023 10:48:29 -0600 Subject: [PATCH 24/25] Update blog/2023/06/PSA-dont-use-threadid.md Co-authored-by: Ian Butterworth --- blog/2023/06/PSA-dont-use-threadid.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index 918abaf466..70773e6b79 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -48,10 +48,14 @@ julia> f(i) = (sleep(0.001); i); julia> let state = [0], N=100 @sync for i ∈ 1:N Threads.@spawn begin - tid = Threads.threadid() # each task gets `tid = 1` - old_var = state[tid] # each task reads the current value, which for all is 0 (!) because... - new_var = old_var + f(i) # ...the `sleep` in `f` causes all tasks to pause *simultaneously* here (all loop iterations start, but do not yet finish) - state[tid] = new_var # after being released from the `sleep`, each task sets `state[1]` to `i` + tid = Threads.threadid() # Each task gets `tid = 1`. + old_var = state[tid] # Each task reads the current value, which for + # all is 0 (!) because... + new_var = old_var + f(i) # ...the `sleep` in `f` causes all tasks to pause + # *simultaneously* here (all loop iterations start, + # but do not yet finish). + state[tid] = new_var # After being released from the `sleep`, each task + # sets `state[1]` to `i`. 
end end sum(state), sum(1:N) From fc0960bd7a19803cbb36869b4f94e9e20a7743ea Mon Sep 17 00:00:00 2001 From: Mason Protter Date: Thu, 6 Jul 2023 10:51:27 -0600 Subject: [PATCH 25/25] Update PSA-dont-use-threadid.md --- blog/2023/06/PSA-dont-use-threadid.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/blog/2023/06/PSA-dont-use-threadid.md b/blog/2023/06/PSA-dont-use-threadid.md index 70773e6b79..25a36a3dc2 100644 --- a/blog/2023/06/PSA-dont-use-threadid.md +++ b/blog/2023/06/PSA-dont-use-threadid.md @@ -89,11 +89,13 @@ Code which replaces the incorrect code pattern shown above can look like this: using Base.Threads: nthreads, @threads, @spawn using Base.Iterators: partition -tasks_per_thread = 2 # customize this as needed. More tasks have more overhead, but better load balancing +tasks_per_thread = 2 # customize this as needed. More tasks have more overhead, but better + # load balancing chunk_size = max(1, length(some_data) ÷ (tasks_per_thread * nthreads())) -data_chunks = partition(some_data, chunk_size) # partition your data into chunks that individual tasks will deal with -# See also ChunkSplitters.jl and SplittablesBase.jl for partitioning data +data_chunks = partition(some_data, chunk_size) # partition your data into chunks that + # individual tasks will deal with +# See also ChunkSplitters.jl and SplittablesBase.jl for partitioning data tasks = map(data_chunks) do chunk # Each chunk of your data gets its own spawned task that does its own local, sequential work @@ -106,8 +108,8 @@ tasks = map(data_chunks) do chunk return state end end -states = fetch.(tasks) # get all the values returned by the individual tasks. -# fetch is type unstable, so you may optionally want to assert a specific return type. +states = fetch.(tasks) # get all the values returned by the individual tasks. fetch is type + # unstable, so you may optionally want to assert a specific return type. do_something(states) ```
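As a self-contained usage sketch of the task-per-chunk pattern in this final revision (the data, the per-element function `f`, and the reduction are hypothetical placeholders), summing squares without any shared mutable state:

```julia
using Base.Threads: nthreads, @spawn
using Base.Iterators: partition

some_data = 1:1000
f(x) = x^2  # hypothetical per-element work

tasks_per_thread = 2
chunk_size = max(1, length(some_data) ÷ (tasks_per_thread * nthreads()))
tasks = map(partition(some_data, chunk_size)) do chunk
    @spawn sum(f, chunk)  # each task reduces its own chunk locally
end
total = sum(fetch, tasks)  # combine the per-task results at the end
# total == 333833500 == sum(x -> x^2, 1:1000)
```

Because every task only touches its own chunk and returns its result, there is no `threadid()`-indexed state to race on, regardless of how tasks are scheduled or migrated.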