-
-
Notifications
You must be signed in to change notification settings - Fork 452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New blogpost: PSA: Thread-local state is no longer recommended; Common misconceptions about threadid()
and nthreads()
#1904
Conversation
states[threadid()]
|
||
MultiThreadedCaches.jl on the other hand attempts to make the `states[threadid()]`-like pattern safer by using lock mechanisms to stop data races. We think this is not an ideal way to proceed, but it may make transitioning to safer code easier and require fewer rewrites, such as in cases where a package must manage state which may be concurrently written to by a user, and the package cannot control how the user structures their code. | ||
|
||
[^historynote]: ### Concurrency & Parallelism |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a big footnote
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was originally added to the main body, but I moved it to a footnote because I wanted the main thrust of the article to be more focused, but I also couldn't bring myself to delete it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I am torn about it as well. I originally just wanted to define the terms, so that everyone understood them, but then it grew into this historical perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it an Appendix section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference?
Co-authored-by: Valentin Churavy <[email protected]>
Co-authored-by: Valentin Churavy <[email protected]>
Co-authored-by: Sukera <[email protected]>
Co-authored-by: Sukera <[email protected]>
Co-authored-by: Sukera <[email protected]>
Co-authored-by: Sukera <[email protected]>
Alt titles: | ||
- PSA: Multithreading with `states[threadid()]` is unsafe | ||
- PSA: Multithreading with `buffers[threadid()]` is unsafe | ||
- PSA: Don't assume `threadid()` is stable within a task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps just this?
- PSA: Don't assume `threadid()` is stable within a task | |
- PSA: Don't assume `threadid()` is stable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this title rather misleadning, because the stability of threadid()
is not the actual problem here. The problem is race conditions in updating the buffer at a specific threadid()
.
E.g. in the single threaded example in the blogpost, the threadid()
is perfectly stable, it's always 1
, but the problem persists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A technically correct (but horrible UX) title could be
PSA: Don't use non-threadsafe, volatile, shared state to attempt to coordinate concurrent task execution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I want is to catch the attention of people who are only dimly aware of this stuff and what it means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"You are doing it wrong! Common misconceptions about threadid() and nthreads()"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last part of that is good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From @vchuravy's comment below, with a slight tweak:
PSA: Thread-local state is no longer recommended; Common misconceptions about threadid() and nthreads()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay I've changed the title-in-progress to that one. I'm still not really convinced it's any better since people don't really know what that means, but I also don't have strong feelings about it and the old title was also not perfect.
This blogpost should probably acknowledge the previous blog post teaching this pattern. It would also be good to include a commit in this PR adding a disclaimer to that same blog post. |
Oh my! That didn't age very well. This gives a natural title: "We no longer recommend thread-local state; Common misconceptions about threadid() and nthreads()" |
do_something(states) | ||
``` | ||
|
||
The above code is **incorrect** because the tasks spawned by `@threads` are allowed to yield to other tasks during their execution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that be clearer?
The above code is **incorrect** because the tasks spawned by `@threads` are allowed to yield to other tasks during their execution. | |
The above code is **incorrect** because the tasks spawned by `@threads` are neither guaranteed to have a threadid exclusively during their lifetime nor to keep the same threadid during their execution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I think the "have a threadid exclusively during their lifetime" part is a bit easy to misunderstand. I think the sentence you're wanting to replace here should not be seen as an explanation, but as a definition at the start of an explanation, which is in the following couple of sentences.
I think it's useful to motivate or define here what "yielding" is
Co-authored-by: Klaus Crusius <[email protected]>
Co-Authored-By: Mason Protter <[email protected]> Co-Authored-By: Thibaut Lienart <[email protected]>
Formatting fixed and pushed here, thanks to @tlienart https://julialang.netlify.app/previews/pr1914/blog/2023/06/psa-dont-use-threadid/ |
Also the two docs PRs called out up top (JuliaLang/julia#48542 JuliaLang/julia#50047) are due to land in 1.9.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An excellent and much-needed post, thanks!
do_something(states) | ||
``` | ||
|
||
The above code is **incorrect** because the tasks spawned by `@threads` are allowed to yield to other tasks during their execution. This means that between reading `old_val` and storing `new_val` in the storage, the task could be paused and a new task running on the same thread with the same `threadid()` could concurrently write to `states[tid]`, causing a race condition and thus work being lost. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For developers who may be less familiar with race conditions, would it be easier to delay explaining what can go wrong until the f(i) = (sleep(0.001); i)
example below? I'm imagining you could do something like this:
tid = Threads.threadid() # each task gets `tid = 1`
old_var = state[tid] # each task reads the current value, which for all is 0 (!) because...
new_var = old_var + f(i) # ...the `sleep` in `f` causes all tasks to pause *simultaneously* here (all loop iterations start, but do not yet finish)
state[tid] = new_var # after being released from the `sleep`, each task sets `state[1]` to 1
My hope is that the comments walk readers through the logic more clearly than the short description above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style note: for readability's sake it might be better to put # 1, # 2, # 3, ... and then put the comments in a numbered list below the code as otherwise this will y-overflow on devices and it's a bit awkward to read code having to scroll horizontally over it 🤷
edit: hmm looks like I didn't do a good job at commenting the changes, I was referring to d3b24f2
1.9.2 is imminent and the docs are already live https://docs.julialang.org/en/v1/manual/multi-threading/#man-task-migration https://docs.julialang.org/en/v1/manual/multi-threading/#Using-@threads-without-data-races |
So I tried a couple things to get the comments into a list like @tlienart suggested but I wasn't really happy with any of them and personally find it not so bad to scroll horizontally if the upside is that the comment is right beside the code. Like, if the comments aren't going to be in-line with the code then I'm not sure I really see the point because we already have a paragraph explaining what's going on anyways. Does anyone have suggestions on how to improve the comments in the code for more legibility? I'd really like to get this published. |
Maybe lets get a new preview built so that we can see what it looks like? |
Co-authored-by: Mosè Giordano <[email protected]>
Co-authored-by: Ian Butterworth <[email protected]>
Okay, I think this is ready to merge! |
I do not understand the logic in this post (https://julialang.org/blog/2023/07/PSA-dont-use-threadid/).
Could you please give an easily reproducible sample using the |
Could you clarify which part your thinking of when you say that the example is not reproducible? |
In the section "The buggy pattern" in this Blog (https://julialang.org/blog/2023/07/PSA-dont-use-threadid/) it reads "The general template that this incorrect code follows is something like the following:" (then the The following Julia code using the general template (as mentioned in the Blog) is repeatedly tested via
For the
Thus, I mean the general template in the Blog is not reproducible. The
|
@xwuupb please, if you have any questions ask them on Discourse https://discourse.julialang.org/, a long merged pull request isn't a good place to have this discussions. |
Now, the topic is moved here: https://discourse.julialang.org/t/psa-thread-local-state-is-no-longer-recommended-common-misconceptions-about-threadid-and-nthreads/101274/54 |
This is a blogpost some of us have been working on initially on hackmd, spurred by the somewhat surprising discovery of just how many packages out in the wild are using this dangerous pattern.
I feel myself being strongly pulled towards expanding this into a rather long form discussion of multithreading, and also wanting to wait for PRs to replace
@threads
with apmapreduce
or whatever, but I think the most important short term thing is to keep this blog post somewhat short, and focused on the specific problem of people indexing into storage usingthreadid()
(or other common invalid usages ofthreadid()
), and offering fixes without lowering the signal to noise ratio too much -- we need people to actually read and understand this!Feedback, fixes, and bikeshedding all appreciated and welcome.
TODO's before publication
threadid()
e.g.add docs on task-specific buffering using @threads
#48542add docs on task migration
#50047!!1compat
notes to threadid/nthreads/maxthreadid@async