-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a fast path for std::thread::panicking
.
#72617
Conversation
This is done by adding a global atomic variable (non-TLS) that counts how many threads are panicking. In order to check if the current thread is panicking, this variable is read and, if it is zero, no thread (including the one where `panicking` is being called) is panicking and `panicking` can return `false` immediately without needing to access TLS. If the global counter is not zero, the local counter is accessed from TLS to check if the current thread is panicking.
(rust_highfive has picked a reviewer for you, use r? to override) |
r? @Amanieu |
LOCAL_PANIC_COUNT.with(|c| c.get()) | ||
} | ||
|
||
pub fn is_zero() -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should be #[inline]
, and a separate #[cold]
function should be used for the slow path that is not inlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the std::panicking::panicking
function be also made #[inline]
? Otherwise I'm not sure there will be much benefit.
The current call stack is:
std::thread::panicking
, which is#[inline]
std::panicking::panicking
, which is not#[inline]
std::panicking::panic_count::is_zero
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, that should probably be inline too.
…path into a separate cold function.
|
||
// Slow path is in a separate function to reduce the amount of code | ||
// inlined from `is_zero`. | ||
#[inline(never)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[cold]
should be enough, as a general rule we avoid using both cold and inline(never)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did that way because that is how it is done here:
Lines 1262 to 1267 in 0b66a89
#[inline(never)] | |
#[cold] | |
#[track_caller] | |
fn expect_failed(msg: &str) -> ! { | |
panic!("{}", msg) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, I guess it doesn't really matter in practice.
@bors r+ |
📌 Commit 771a1d8 has been approved by |
@bors rollup |
Add a fast path for `std::thread::panicking`. This is done by adding a global atomic variable (non-TLS) that counts how many threads are panicking. In order to check if the current thread is panicking, this variable is read and, if it is zero, no thread (including the one where `panicking` is being called) is panicking and `panicking` can return `false` immediately without needing to access TLS. If the global counter is not zero, the local counter is accessed from TLS to check if the current thread is panicking.
GLOBAL_PANIC_COUNT.fetch_sub(1, Ordering::Relaxed); | ||
LOCAL_PANIC_COUNT.with(|c| { | ||
let next = c.get() - 1; | ||
c.set(next); | ||
next | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GLOBAL_PANIC_COUNT.fetch_sub(1, Ordering::Relaxed); | |
LOCAL_PANIC_COUNT.with(|c| { | |
let next = c.get() - 1; | |
c.set(next); | |
next | |
}) | |
let next = LOCAL_PANIC_COUNT.with(|c| { | |
let next = c.get() - 1; | |
c.set(next); | |
next | |
}); | |
GLOBAL_PANIC_COUNT.fetch_sub(1, Ordering::Relaxed); | |
next |
LocalKey::with
can itself panic:
rust/src/libstd/thread/local.rs
Lines 235 to 243 in 229e5b2
pub fn with<F, R>(&'static self, f: F) -> R | |
where | |
F: FnOnce(&T) -> R, | |
{ | |
self.try_with(f).expect( | |
"cannot access a Thread Local Storage value \ | |
during or after destruction", | |
) | |
} |
This will likely result in a double panic and thus process abort, but still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK LocalKey::with
can only panic if the type has a destructor (when accessing a TLS value that has already been dropped). This isn't the case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like it is worth a comment in the code.
Add a fast path for `std::thread::panicking`. This is done by adding a global atomic variable (non-TLS) that counts how many threads are panicking. In order to check if the current thread is panicking, this variable is read and, if it is zero, no thread (including the one where `panicking` is being called) is panicking and `panicking` can return `false` immediately without needing to access TLS. If the global counter is not zero, the local counter is accessed from TLS to check if the current thread is panicking.
Add a fast path for `std::thread::panicking`. This is done by adding a global atomic variable (non-TLS) that counts how many threads are panicking. In order to check if the current thread is panicking, this variable is read and, if it is zero, no thread (including the one where `panicking` is being called) is panicking and `panicking` can return `false` immediately without needing to access TLS. If the global counter is not zero, the local counter is accessed from TLS to check if the current thread is panicking.
…arth Rollup of 14 pull requests Successful merges: - rust-lang#72617 (Add a fast path for `std::thread::panicking`.) - rust-lang#72738 (Self contained linking option) - rust-lang#72770 (Implement mixed script confusable lint.) - rust-lang#73418 (Add unstable `core::mem::variant_count` intrinsic) - rust-lang#73460 (Emit line info for generator variants) - rust-lang#73534 (Provide suggestions for some moved value errors) - rust-lang#73538 (make commented examples use valid syntax, and be more consistent ) - rust-lang#73581 (Create 0766 error code) - rust-lang#73619 (Document the mod keyword) - rust-lang#73621 (Document the mut keyword) - rust-lang#73648 (Document the return keyword) - rust-lang#73673 (Fix ptr doc warnings.) - rust-lang#73674 (Tweak binop errors) - rust-lang#73687 (Clean up E0701 explanation) Failed merges: - rust-lang#73708 (Explain move errors that occur due to method calls involving `self` (take two)) r? @ghost
// a fast path in `is_zero` (which is used by `panicking`). Access to | ||
// this variable can be always be done with relaxed ordering because | ||
// it is always guaranteed that, if `GLOBAL_PANIC_COUNT` is zero, | ||
// `LOCAL_PANIC_COUNT` will be zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if
GLOBAL_PANIC_COUNT
is zero,LOCAL_PANIC_COUNT
will be zero.
This is a very strong statement to make, in particular about relaxed atomics! It took me some time to actually follow the argument here. You cannot, in general, relate the content of two different locations when you use relaxed accesses. In particular, "GLOBAL_PANIC_COUNT
is zero" is an odd statement as atomics can appear to have different values in different threads. The only reason it works here is that one of the two is thread-local. I was also confused at first because decrease
decrements GLOBAL
first, so there is a time when GLOBAL
is already zero for the current thread but LOCAL
is still 1.
So what about
In any particular thread, if that thread currently views
GLOBAL_PANIC_COUNT
as being zero, thenLOCAL_PANIC_COUNT
in that thread is zero. This invariant holds before and afterincrease
anddecrease
, but not necessarily during their execution.
Also, it would be good to explain in a comment why accessing an atomic global (potentially subject to contention) is faster than accessing a non-atomic thread local. That is rather non-intuitive IMO -- I would have expected the exact opposite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting a TLS pointer requires calling __tls_get_addr
for ELF binaries when using the GD TLS model. This is not inlinable and may walk a linked list.
https://github.com/bminor/glibc/blob/5f72f9800b250410cad3abfeeb09469ef12b2438/elf/dl-tls.c#L821
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any particular thread, if that thread currently views
GLOBAL_PANIC_COUNT
as being zero, thenLOCAL_PANIC_COUNT
in that thread is zero. This invariant holds before and afterincrease
anddecrease
, but not necessarily during their execution.
Yes, that would have been a better description. I wrote the comment thinking about the is_zero
function, which is the only place where the value of GLOBAL_PANIC_COUNT
matters.
Also, it would be good to explain in a comment why accessing an atomic global (potentially subject to contention) is faster than accessing a non-atomic thread local. That is rather non-intuitive IMO -- I would have expected the exact opposite.
You can compare here the assembly of a relaxed atomic load and a TLS read. The atomic load just needs a normal mov
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can compare here the assembly of a relaxed atomic load and a TLS read. The atomic load just needs a normal mov.
Assembly is basically gibberish to me.^^ But that's okay, a high-level explanation like @bjorn3 gave is totally sufficient. :)
So could you open a PR to include these comments, or should I do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So could you open a PR to include these comments, or should I do that?
I'll do it now.
… r=RalfJung Improve comments from rust-lang#72617, as suggested by RalfJung r? @RalfJung
…arth Rollup of 12 pull requests Successful merges: - rust-lang#73140 (Fallback to xml.etree.ElementTree) - rust-lang#73670 (Add `format_args_capture` feature) - rust-lang#73693 (Use exhaustive match in const_prop.rs) - rust-lang#73845 (Use &raw in A|Rc::as_ptr) - rust-lang#73861 (Create E0768) - rust-lang#73881 (Standardize bibliographic citations in rustc API docs) - rust-lang#73925 (Improve comments from rust-lang#72617, as suggested by RalfJung) - rust-lang#73949 ([mir-opt] Fix mis-optimization and other issues with the SimplifyArmIdentity pass) - rust-lang#73984 (Edit docs for rustc_data_structures::graph::scc) - rust-lang#73985 (Fix "getting started" link) - rust-lang#73997 (fix typo) - rust-lang#73999 (Bump mingw-check CI image from Ubuntu 16.04 to 18.04.) Failed merges: - rust-lang#74000 (add `lazy_normalization_consts` feature gate) r? @ghost
This is done by adding a global atomic variable (non-TLS) that counts how many threads are panicking. In order to check if the current thread is panicking, this variable is read and, if it is zero, no thread (including the one where
panicking
is being called) is panicking andpanicking
can returnfalse
immediately without needing to access TLS. If the global counter is not zero, the local counter is accessed from TLS to check if the current thread is panicking.