-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JoinGuard::join returning an Err is **really** unsafe #20807
Comments
This is already something you have to worry about, since destructors would be able to look at foo during unwinding in the thread as well. |
These examples are different classes of bugs: the C++ code shows a critical memory safety issue, while the Rust code shows a logic error that doesn't fall within the realm of 'safety' in the sense that Rust means. How is the rust example so different from the obviously "just a bug" behavior of: fn update_calculation(&mut self) {
if self.val1 == 127 { return; };
self.calculation_result = self.val1 + self.val2 + self.val3;
} ? |
This is a really good point that I overlooked in the API. I agree with you, tomaka. |
In my example I wrote Here is a better example. This code below looks reasonable but has exception-safety related issues: fn update_calculation(&mut self) {
self.calculation_result = self.val1.checked_add(self.val2).unwrap().checked_add(self.val3).unwrap();
}
pub fn set_val1(&mut self, value: int) {
self.val1 = value;
self.update_calculation();
} If you want to "fix" this code, you would have to write: fn update_calculation(&mut self) -> Result<(), ()> {
let tmp1 = match self.val1.checked_add(self.val2) {
Some(v) => v,
None => return Err(())
};
let tmp2 = match tmp1.checked_add(self.val3) {
Some(v) => v,
None => return Err(())
};
self.calculation_result = tmp2;
Ok(())
}
pub fn set_val1(&mut self, value: int) {
let previous_value = self.val1;
self.val1 = value;
if let Err(_) = self.update_calculation() {
self.val1 = previous_value;
panic!();
}
} This is why exception safety is hard. And I'd like to emphasize that this is just a trivial case. |
@sfackler Sure, it's easy to replicate in a single thread: #![feature(unsafe_destructor)]
#[derive(Show)] struct Foo { foo: u8 }
#[unsafe_destructor]
impl<'a, 'b> Drop for Foo {
fn drop(&mut self) {
println!("Foo: {:?}", self.foo);
}
}
/// Invariant: *x = 2 after this is called.
fn foo<'a>(x: &'a mut u8) {
if *x == 0 {
panic!("oops");
}
*x = 2;
}
fn main() {
let mut x = Foo {foo: 0 };
{
foo(&mut x.foo);
}
} The difference is that you only have to worry about the exception safety violations in destructors. Destructors already have to do this if you don't abort on panic. That's fundamental to unwinding. The problem here that tomaka is pointing out is that this change causes non-destructors to have to worry about it. The other key difference with destructors is that destructors can make no guarantees about where they are called (for the most part) so they can't really rely on functions being run after the type is instantiated. In this case, the guarantee Personally I think making .join() panic unless you use |
Destructors are usually either really low-level or don't require the object to be in a valid state, so it's a far smaller problem. |
By the way, this is really sometimes a problem even for existing threads in non-destructors, unless we abort on panic. If you use atomics, for example, which have no poison value, it's very easy to get exception safety violations by not coding transactionally. So this is also an argument for abort on panic being the default (among other things) but this is a larger discussion. I think at least for now, making non-destructors worry about exception safety in cases like this is not a good idea. On another note, if people use tricks like this to make sure their or third party code can't panic, rather than because they don't want to deal with it, that is a good sign that we need a |
Sorry for taking so long to write in explicitly on this thread, though I've been mulling over the API issue for a while. I think that @tomaka and @pythonesque are right in that in general, we make it somewhat difficult (though by no means impossible) to run into exception safety problems in "normal" (especially safe) code. You have to use The current let _ = Thread::scoped(f).join();
// read some data that f had &mut access to The question is, is the I'm not really sure. I feel like the I also played with a much crazier version, but I think it's far too "clever" to be of use.
This I disagree with: if you're using atomics, you're coding transactionally almost by definition: since concurrent threads can observe the state after every atomic update, each update must uphold all needed invariants. This is why lock-free data structures are so fun to write :) |
I have never thought of After some thoughts, for me the API should ideally look like this:
After all, I can see three use cases:
I don't know if that will pass, since I think the API used to look kind of like this and since it's a bigger change than a simple modification to a safety problem. Other important problem: the destructor of
Ouch, I didn't know about this one. I'd be in favor or making I guess that this whole discussion is supposed to be in the "threads RFC" instead of an issue. Unfortunately like many people (I think) I don't have the time to read RFCs, and I spot problems once things are implemented. |
@tomaka I'm okay with the poison stuff because you really have to explicitly abandon exception safety in order to use the |
I thought you will get better traction if you demonstrate real memory unsafety in safe Rust code. The following is a paper construction, but it should work: struct Panicker<T>(T);
// Implement PartialOrd, Ord on Panicker so that it panics when compared with `<=`.
let mut heap = BinaryHeap::new();
let guard = Thread::scoped(|| {
heap.push(Panicker(Vec::new()));
heap.push(Panicker(Vec::new()));
});
guard.join().ok();
// Access heaps' first element here.
let zeroed_struct = heap.into_iter().next();
// BinaryHeap's sift_up will now panic on the first comparison, but not until after having replaced
// its first element with `zeroed()`.
// So we can create a zeroed-out version of any struct we want.
// This breaks many invariants:
// 1) A zeroed Vec means that Some(v) is the same as None, because the vec's ptr is marked NonZero.
// 2) We can zero a BTreeMap which will result in a btreemap with 0 for `b` and null for the
// root node, which will lead to an immediate violation since the insert/lookup
// methods assume root is not null. I've come back to the zeroing code in BinaryHeap before, and it's an easy target (that will eventually go away), but I'm sure there are other exploitable locations (in the sense of this bug) in libstd if we look for them. Edit: It's actually the second element of the BinaryHeap that's zeroed. |
I've been thinking about this over the weekend. My feeling is roughly that exception safety is a fact of life, but we should do what we can to minimize it. I've always been a fan of "unwind the whole thread" for this reason, and when I envisioned "data-parallel" helpers in the past -- basically a more-efficient and convenient version of Thread::scoped -- I always assumed that they would propagate panics unconditionally, just as would occur in sequential code. That said, even if we did unconditionally propagate a panic, it doesn't obviate the need for exception safety! The need for exception safety is inherent to error recovery -- the more flexible your error recovery, the more fine-grained your problem. For example, even if we did NO unwinding at all, we'd still have to worry about shared memory between processes. The problem that @bluss highlights, for example, can also occur if you have a destructor that owns the ...
struct Something {
heap: BinaryHeap<Panicker>
}
impl Drop for Something {
fn drop(&mut self) {
... self.heap.iter().next() ...
}
}
let mut heap = Something { heap: BinaryHeap::new() };
heap.push(Panicker { ... }); I guess that the main difference is that using I am sympathetic towards @aturon's suggestion (modulo naming) that we have a version of I guess that the main advantages I see to having all of these functions is that they would avoid the footgun of ignoring the result and thus help mitigate problems due to exception safety in end-user code (no guarantees are provided). Certainly libstd will still need careful vetting and fuzzing and so forth to catch these sorts of bugs, because (as I described above) the problems with exception safety still exist, even if the range of code that can expose them is less. Those advantages may be worth the price of having many distinct |
Thanks everyone for the additional comments! The basic point, which has been articulated variously in the above comments, is that exception safety is fundamentally a concern when programming in Rust. However, with careful design of things like mutex poisoning, it's a concern that can largely be relegated to unsafe implementations (like @nikomatsakis's last comment made clear to me, moreover, that the current I will mull over the cleanest/most ergonomic way of tightening up the API -- probably something along the lines of @tomaka's suggestion.
The problem is that we fundamentally cannot allow the thread to continue past that point in the stack, whether through unwinding or otherwise, because the child thread could continue accessing the data! That has clear memory safety problems. So the only other option would be to abort the process. We thought pretty hard about this join-on-drop semantics, since it is surprising, and carefully chose the name "scoped" to try to clearly communicate the lifetime concerns. |
WIP PR has been posted, addressing the above concerns together with some other API requests that have accumulated. I'd appreciate any comments on the PR. |
This commit makes several changes to `std::thread` in preparation for final stabilization: * It removes the ability to handle panics from `scoped` children; see rust-lang#20807 for discussion * It adds a `JoinHandle` structure, now returned from `spawn`, which makes it possible to join on children that do not share data from their parent's stack. The child is automatically detached when the handle is dropped, and the handle cannot be copied due to Posix semantics. * It moves all static methods from `std::thread::Thread` to free functions in `std::thread`. This was done in part because, due to the above changes, there are effectively no direct `Thread` constructors, and the static methods have tended to feel a bit awkward. * Adds an `io::Result` around the `Builder` methods `scoped` and `spawn`, making it possible to handle OS errors when creating threads. The convenience free functions entail an unwrap. * Stabilizes the entire module. Despite the fact that the API is changing somewhat here, this is part of a long period of baking and the changes are addressing all known issues prior to alpha2. If absolutely necessary, further breaking changes can be made prior to beta. Closes rust-lang#20807 [breaking-change]
Exception safety is a very well-known problem in C++.
Consider this:
If the call to
new
throws an exception, the error gets caught and the execution continue. However theFoo
object is now in an invalid state with its pointer pointing to memory that has been free'd.Before #20615 and before the change in
Send
this is not possible in Rust, because when a thread panics all objects that are local to this thread are no longer accessible. It's also the reason why mutexes are poisoned in case of a panic.But now that it's possible to send local variables to other threads, and ignore when this other thread panics, situations like this can totally happen.
Let's take this code for example:
Looks safe, right?
But what if you use that "ignore panics" trick?
At the end of this code, you have a
Foo
object in an invalid state becausecalculation_result
is not equal toval1+val2+val3
as it should be. Continuing to use the Foo object in this invalid state could lead to weird and hard to debug results (depending on the rest of the code).With the latest reforms in thread, library writer really have to take exception safety (or "panic safety") into account by avoiding the kind of code written above. You may argue that it's just the fault of the person who wrote the
Foo
struct. The problem is that exception safety is hard. Too hard to get right.In my opinion,
JoinGuard::join
should just panic if the underlying thread panicked, without leaving the choice to the user. The only situation in which atry_join
function (or another name) is safe is forJoinGuard<'static>
.cc @aturon
The text was updated successfully, but these errors were encountered: