-
Notifications
You must be signed in to change notification settings - Fork 15
Is it undefined behavior to hold an invalid reference, if it is never dereferenced. #12
Comments
I believe the second two are equivalent. The real question is, are these different: let x: &'static Foo;
let foo = Foo {...};
x = unsafe { &*(&foo as *const _) };
// x is never alive while foo is not, *but* the lifetimes are a lie I'd argue your examples are all the same actually; invalid references. References which either point to objects that don't exist, or which point to freed objects. I'm not sure if this should be UB, although I lean to yes, but I don't like accessing it while the reference is alive to change it. Also, the first three should, in my opinion, at least be the same, even if we do decide to look at accessing things for UB. |
I lean towards none of these being Undefined Behavior. I feel like in rust, we want to make sure that when people write unsafe code which they would expect to work in other languages, that code works in rust as well. Since references act a lot like pointers, it seems reasonable to assume (from the POV of an unsafe rust programmer coming from other languages like C/C++) that they act like C++ pointers - that is that invalid references are invalid, and thus cannot be followed, but that they don't trigger any UB unless they are dereferenced. If we want to make our memory model as intuitive as possible for people, I think we should make sure that this type of behavior is not UB. I think it's fairly likely that a decent number of existing crates depend on invalid references not triggering UB in order to work correctly, and it would be a shame to break them, or make them trigger UB, when we can avoid it at (what I would imagine is) almost no performance cost. I am not an expert in this area, but I'm not sure what optimizations we could gain by assuming that these references cannot exist in codegen. This also fits into the goal of making unsafe rust nicer to write than many other languages (such as C) by reducing the number of pitfalls which can screw you up and trigger UB I understand that unsafe rust will always be hard due to the need to maintain rust's invariants (which include things like not having invalid references) at the safety boundary, but allowing people to break some of these invariant without necessarily triggering UB behind the safety privacy boundary seems reasonable, and like it will make more reliable programs. Many people complain about how C has too much Undefined Behavior, which is too easy to trip over, and I don't want Rust to make the same mistake without good reasons. |
@mystor I'd argue that Rust references are far more like C++ references; |
@mystor: @ubsan: But unlike C++ references, Rust references with lifetimes greater than their referents are (or IMO should be) illegal/UB because they allow safe code to cause UB, trivially (or accidentally, simply by autoderef), and with no way for the safe code to tell that is a risk. IMO, references are something of the canonical example of "you should never even construct an invalid/mistyped/mis-lifetimed instance" - largely because of how incredibly easy it is for an invalid lifetime to leak out of |
@eternaleye I don't really agree. I feel like lifetimes shouldn't exist at all after typechecking; they should only be a lint tool, to me. Just because safe code might cause UB with what unsafe code has done doesn't mean doing it in the unsafe code should be UB. |
@ubsan: My point is that permitting the construction of references with invalid lifetimes is a huge footgun, due to how easy it is for that to pollute safe code on accident (via inference or otherwise). Meanwhile, I'm not sure there are any useful patterns for references with invalid lifetimes that are not served equally well by raw pointers - whose big distinction is that they neither have nor enforce lifetimes. Blurring the lines between them by permitting references to have invalid lifetimes, when you aren't permitted to dereference them, seems to me as if it encourages unnecessarily risky behavior on the part of the user. |
@eternaleye I have personally used them in my own code :) struct Foo {
foo: Box<Foo>,
foo_ref: &'static Foo, // reference into self
} and other similar pattern: I've gone through |
AFAIK in the current implementation there are currently only 3 restrictions on references:
Violating any of these is UB. However there are also additional restrictions we may want to apply when dereferencing a reference:
|
On the third point: lexpr to vexpr conversion ;P I'd argue, for now, that it should be UB, but I think we should postmark it for revisiting in the future. |
Just a quick remark on this: Actually, at least in C11, even creating an invalid reference is UB. From section 6.5.6 of the standard:
I am not saying that this is expected behavior, or that Rust should follow this. But generally, I think we don't want anything to act like C++ pointers, those are just way too horrible. ;-) |
This is very interesting to me. I am sympathetic to this POV, but it seems to be one of the key questions that we need to decide: "to trust types or not to trust types". I think that using That said, I think I agree with @mystor that we have to tread carefully here -- it seems obvious to me that "access-based" models are likely to be less error-prone. I am very wary of giving too much significance to types. Another related issue is the "only-access-using-volatile-loads" use case described in #16. |
One thing I think may be worthwhile is to examine how many ways an invalid reference in unsafe code could give rise to an invalid reference in safe code - I'm particularly worried about lifetime inference. If this turns out to be "not many, and easily managed" then making it UB to pass an invalid reference across an If it turns out that there are many, or that they're hard to defend against, then from an ergonomics and teachability standpoint, I think it'd be better to outlaw creating such references, even within unsafe code. The third option, allowing invalid references to exist in safe code, seems to fall into the Tootsie-Pop zone, and notably constrain the optimizer on safe code. |
@eternaleye Please define "invalid reference". I can't tell if you're talking about {
let y = 0;
let x: &'static i32 = &* (&y as *const i32);
} which should be fine, or {
let x: &'static i32 = &*(&0 as *const i32);
x
} which, I'd argue, shouldn't be. |
@ubsan let x = unsafe {
let y: &'static i32 = &*{
let z = 0;
&z as *const i32
}
} The Once an invalid reference has made it to safe code, we have two options:
My preference is to prevent invalid references from entering safe code, but if it is hard to prevent an invalid reference (once created) from reaching safe code then case 2 recurs. As a result, it would then be beneficial to make constructing it invalid, such that it's a good boundary for best-effort lints, or a sanitizer which does preserve lifetime information in the binary. |
@eternaleye that does not answer my question. I'm asking your definition of invalid reference, not for an example of one :P |
@ubsan: An invalid reference is a reference that doesn't satisfy its contract, just like an invalid instance of any other type. For references, the contract is that it points to a value of the specified type, and that the value outlives the specified lifetime. It also includes the aliasing rules, though I think those have some freedom in them (specifically, "parent" mutable references effectively don't exist when subobjects are borrowed) |
So you're saying the first example of mine should be undefined behavior? This? struct Foo {
foo: Box<u8>,
foo_ref: &'static u8,
}
fn new() -> Foo {
let b = Box::new(0);
let ref_ = unsafe { &*(&*b as *const _) };
Foo {
foo: b,
foo_ref: ref_,
}
} because I've definitely written this code; that I don't get the point of that. |
I'd say that can't possibly be anything other than UB without a Tootsie-Pop model. |
I can't think of a reasonable model where that isn't defined behavior: no invalid reference ever exists (invalid meaning "points to an invalid value or invalid memory"). No memory model that I have seen so far has taken lifetimes into account: they aren't intended to be used for codegen, just for static analysis. |
My model does consider lifetimes in safe code. IIRC old trans considered lifetimes in some situations. |
Complete newbie here, but in the context of ubsan's example it seems like any code which creates a Foo and then tries to do anything at all with the foo_ref member (or call any Foo methods that touch that member) would be immediately invoking UB, since that member has already been dropped. It seems no different from "use-after-free" to me, which I assume is considered UB in Rust. The only question in my mind is whether "making-accessible-to-safe-code-after-free" should be UB as well, but I can't think of any reason we would want to allow that. If the lifetime of foo_ref was part of new()'s signature, then it would at least be possible for clients of Foo to create it and invoke its methods in such a way that no use-after-free actually occurred. Thus I see no problem with allowing a reference to not-yet-freed memory to enter safe code. But giving safe code a reference to definitely-already-freed memory seems completely undesirable., unless you have a custom Drop impl that doesn't actually free the memory, but I have no idea what that would be good for. |
@lxrec The point (heh) is that, while the pointer |
That's not a guarantee; it's a precondition - one that needs to hold on every piece of code that accesses A guarantee is enforced; a precondition is relied on - and violating a precondition, followed by relying on it, leads to UB. Because the precondition there is not enforced by the type system, it can be violated by safe code. As a result, safe code must behave very precisely - in particular, there are behaviors allowed by the type system that it must not perform, or else cause UB. If the optimizer is unaware of exactly what those behaviors are (and that's a huge, possibly insoluble, kettle of fish in its own right), then the only solution is to apply blanket restrictions to the optimizer if such "illegal behaviors" could exist, as it may otherwise invoke such behaviors. Since such behaviors could exist if it's ever valid to pass an invalid instance of a type to safe code across an Thus, we arrive at the Tootsie-Pop model. One which I (and I believe you as well?) do not favor. |
Id love to read some explanation as to the problems with the Tootsie-Pop model. Since I read the blog post, it has always seemed like a relatively sane model for allowing breaking the rules in unsafe code, and avoiding the problems with people triggering ub when writing that unsafe code, but still allowing aggressive optimization in most of the program. |
The only explanation I could find was http://smallcultfollowing.com/babysteps/blog/2016/08/18/tootsie-pop-followup/, which if I understand correctly boils down to: Unsafe code is usually intended as an optimization, but if the optimizer is forced to be more conservative around any unsafety boundaries, then unsafe code can end up being slower than safe code; the example given was the unsafe vec.get_unchecked() being potentially slower than the safe-but-bounds-checked vec[i] operation. If there's more explanation than that, I'd also love to hear it. |
There was a (lengthy) discussion thread on internals.rust-lang.org. |
It was my understanding that there are a bunch of desirable optimizations that crucially rely on lifetimes being correct; or at least, they rely on references being more than just "valid" according to your definition. For example fn foo<F: FnOnce()>(x: &mut u32, f: F) -> u32 {
let r = *x;
f();
r
} The compiler is (in my understanding) allowed to move the dereference of Actually, I cannot see how a model would allow such transformations without taking lifetimes into account. According to your statement, a |
Indeed, this is more or less what I concluded in this recent blog post. |
I'd like to expand the scope of this topic beyond "holding an invalid reference", namely: Actually, I think this goes beyond pointers. To me, the question is: Are there cases where even just "temporarily holding but not using an invalid value inside an unsafe function" is UB? (I am restricting this to "inside a function" because I think there are some guarantees we want to uphold even for unused values when certain or all function boundaries are crossed. In fact, this is needed for some optimizations.) Most types don't matter here, as they clearly don't have any invalid values. Notable exceptions are pointers/references (this is what has been discussed here), On the one hand, if the answer to the question is "yes" -- if there are such cases where just creating/holding a bad value is UB -- then we get into the difficult game to define what does and what does not constitute a use. Store/load, conditional branching, and pattern matching clearly do (and there is, I think, no question that these are UB for "invalid" values, but who knows?). But other cases are less clear-cut. Is it insta-UB to transmute So, I'd somewhat prefer if we had as little such "invalid value UB" as possible -- ideally, none. It seems like a terrible footgun. However, I am not sure if that can be reconciled with Just dropping problems here, not proposing solutions to all of them. ;) |
@RalfJung yeah, I'm also of the opinion that we should make it as easy as possible to not commit UB, within reason. Passing and returning invalid values shouldn't be UB for example. OTOH, I don't think that invalid values should be treated specially - they should simply be treated as uninitialized. Use is UB, but there's a lot of stuff you should be able to do without it being UB. |
I don't think that's always possible... fn some_fn(x: &mut u32) {
other_unknown_fn();
*x = 13;
} It should be permitted for the compiler to move the store around the call. However, if passing a bad pointer into |
@RalfJung note that that is a use of |
(the dereference, that is) The passing itself isn't UB, but it's definitely not okay to do that to random APIs; you're breaking an API that the function has, and then the function does UB because you broke its API. |
True -- but the use is after |
@RalfJung ah, I guess that's true. Although, if you think about it, if we don't know if this function is |
@RalfJung perhaps passing as a parameter could be counted as a use, though? I'm not sure. If you have |
Good point! |
I think we made the decision that all of these are "instant" UB. The moment you have an invalid rvalue, you have UB. But then there's the little reference problem, which we should probably solve at the typeck level by allowing |
@arielb1 I've gone back on that idea. It's just... too hard to work with, imo. edit: nvm, arielby convinced me |
First, at least for Also, what's the problem with that idea? |
@arielb1 You've convinced me back on irc. I'm back to "creating an uninitialized |
The main reason I want "no undef values ever" is because, well, see the LLVM long thread about undef. I'll like to keep this within the optimizer. |
When refering to IRC please also link to relevant logs or copy them over.
Thanks.
On Jun 25, 2017 11:09 AM, "Nicole Mazzuca" <[email protected]> wrote:
@arielb1 <https://github.com/arielb1> You've convinced me back on irc. I'm
back to "creating an uninitialized rvalue is UB".
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#12 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AApc0uuAwED1mQPCjStkVhVVkGYJEd5Iks5sHhWzgaJpZM4JoGSH>
.
|
It boils down to "it's already UB", basically.
|
This is now becoming the validity invariant discussion at rust-lang/unsafe-code-guidelines#76 and rust-lang/unsafe-code-guidelines#77. |
It is not Safe in rust code to create invalid references, in that these references can be used by safe code to trigger Undefined Behavior, however, is it Undefined Behavior to create one of these references?
For example, does this code trigger Undefined Behavior?
How about this code, which gets the address of the member, but does no reading?
How about this code, which creates an invalid reference by over-extending the lifetime, but does not follow it?
Or this code, which creates a reference with the wrong lifetime, but only follows it while the object behind it is still alive?
The text was updated successfully, but these errors were encountered: