Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Direct and Partial Initialization using &uninit T #2534

Closed
wants to merge 14 commits into from

Conversation

RustyYato
Copy link

@RustyYato RustyYato commented Aug 29, 2018

Use &uninit T to initialize your memory directly!

This can have performance improvements and makes dealing with uninitialized memory completely safe.

Rendered


this RFC has been edited This RFC used to also have &out, but I removed them due to insufficient motivation, and a relatively easy workaround to &out.

@RustyYato
Copy link
Author

RustyYato commented Aug 29, 2018

Earlier conversation can be found in the Pre-RFC

@Centril Centril added the T-lang Relevant to the language team, which will review and decide on the RFC. label Aug 29, 2018
@Centril Centril changed the title Initial Commit for Write Pointer RFC: Partial Initialization and Write Pointers using &out T and &uninit T Aug 29, 2018
@newpavlov
Copy link
Contributor

newpavlov commented Aug 29, 2018

I think it's better to write a separate RFC for &out T and use this one only for &uninit T.

EDIT: &mut T -> &out T

@RustyYato RustyYato changed the title RFC: Partial Initialization and Write Pointers using &out T and &uninit T RFC: Write Pointers for Direct and Partial Initialization using &out T and &uninit T Aug 29, 2018
@RustyYato
Copy link
Author

Did you mean &out T? I would like to keep &out T and &uninit T together under the banner of write pointers because they overlap in many ways, and have only a few notable differences. Therefore, I think it is better to talk about them in one RFC.

@RustyYato RustyYato changed the title RFC: Write Pointers for Direct and Partial Initialization using &out T and &uninit T RFC: Write References for Direct and Partial Initialization using &out T and &uninit T Aug 29, 2018
@cramertj
Copy link
Member

cramertj commented Aug 29, 2018

Can you add some discussion of the advantages of &uninit and &out over &mut MaybeUninit<T> (which provides a safe set method)? (WIP PR for MaybeUninit)

@RustyYato
Copy link
Author

RustyYato commented Aug 30, 2018

@cramertj With a pointer you could write to fields of fields in a write only way, for example:

This example is broken, see my comment below for the udpated version

#[derive(Clone, Copy)]
pub struct Outer {
    pub public_field: u32
    private_field: u32
}

impl Outer {
    pub fn new(private_field: u32) -> Self {
        Outer { public_field: 0, private_field }
    }
}

// The code below lives in a different crate

// relevant imports here

let outer = Outer::new(20);

foo(&out outer);

let outer = MaybeUninit::new(Outer::new(30));

bar(&mut outer);

fn foo(outer: &out Outer) {
    // println!("{}", outer.should_only_write); // Compile time error

    outer.should_only_write = 20; // Fine, because we only write
}

fn bar(outer: &mut MaybeUninit<Outer>) {
    outer.set(
        // what to put in here?
    )
}

Note bar cannot be written without unsafe code and knowning the layout of Outer, which is not stable.

This example had a few problems, see my below for the updated version.

@RustyYato
Copy link
Author

RustyYato commented Aug 30, 2018

Also, &out T and &uninit T are unsafe free, whereas MaybeUninit fundamentally cannot be unsafe free. At least without using more space, (through an enum instead of a union, but that defeats the purpose).
edit: after looking through the proposal for MaybeUninit, you can still read the values inside, but only within a unsafe, and writing is safe.

@RustyYato
Copy link
Author

&uninit T can be safely read after it is written to. MaybeUninit cannot handle this case at all without unsafe. MaybeUninit<T> seems to be very similar to &out T, two different ways to do the same thing.

@cramertj
Copy link
Member

@KrishnaSannasi Reading values from a MaybeUninit is safe after setting it, since the set method (or another method with a similar name) returns an &mut reference to the inner value. SImilarly, since MaybeUninit is repr(transparent), it can be used to safely refer to uninitialized elements inside of an array.

@RustyYato
Copy link
Author

RustyYato commented Aug 30, 2018

@cramertj How about values on the heap? (Such as in a Vec<T>, if we want to implement something like emplace_back)
I'm not too familiar with #[repr(transparent)], so I don't exactly know how it works. Thanks for your help.

@cramertj
Copy link
Member

The process works no differently on the heap, but the specific API for placement-new inside of Vec<T> is more interesting because the compiler needs to know for sure that the value has been initialized before it can move on to other operations.

#[repr(transparent)] means "this type's runtime and ABI representation is exactly the same as that of it's only non-zero-sized field." e.g. #[repr(transparent)] struct Foo { bar: Bar } has exactly the same in-memory representation as Bar.

@RustyYato
Copy link
Author

A Vec<T> emplace function using &uninit T would look something like:

impl<T> Vec<T> {
    fn emplace_back(put: impl FnOnce(&uninit T)) {
        // ... allocate magic ...
        put(&uninit end);
    }
}

and we could have some-placement-new like sugar for ease of use, but that's not necessary.

@cramertj
Copy link
Member

@KrishnaSannasi Yeah, that works out nicely. A similar thing using MaybeUninit could look like this:

pub struct Uninit<'a, T, Tag> {
    location: &'a mut MaybeUninit<T>,
    marker: PhantomData<Tag>,
}

pub struct UninitFilled<Tag>(PhantomData<Tag>);

impl<'a, T, Tag> Uninit<'a, T, Tag> {
    pub fn new(location: &'a mut MaybeUninit<T>) -> Self {
        ...
    }

    #[inline(always)]
    pub fn fill(self, value: T) -> UninitFilled<UniqToken> {
        self.location.set(value);
        UninitFilled(PhantomData)
    }
}

impl<T> Vec<T> {
    pub fn emplace_back(put: impl for<Tag> FnOnce(Uninit<'_, T, Tag>) -> UninitFilled<Tag>) {
        ...
    }
}

The Place and PlaceFilled definitions could exist in the standard library. I personally like this solution because it avoids introducing additional language constructs (new reference types for the compiler and users to understand), but it does have the unfortunate downside of requiring type-level HRTB (for<Tag> ...) in order to prevent users manufacturing their own Place with Place::new and making a PlaceFilled that they then hand back directly rather than actually calling set.

@RustyYato
Copy link
Author

RustyYato commented Aug 30, 2018

@cramertj Cool, why do we need Tag?

Also, how about this:
You have a public and private field(s), the public field should only be written to in the relevant functions, otherwise bad things happen (like segfault).

#[derive(Clone, Copy)]
pub struct Outer {
    pub public_field: u32
    private_field: u32
}

impl Outer {
    pub fn new(private_field: u32) -> Self {
        Outer {
            public_field: 0,
            private_field
        }
    }
}

#[repr(transparent)]
struct MaybeUninitWrite<'a, T: 'a>(&mut MaybeUninit<T>);

impl<'a, T: 'a> MaybeUninitWrite<'a, T> {
    pub fn new(inner: &'a mut MaybeUninit) -> Self {
        MaybeUninitWrite(inner)
    }

    pub fn set(&mut self, value: T) {
        (self.0).set(value);
    }
}
// The code below lives in a different crate

// relevant imports here

let outer = Outer::new(20);

foo(&out outer);

let mut outer = MaybeUninit::new(Outer::new(30));
let outer = MaybeUninitWrite::new(&mut outer);

bar(&mut outer);

let mut outer = MaybeUninit::new(Outer::new(40));

unsafe { baz(&mut outer); }

// These functions should not be able to read any value in Outer. It could cause a segfault. 
// (obviously this code won't but we can imagine code very similiar to this
// that might actually segfault, using hw interfaces managed by the OS)

// This enforces that contract statically
fn foo(outer: &out Outer) {
    // println!("{}", outer.public_field); // Compile time error

    outer.public_field = 20; // Fine, because we only write
}

// This cannot do the same thing as foo
fn bar(outer: &mut MaybeUninitWrite<Outer>) {
    // not possible to change public_field on outer
}

// This cannot enforce a write-only contract,
// and might segfualt if outer holds an uninitialized value
unsafe fn baz(outer: &mut MaybeUninit<Outer>) {
    outer.get_mut().public_field = 20;
}

Doing something this simple, should not require unsafe code, or complex type usage.

should raw pointers, *out T be added
should a new trait, OverwriteSafe, be used for the bounds of &out T
@RustyYato
Copy link
Author

Another idea for the bounds of &out T, would be an auto-implemented trait OverwriteSafe (name not important), that is implemented for all types, that T: !Drop and all fields implement T: OverwriteSafe. This would prevent nested Drop types from being inside structs or enum. Thereby allowing it to be safely overwritten.

The downside to this approach is that we add a new trait to the standard library, and adding a Drop implementation becomes a breaking change. This is probably undesirable. But, adding a Drop implementation later doesn't seem likely, so this might be fine.

@bjorn3
Copy link
Member

bjorn3 commented Sep 5, 2018

[…] but it does have the unfortunate downside of requiring type-level HRTB (for ..) […]

Why not use a lifetime, instead of a type as tag?

@cramertj
Copy link
Member

cramertj commented Sep 5, 2018

@bjorn3 It's possible that an invariant lifetime would be sufficient (so that the user couldn't create a 'static version or another longer-lived version that could be used instead).

@YaLTeR
Copy link

YaLTeR commented Sep 6, 2018

In the examples with vector emplace_back() returning an &uninit T what happens if you do nothing and just mem::forget() it? Wouldn't the vector try to drop this uninitialized value in its destructor? The same applies to emplace_back() taking a closure.

@RustyYato
Copy link
Author

RustyYato commented Sep 7, 2018

@YaLTeR You would be forgetting the pointer, not the underlying value. In the closure case, you would not be initializing the value, so it would break this rule and get a compiler error

  • Functions and closures that take a &uninit T argument must initialize it before returning

and you cannot return a &uninit T by that same rule's sub-rule and would get a compiler error

  • You cannot return an &uninit T

I put this rule in for the exact reason you pointed out. If we create an uninitialized value in a Vec, then try and drop the Vec, that would cause many problems, and I don't want to enforce a rule of &uninit T must be initialized if you get one from the return value of a function, although that is a viable alternative.

That way, only &uninit T that you create may be allowed to be left uninitialized. This would also allow functions to leave &uninit T uninitialized, if they return that &uninit T. But it seems more complicated than it's worth.

@cramertj

... invariant lifetime would be sufficient ...

How would you define an invariant lifetime, using references will always give a covariant lifetime, and there doesn't seem to be another way to constrain lifetimes, if I'm understanding the nomicon section of variance correctly.

@ExpHP
Copy link

ExpHP commented Sep 9, 2018

In the closure case, you would not be initializing the value, so it would break this rule and get a compiler error

This assumes the compiler can identify that mem::forget doesn't initialize it, without using global analysis.

...and this is certainly something it can do; any function with the signature fn foo<T>(x: T) couldn't possibly initialize an &uninit by argument from parametricity. But this should lead to a more general restriction; e.g. any generic parameter T can never resolve to &uninit _ as the outermost type constructor, or something like that.

@richard-uk1
Copy link

Would this help to solve the problem where only part of a struct has defaults? e.g.

struct Person {
    given_name: String, // no default
    family_name: String, // no default
    other_names: Vec<String>, // default = Vec::new()
    email: Option<String>, // default = None
    //...
}

There could be some function that returns an &uninit Person where the compiler checks that you initialize the remaining values.

Would this be possible?

@RustyYato
Copy link
Author

@ExpHP Yes, that sounds right.

@derekdreery That would require global knowledge of the entire program in the worst case, and that is undesirable, so no, this will not solve that problem. This is fine for the compiler to do, but the human programming would find this confusing, and hard to reason about. This also makes changing the implementation of a function a breaking change, which is also undesirable.

@Centril Centril added A-typesystem Type system related proposals & ideas A-expressions Term language related proposals & ideas A-syntax Syntax related proposals & ideas I-nominated labels Nov 22, 2018
@nikomatsakis
Copy link
Contributor

@rfcbot fcp postpone

Per my previous comment, I am moving to postpone this RFC. My reasoning remains unchanged:

I definitely feel like there are real use cases here, but I am very wary of growing the number of reference types, especially in a sort of "narrow, targeted" fashion.

(Per that same comment, we are still working through the NLL and Polonius transition and will be for some time.)

@rfcbot
Copy link
Collaborator

rfcbot commented Dec 20, 2018

Team member @nikomatsakis has proposed to postpone this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. labels Dec 20, 2018
# Unresolved questions
[unresolved-questions]: #unresolved-questions

- What is the correct bounds for `T` in `&out T`? conservatively, `T: Copy` works and is sound. But as [@TechnoMancer](https://internals.rust-lang.org/u/TechnoMancer) pointed out, `T: !Drop` is not the correct bound. For example, `Wrapper(Vec<String>)`, clearly cannot be overwritten safely, because `Vec<String>`, must drop do deallocate the `Vec`, but `Wrapper` itself does not implement drop. Therefore either a new trait is needed (but unwanted), or we must keep the `Copy` bound.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failing to call destructors is safe; better find another word. This is mostly a technicality, however; resource leaks are still something Rust would rather avoid, and I agree not accounting for this here at all would make them too easy.

But besides this, I've been long thinking that Drop needs an overhaul. I'd replace it with something which looks like this:

  • trait Drop { /* ... */ }: same interface as before
  • unsafe trait Destruct: Drop {}: marker trait, assures that values of a given type can be disposed of by merely calling Drop implementations of their constituent fields; the type has no disposal logic of its own and can be safely destructured (this corresponds to what is now informally expressed as !Drop)
  • unsafe auto trait Forget: Destruct {}: marker trait, assures that all Drop implementations of transitively constituent fields are no-ops; in other words, mem::drop on the type is completely equivalent to mem::forget.

Now, automatic deriving behaviour:

  • for any given type T, unless the containing module provides an explicit Drop implementation, the compiler automatically derives a no-op Drop and Destruct for T;
  • if all constituent fields are also Forget, the compiler derives Forget as well.

This system would remove a major motivation for introducing negative traits/bounds (which have been pointed out to be rather problematic, especially in the context of automatic impls for !), as it would create a way to formalise 'trivial disposal behaviour' in the type system without them; in this case, the correct bound would be Forget. It would also open the doors to adding true linear typing to Rust: those could simply explicitly opt-out of implementing Drop. In other words, lacking an impl Drop would no longer mean 'trivial disposal logic', but 'no automatic disposal at all'.

However, I'm not sure how backwards compatible this would be. Obviously, introducing true linear typing would incur some major changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Drop overhaul sounds nice on paper, I would still still want some form of negative trait impls, because there are some APIs to need this to be work at all. But that is beside the point. The Drop overhaul doesn't fit in with this RFC, and you could post it to internals if you want more feedback.

I did note that failing to call destructors is safe as the very next sub bullet. This is an unresolved question because I wasn't sure if we wanted to allow leaks this way.

@graydon
Copy link

graydon commented Jan 12, 2019

Strong oppose. Cognitive load and risk of writing wrong code is high, motivation is marginal.

@Centril
Copy link
Contributor

Centril commented Jan 12, 2019

Cognitive load and risk of writing wrong code is high, motivation is marginal.

@graydon Could you perhaps elaborate with examples of where it could go wrong and examples of the risks?

@RustyYato
Copy link
Author

RustyYato commented Jan 12, 2019

@graydon I'm currently in the middle of stripping down this proposal. No more &out as it is completely unnecessary (it can be written as a library item based off of &mut). I am also thinking about how to rework uninit.

I can see how it can be a large cognitive load, but dealing with uninitialized memory is hard and so the rules around &uninit are there to make sure nothing can go wrong. The compiler will be the one to check your code so you shouldn't have to think too hard about it. It's like the orphan rules, they can be somewhat complicated, but most of the time you don't need to think about it.


&out as a library item, for those who are curious

struct Out<'a, T> { ptr: &'a mut T }

impl<'a, T> Out<'a, T> {
    // to set without dropping old value
    fn set(&mut self, value: T) {  unsafe { std::ptr::write(self.ptr, value); } }
    fn reborrow<'b>(out: &mut Self) where 'a: 'b { Out { ptr: out.ptr } }
}

impl<'a, T> From<&'a mut T> for Out<'a, T> {
    fn from(ptr: &'a mut T) -> Self {
        Self { ptr }
    }
}

edit: I know this isn't exactly what I put out for &out, but it does cover most of the use case, so I am fine with it.

@RustyYato RustyYato changed the title RFC: Write References for Direct and Partial Initialization using &out T and &uninit T RFC: Direct and Partial Initialization using &uninit T Jan 12, 2019
@graydon
Copy link

graydon commented Jan 12, 2019

Cognitive load and risk of writing wrong code is high, motivation is marginal.

@graydon Could you perhaps elaborate with examples of where it could go wrong and examples of the risks?

Users obtaining pointers to uninitialized memory casually because "it's faster", which they then convert to a raw pointer. Presently casually taking a ref to an uninit variable gets you an error; you have to work somewhat to get yourself some uninit memory. We worked hard to paint uninitialized memory into a very small corner in this language you have to do some ceremony to get at. Now it'll be a chapter heading in a book that's very easy to see, learn, get interested in, tinker with, use in your own libraries.

Additional unclear risk: drop semantics. Drops execute top-down, so this has to not work on drop types. The proposal isn't entirely clear if it's going to try to allow that.

Cognitive load I think is obvious: it's a whole new qualifier with a complex set of rules that people will naturally sprinkle all over codebases, especially if it's associated with speed. We worked super hard to minimize the number of qualifiers. It looks like it can all be done with a library type, so it should be.

It's also a new state of initialized-ness for users to think about, for every memory cell. Currently the user's mental model does not include (unless they're off in unsafe land) the initialized-ness of the substructure of a struct. Now it will.

@SoniEx2
Copy link

SoniEx2 commented Jan 12, 2019

PITs are supposed to put stricter requirements on this thing. a pointer to a PIT is valid, but is a pointer to a PIT, not to a full object, because PITs encode initialization state in the type. you can still access uninitialized or invalid memory by going through usize but we have that issue with many existing types anyway.

@RustyYato
Copy link
Author

@graydon

Users obtaining pointers to uninitialized memory casually because "it's faster", which they then convert to a raw pointer. Presently casually taking a ref to an uninit variable gets you an error; you have to work somewhat to get yourself some uninit memory. We worked hard to paint uninitialized memory into a very small corner in this language you have to do some ceremony to get at. Now it'll be a chapter heading in a book that's very easy to see, learn, get interested in, tinker with, use in your own libraries.

I am not convinced by this argument, to do anything with with raw pointers you need unsafe, and if you are already using unsafe it is really easy to get a raw pointer to uninit memory.

let x: String = unsafe { std::mem::unintialized() };

let y: *mut _ = &mut x; // raw pointer to uninit memory

But now that you have a raw pointer to uninit memory you need to do pretty much all the same work that I have laid out to make sure that you don't do anything unsound with it. This is cognitive load that could be shifted onto the compiler to have it validate our program. Which is what this proposal is trying to do

Additional unclear risk: drop semantics. Drops execute top-down, so this has to not work on drop types. The proposal isn't entirely clear if it's going to try to allow that.

Currently this will be allowed, because it logically identical to storing the data in a bunch of temporaries, then constructing the Drop type. Once the final field is initialized, the Drop type is initialized and will be dropped at the end of the scope.

Cognitive load I think is obvious: it's a whole new qualifier with a complex set of rules that people will naturally sprinkle all over codebases, especially if it's associated with speed. We worked super hard to minimize the number of qualifiers. It looks like it can all be done with a library type, so it should be.
It's also a new state of initialized-ness for users to think about, for every memory cell. Currently the user's mental model does not include (unless they're off in unsafe land) the initialized-ness of the substructure of a struct. Now it will.

I agree that this is a large cognitive load, but I would like to make dealing with uninitialized code safe. Also dealing with uninitialized code is entirely local, with absolutely no global analysis, so it shouldn't be too bad.

@joshtriplett
Copy link
Member

FWIW, my inclination would be to close, but I can live with "postpone" for now.

@rfcbot reviewed

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Jan 18, 2019
@rfcbot
Copy link
Collaborator

rfcbot commented Jan 18, 2019

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added the finished-final-comment-period The final comment period is finished for this RFC. label Jan 28, 2019
@rfcbot
Copy link
Collaborator

rfcbot commented Jan 28, 2019

The final comment period, with a disposition to postpone, as per the review above, is now complete.

By the power vested in me by Rust, I hereby postpone this RFC.

@rfcbot rfcbot added postponed RFCs that have been postponed and may be revisited at a later time. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. labels Jan 28, 2019
@rfcbot rfcbot closed this Jan 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-borrowck Borrow checker related proposals & ideas A-expressions Term language related proposals & ideas A-references Proposals related to references A-syntax Syntax related proposals & ideas A-typesystem Type system related proposals & ideas A-uninit &uninit related proposals & ideas finished-final-comment-period The final comment period is finished for this RFC. postponed RFCs that have been postponed and may be revisited at a later time. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.