diff --git a/text/0000-write-pointers.md b/text/0000-write-pointers.md
new file mode 100644
index 00000000000..36382bb7a66
--- /dev/null
+++ b/text/0000-write-pointers.md
@@ -0,0 +1,411 @@
+- Feature Name: parital_initialization
+
+- Start Date: 2018-08-29
+
+- RFC PR: (leave this empty)
+
+- Rust Issue: (leave this empty)
+
+# Summary
+
+[summary]: #summary
+
+This RFC aims to allow direct, ergonomic initialization (as an optimization) and partial initialization of structs and enums. It does so through one new reference type, `&uninit T` (the name is not important to me).
+
+# Motivation
+
+[motivation]: #motivation
+
+The builder pattern was created to work around the lack of partial initialization, but it has problems with large structs: the `*Builder` struct must necessarily be larger than the target struct, null-pointer optimizations notwithstanding. Moving large structs is also very expensive, and relying on the optimizer to elide moves isn't very robust. `&uninit T` could serve as a way to place values directly into the desired memory location and to partially initialize that memory.
+
+# Guide-level explanation
+
+[guide-level-explanation]: #guide-level-explanation
+
+`&uninit T` is a write-once reference to `T`. After the first write, reading the value is allowed (from then on it behaves exactly like `&mut T`). The first write does not run a destructor on the old contents.
+
+For all examples, I will use these two structs:
+
+```Rust
+struct Foo { a: u32, b: String, c: Bar }
+#[derive(Clone, Copy)]
+struct Bar { d: u32, e: f32 }
+
+impl std::ops::Drop for Foo {
+    fn drop(&mut self) {
+        println!("Dropping Foo {}", self.b);
+    }
+}
+```
+
+## `&uninit T`
+
+Using `&uninit T`, we can do partial initialization and direct initialization.
+```Rust
+let x: Foo;
+
+*(&uninit x.a) = 12;
+*(&uninit x.b) = "Hello World".to_string();
+*(&uninit x.c.d) = 11;
+*(&uninit x.c.e) = 10.0;
+```
+This works because when we take an `&uninit` to `x.a`, we are implicitly also taking an `&uninit` to `x`, and the dot operator does not attempt to read any memory location in `x`.
+
+For ease of use, you can simply write
+```Rust
+let x: Foo;
+
+x.a = 12;
+x.b = "Hello World".to_string();
+x.c.d = 11;
+x.c.e = 10.0;
+```
+and the compiler will infer that all of these need to use `&uninit`, because `x` was not initialized directly.
+
+### Restrictions
+
+#### Storing
+
+You cannot store a `&uninit T` in any way: not in structs, enums, or unions, and not behind any reference. So all of these are invalid.
+
+```Rust
+fn maybe_init(maybe: Option<&uninit T>) { ... }
+fn init(ref_to_write: &mut &uninit T) { ... }
+struct Temp { a: &uninit Foo }
+```
+
+#### Conditional Initialization
+
+One restriction of `&uninit T` is that we cannot conditionally initialize a value. For example, none of these are allowed.
+```Rust
+let x: Foo;
+let condition = ...;
+
+if condition {
+    x.a = 12; // Error: Conditional partial initialization is not allowed
+}
+```
+```Rust
+let mut x: Foo;
+let condition = ...;
+
+while condition {
+    x.a = 12; // Error: Conditional partial initialization is not allowed
+}
+```
+```Rust
+let mut x: Foo;
+
+for ... {
+    x.a = 12; // Error: Conditional partial initialization is not allowed
+}
+```
+If we allowed this, we could not guarantee that the value is in fact initialized.
+
+Note that the following does not conditionally initialize `x.e`, because by the end of the `if`-`else` block, `x.e` is guaranteed to be initialized.
+```Rust
+let x: Bar;
+
+x.d = 10;
+
+if { ... any condition ... } {
+    x.e = 1.0;
+} else {
+    x.e = 0.0;
+}
+```
+
+### Using partially initialized variables
+
+```Rust
+let x: Bar;
+x.d = 2;
+
+// This is fine, we know that x.d is initialized
+let d_pow = x.d.pow(4);
+if d_pow == 16 {
+    x.e = 10.0;
+} else {
+    x.e = 0.0;
+}
+// This is fine, we know that x is initialized
+assert_eq!(x.e, 10.0);
+```
+
+### Functions and closures
+
+You can accept `&uninit T` as an argument to a function or closure.
+
+```Rust
+fn init_foo(foo: &uninit Foo) { ... }
+let init_bar = |bar: &uninit Bar| { ... };
+```
+
+But if you do accept a `&uninit T` argument, you must write to it before returning from the function or closure.
+
+```Rust
+fn valid_init_bar_v1(bar: &uninit Bar) {
+    bar.d = 10;
+    bar.e = 2.7182818;
+}
+fn valid_init_bar_v2(bar: &uninit Bar) {
+    // You must dereference if you write directly to a &uninit T
+    // This still does not drop the old value of bar
+    *bar = Bar { d: 10, e: 2.7182818 };
+}
+fn invalid_init_bar_v1(bar: &uninit Bar) {
+    bar.d = 10;
+    // Error, bar is not completely initialized (Bar.e is not initialized)
+}
+
+fn invalid_init_bar_v2(bar: &uninit Bar) {
+    bar.d = 10;
+    if bar.d == 9 {
+        return; // Error, bar is not completely initialized (Bar.e is not initialized)
+    }
+    bar.e = 10.0;
+}
+```
+
+If a closure captures a `&uninit T`, then it becomes a `FnOnce`, because the write semantics change after the first write.
+
+```Rust
+let x: Foo;
+
+let init = || x.a = 12; // init: FnOnce() -> ()
+```
+
+**Note on Panicky Functions:**
+If a function panics, then all fields initialized in that function will be dropped. No cross-function analysis will be done.
+
+## Constructors and Direct Initialization
+
+Using `&uninit` we can create constructors for Rust!
+```Rust
+struct Rgb(u8, u8, u8);
+
+impl Rgb {
+    fn init(&uninit self, r: u8, g: u8, b: u8) {
+        self.0 = r;
+        self.1 = g;
+        self.2 = b;
+    }
+}
+
+let color: Rgb;
+color.init(20, 23, 255);
+```
+
+and we can do direct initialization (also called Placement New).
+```Rust
+impl<T> Vec<T> {
+    pub fn emplace_back(&mut self, init: impl FnOnce(&uninit T)) {
+        // This code is taken from the Vec push implementation in the std lib
+        // and adapted to use &uninit to show how it will be used for placement new
+        if self.len == self.buf.cap() {
+            self.reserve(1);
+        }
+        unsafe {
+            let end: &uninit T = &uninit *self.as_mut_ptr().add(self.len);
+            init(end); // this line has been changed for the purposes of placement new
+            self.len += 1;
+        }
+    }
+}
+```
+
+# Reference-level explanation
+
+[reference-level-explanation]: #reference-level-explanation
+
+**NOTE** This RFC does NOT aim to create new raw pointer types, so no `*uninit T`. There is no point in creating these.
+
+## Rules of `&uninit T`
+
+`&uninit T` should follow some rules so that it is easy to reason about `&uninit T` locally and maintain soundness:
+- `&uninit T` should be an exclusive pointer, similar to how `&mut T` is an exclusive pointer
+- `&uninit T` can only be assigned to once
+- Writing does not drop the old value.
+    - Otherwise, it would not handle writing to uninitialized memory
+    - More importantly, dropping requires at least one read, which is not possible with a write-only reference
+- You cannot reference partially initialized memory
+```Rust
+let x: Bar;
+
+fn init_bar(bar: &uninit Bar) { ... }
+fn init_u32(x: &uninit u32) { ... }
+
+x.e = 10.0;
+
+// init_bar(&uninit x); // compile time error: attempting to reference partially initialized memory
+init_u32(&uninit x.d); // fine, x.d is completely uninitialized.
+```
+- Functions and closures that take a `&uninit T` argument must initialize it before returning
+    - Because of this rule, a type parameter `T` can never resolve to `&uninit _`, since it is impossible to initialize the `&uninit` parametrically
+- You can take a `&uninit T` to any `T` that represents uninitialized memory. Of the two examples that follow, only the first is OK.
+```Rust
+let x: Foo;
+let y = &uninit x;
+```
+```Rust
+let x: Foo = Foo { a: 12, b: "Hello World".to_string(), c: Bar { d: 11, e: 10.0 } };
+fn init(a: &uninit Foo) { ... }
+init(&uninit x); // this function would overwrite, but not drop, the old value of x, so this is a compile-time error
+```
+
+## Coercion Rules
+
+ - `&T` // no change
+ - `*const T`
+ - `&mut T`
+ - `*mut T`
+ - `&T`
+ - `&uninit T` - `*mut T`, and `&T` or `&mut T` once initialized, depending on whether the variable binding is mutable.
+
+   ```Rust
+   struct Foo(i32, i32);
+   let foo: Foo;
+   foo.0 = 10;
+   foo.1 = 20;
+
+   // foo.0 = 10; // error: foo is immutable
+
+   let mut foo_mut: Foo;
+   foo_mut.0 = 10;
+   foo_mut.1 = 20;
+
+   foo_mut.0 = 10; // fine, foo_mut is mutable
+   ```
+
+## `self`
+
+We will add `&uninit self` as sugar for `self: &uninit Self`. This is for consistency with `&self` and `&mut self`.
+
+## Panicky functions in detail
+
+Because we can pass `&uninit T` to functions, we must consider what happens if a function panics. For example:
+```Rust
+fn init_foo_can_panic(foo: &uninit Foo) {
+    foo.b = "Hello World".to_string();
+    foo.a = 12;
+
+    if foo.a == 12 {
+        // When we panic here, we should drop all values that are initialized in the function.
+        // Nothing could have been initialized before the function because we have a &uninit T
+        panic!("Oh no, something went wrong!");
+    }
+
+    foo.c = Bar { d: 10, e: 12.0 };
+}
+
+let x: Foo;
+
+init_foo_can_panic(&uninit x);
+```
+
+# Drawbacks
+
+[drawbacks]: #drawbacks
+
+ - This is a significant change to the language and introduces a lot of complexity.
+ - Partial initialization can be solved entirely through the type system, as shown [here](https://scottjmaddox.github.io/Safe-partial-initialization-in-Rust/). But this does have its problems, such as requiring an unstable feature (untagged_unions) or increasing the size of the uninitialized value (using enums).
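
The size cost of an enum-based workaround can be sketched in today's stable Rust. `PartialBar` is a hypothetical stand-in written only for this illustration (it is not the code from the linked article): it tracks each field of `Bar` with an `Option`, and that tracking state makes it larger than the finished struct.

```rust
use std::mem::size_of;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Bar { d: u32, e: f32 }

// Hypothetical enum-based partial value: each field records whether it is set.
#[derive(Default)]
struct PartialBar { d: Option<u32>, e: Option<f32> }

impl PartialBar {
    // Returns `Some(Bar)` only once every field has been written.
    fn finish(self) -> Option<Bar> {
        Some(Bar { d: self.d?, e: self.e? })
    }
}

fn main() {
    let mut partial = PartialBar::default();
    partial.d = Some(10);
    // With `e` still missing, the value cannot be finished.
    assert!(PartialBar { d: partial.d, e: None }.finish().is_none());

    partial.e = Some(12.0);
    let bar = partial.finish().expect("all fields were set");
    assert_eq!(bar, Bar { d: 10, e: 12.0 });

    // `u32` and `f32` have no spare bit patterns, so each `Option` needs extra space.
    assert!(size_of::<PartialBar>() > size_of::<Bar>());
}
```

The check is done at run time here, whereas the proposal would have the compiler verify initialization statically and without the extra storage.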
+
+# Rationale and alternatives
+
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+## Allow Drop types to be partially initialized
+
+Then they would only be dropped if all of their fields are initialized.
+
+## Placement-new
+
+Placement new would help with initializing large structs.
+
+## As sugar
+
+This could be implemented as sugar, where all fields of structs that are partially initialized are turned into temporary variables that are then passed through the normal pipeline.
+
+For example,
+```Rust
+let x: Bar;
+x.d = 10;
+x.e = 12.0;
+```
+
+would desugar to
+
+```Rust
+let x: Bar;
+let xd = 10;
+let xe = 12.0;
+x = Bar { d: xd, e: xe };
+```
+
+But this would not be able to replace placement new, as it can't handle `&uninit T` across function boundaries. It also would not solve the problem of direct initialization.
+
+## `MaybeInit`
+
+`MaybeInit` is almost like `&uninit`, but it requires the use of unsafe to get a value out of it, and it does not work on a value in place, i.e. the value must be moved out of the `MaybeInit` after initialization.
+
+# Prior art
+[prior-art]: #prior-art
+
+~~
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+ - We can use drop flags to check when a variable has been initialized, and use that information to allow conditional initialization. But should we do this?
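
As a rough illustration of what the drop-flag idea in the question above amounts to, here is a minimal sketch in today's stable Rust. `FlaggedSlot` is a hypothetical helper introduced only for this sketch, built on `std::mem::MaybeUninit`; it is not part of this proposal, and under the proposal the compiler would presumably manage the flag itself.

```rust
use std::mem::MaybeUninit;

// Hypothetical helper: a slot plus a manual "drop flag" recording whether it was written.
struct FlaggedSlot<T> {
    value: MaybeUninit<T>,
    initialized: bool,
}

impl<T> FlaggedSlot<T> {
    fn new() -> Self {
        FlaggedSlot { value: MaybeUninit::uninit(), initialized: false }
    }

    // Write once; `MaybeUninit::write` does not drop any previous contents.
    fn init(&mut self, value: T) {
        assert!(!self.initialized, "slot was already initialized");
        self.value.write(value);
        self.initialized = true;
    }

    // Reading is only allowed when the flag says the write happened.
    fn get(&self) -> Option<&T> {
        if self.initialized {
            // SAFETY: `initialized` is only set after `write` has run.
            Some(unsafe { self.value.assume_init_ref() })
        } else {
            None
        }
    }
}

impl<T> Drop for FlaggedSlot<T> {
    fn drop(&mut self) {
        if self.initialized {
            // SAFETY: the flag guarantees the value was written exactly once.
            unsafe { self.value.assume_init_drop() }
        }
    }
}

fn main() {
    // Conditional initialization: the flag, not the control flow, decides what is dropped.
    let condition = std::env::args().count() > 1;
    let mut slot: FlaggedSlot<String> = FlaggedSlot::new();
    if condition {
        slot.init("Hello World".to_string());
    }
    assert_eq!(slot.get().is_some(), condition);
}
```

The cost of this approach is one extra flag per conditionally initialized value and a run-time check on every read and on drop, which is the trade-off the question asks about.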
+---
+
+edit:
+Added Panicky Function sub-section due to [@rkruppe](https://internals.rust-lang.org/u/rkruppe)'s insights
+
+Added `&out T` by C# to prior art and alternative syntax due to [@earthengine](https://internals.rust-lang.org/u/earthengine)'s suggestion
+
+Removed lots of unnecessary spaces and newlines
+
+edit 2:
+
+Incorporated [@gbutler](https://internals.rust-lang.org/u/gbutler)'s proposal of splitting `&uninit T` into `&out T` and `&uninit T`
+
+edit 3:
+
+Used [@gbutler](https://internals.rust-lang.org/u/gbutler)'s example of a FrameBuffer that interfaces with hardware for `&out T`
+
+edit 4:
+
+Fixed example for `&out T`.
+
+edit 5:
+
+Added casting rules from `&out T` and `&uninit T` to raw pointers.
+
+edit 6:
+
+Added the constraint that no type parameter `T` can resolve to `&uninit _`, due to [@ExpHP](https://github.com/ExpHP)'s insights
+
+edit 7:
+
+Fixed problems listed by [@matthewjasper](https://github.com/matthewjasper) in the [review](https://github.com/rust-lang/rfcs/pull/2534/files/d7b9e1397f2b4851828c0937c4e4ad07ffe4d693)
+
+Removed comment about `!Drop` bound for `&out T`, because it is incorrect.
+
+Removed the Casting Rules section, because it was wrong. Instead, to convert raw pointers to references, use a reborrow. For example, if `x: *mut T`, then `unsafe { &out *x }` will turn it into an `&out T`.
+
+Added an unresolved question about the use of drop flags to allow for conditional initialization.
+
+edit 8:
+
+Updated emplace_back implementation with one similar to `Vec::push` (taken and adapted from `Vec::push`).
+
+Made one `&out` example better by showing where a compile-time error would occur if `&out` is used incorrectly.
+
+edit 9:
+Removed all bits about `&out T`, because `&out T` can be implemented as a library item; see [my comment here](https://github.com/rust-lang/rfcs/pull/2534#issuecomment-453720019) for details.
+
+---
+I would like to thank all the people who helped refine this proposal to its current state: [@rkruppe](https://internals.rust-lang.org/u/rkruppe), [@earthengine](https://internals.rust-lang.org/u/earthengine), [@gbutler](https://internals.rust-lang.org/u/gbutler),
+and [@TechnoMancer](https://internals.rust-lang.org/u/TechnoMancer) in the Pre-RFC, and
+[@cramertj](https://github.com/cramertj), [@ExpHP](https://github.com/ExpHP), and [@matthewjasper](https://github.com/matthewjasper) in the RFC. Thank you!