-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Direct and Partial Initialization using &uninit T #2534
Conversation
Earlier conversation can be found in the Pre-RFC |
I think it's better to write a separate RFC for EDIT: |
Did you mean |
Can you add some discussion of the advantages of |
This example is broken, see my comment below for the udpated version #[derive(Clone, Copy)]
pub struct Outer {
pub public_field: u32
private_field: u32
}
impl Outer {
pub fn new(private_field: u32) -> Self {
Outer { public_field: 0, private_field }
}
}
// The code below lives in a different crate
// relevant imports here
let outer = Outer::new(20);
foo(&out outer);
let outer = MaybeUninit::new(Outer::new(30));
bar(&mut outer);
fn foo(outer: &out Outer) {
// println!("{}", outer.should_only_write); // Compile time error
outer.should_only_write = 20; // Fine, because we only write
}
fn bar(outer: &mut MaybeUninit<Outer>) {
outer.set(
// what to put in here?
)
}
This example had a few problems, see my below for the updated version. |
|
|
@KrishnaSannasi Reading values from a |
@cramertj How about values on the heap? (Such as in a |
The process works no differently on the heap, but the specific API for placement-new inside of
|
A impl<T> Vec<T> {
fn emplace_back(put: impl FnOnce(&uninit T)) {
// ... allocate magic ...
put(&uninit end);
}
} and we could have some-placement-new like sugar for ease of use, but that's not necessary. |
@KrishnaSannasi Yeah, that works out nicely. A similar thing using pub struct Uninit<'a, T, Tag> {
location: &'a mut MaybeUninit<T>,
marker: PhantomData<Tag>,
}
pub struct UninitFilled<Tag>(PhantomData<Tag>);
impl<'a, T, Tag> Uninit<'a, T, Tag> {
pub fn new(location: &'a mut MaybeUninit<T>) -> Self {
...
}
#[inline(always)]
pub fn fill(self, value: T) -> UninitFilled<UniqToken> {
self.location.set(value);
UninitFilled(PhantomData)
}
}
impl<T> Vec<T> {
pub fn emplace_back(put: impl for<Tag> FnOnce(Uninit<'_, T, Tag>) -> UninitFilled<Tag>) {
...
}
} The |
@cramertj Cool, why do we need Also, how about this: #[derive(Clone, Copy)]
pub struct Outer {
pub public_field: u32
private_field: u32
}
impl Outer {
pub fn new(private_field: u32) -> Self {
Outer {
public_field: 0,
private_field
}
}
}
#[repr(transparent)]
struct MaybeUninitWrite<'a, T: 'a>(&mut MaybeUninit<T>);
impl<'a, T: 'a> MaybeUninitWrite<'a, T> {
pub fn new(inner: &'a mut MaybeUninit) -> Self {
MaybeUninitWrite(inner)
}
pub fn set(&mut self, value: T) {
(self.0).set(value);
}
}
// The code below lives in a different crate
// relevant imports here
let outer = Outer::new(20);
foo(&out outer);
let mut outer = MaybeUninit::new(Outer::new(30));
let outer = MaybeUninitWrite::new(&mut outer);
bar(&mut outer);
let mut outer = MaybeUninit::new(Outer::new(40));
unsafe { baz(&mut outer); }
// These functions should not be able to read any value in Outer. It could cause a segfault.
// (obviously this code won't but we can imagine code very similiar to this
// that might actually segfault, using hw interfaces managed by the OS)
// This enforces that contract statically
fn foo(outer: &out Outer) {
// println!("{}", outer.public_field); // Compile time error
outer.public_field = 20; // Fine, because we only write
}
// This cannot do the same thing as foo
fn bar(outer: &mut MaybeUninitWrite<Outer>) {
// not possible to change public_field on outer
}
// This cannot enforce a write-only contract,
// and might segfualt if outer holds an uninitialized value
unsafe fn baz(outer: &mut MaybeUninit<Outer>) {
outer.get_mut().public_field = 20;
} Doing something this simple, should not require unsafe code, or complex type usage. |
should raw pointers, *out T be added should a new trait, OverwriteSafe, be used for the bounds of &out T
Another idea for the bounds of The downside to this approach is that we add a new trait to the standard library, and adding a |
Why not use a lifetime, instead of a type as tag? |
@bjorn3 It's possible that an invariant lifetime would be sufficient (so that the user couldn't create a |
In the examples with vector |
@YaLTeR You would be forgetting the pointer, not the underlying value. In the closure case, you would not be initializing the value, so it would break this rule and get a compiler error
and you cannot return a
I put this rule in for the exact reason you pointed out. If we create an uninitialized value in a That way, only
How would you define an invariant lifetime, using references will always give a covariant lifetime, and there doesn't seem to be another way to constrain lifetimes, if I'm understanding the nomicon section of variance correctly. |
This assumes the compiler can identify that ...and this is certainly something it can do; any function with the signature |
Would this help to solve the problem where only part of a struct has defaults? e.g. struct Person {
given_name: String, // no default
family_name: String, // no default
other_names: Vec<String>, // default = Vec::new()
email: Option<String>, // default = None
//...
} There could be some function that returns an Would this be possible? |
@ExpHP Yes, that sounds right. @derekdreery That would require global knowledge of the entire program in the worst case, and that is undesirable, so no, this will not solve that problem. This is fine for the compiler to do, but the human programming would find this confusing, and hard to reason about. This also makes changing the implementation of a function a breaking change, which is also undesirable. |
@rfcbot fcp postpone Per my previous comment, I am moving to postpone this RFC. My reasoning remains unchanged:
(Per that same comment, we are still working through the NLL and Polonius transition and will be for some time.) |
Team member @nikomatsakis has proposed to postpone this. The next step is review by the rest of the tagged team members:
No concerns currently listed. Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
text/0000-write-pointers.md
Outdated
# Unresolved questions | ||
[unresolved-questions]: #unresolved-questions | ||
|
||
- What is the correct bounds for `T` in `&out T`? conservatively, `T: Copy` works and is sound. But as [@TechnoMancer](https://internals.rust-lang.org/u/TechnoMancer) pointed out, `T: !Drop` is not the correct bound. For example, `Wrapper(Vec<String>)`, clearly cannot be overwritten safely, because `Vec<String>`, must drop do deallocate the `Vec`, but `Wrapper` itself does not implement drop. Therefore either a new trait is needed (but unwanted), or we must keep the `Copy` bound. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failing to call destructors is safe; better find another word. This is mostly a technicality, however; resource leaks are still something Rust would rather avoid, and I agree not accounting for this here at all would make them too easy.
But besides this, I've been long thinking that Drop
needs an overhaul. I'd replace it with something which looks like this:
trait Drop { /* ... */ }
: same interface as beforeunsafe trait Destruct: Drop {}
: marker trait, assures that values of a given type can be disposed of by merely callingDrop
implementations of their constituent fields; the type has no disposal logic of its own and can be safely destructured (this corresponds to what is now informally expressed as!Drop
)unsafe auto trait Forget: Destruct {}
: marker trait, assures that allDrop
implementations of transitively constituent fields are no-ops; in other words,mem::drop
on the type is completely equivalent tomem::forget
.
Now, automatic deriving behaviour:
- for any given type
T
, unless the containing module provides an explicitDrop
implementation, the compiler automatically derives a no-opDrop
andDestruct
forT
; - if all constituent fields are also
Forget
, the compiler derivesForget
as well.
This system would remove a major motivation for introducing negative traits/bounds (which have been pointed out to be rather problematic, especially in the context of automatic impls for !
), as it would create a way to formalise 'trivial disposal behaviour' in the type system without them; in this case, the correct bound would be Forget
. It would also open the doors to adding true linear typing to Rust: those could simply explicitly opt-out of implementing Drop
. In other words, lacking an impl Drop
would no longer mean 'trivial disposal logic', but 'no automatic disposal at all'.
However, I'm not sure how backwards compatible this would be. Obviously, introducing true linear typing would incur some major changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Drop overhaul sounds nice on paper, I would still still want some form of negative trait impls, because there are some APIs to need this to be work at all. But that is beside the point. The Drop overhaul doesn't fit in with this RFC, and you could post it to internals if you want more feedback.
I did note that failing to call destructors is safe as the very next sub bullet. This is an unresolved question because I wasn't sure if we wanted to allow leaks this way.
Strong oppose. Cognitive load and risk of writing wrong code is high, motivation is marginal. |
@graydon Could you perhaps elaborate with examples of where it could go wrong and examples of the risks? |
@graydon I'm currently in the middle of stripping down this proposal. No more I can see how it can be a large cognitive load, but dealing with uninitialized memory is hard and so the rules around
struct Out<'a, T> { ptr: &'a mut T }
impl<'a, T> Out<'a, T> {
// to set without dropping old value
fn set(&mut self, value: T) { unsafe { std::ptr::write(self.ptr, value); } }
fn reborrow<'b>(out: &mut Self) where 'a: 'b { Out { ptr: out.ptr } }
}
impl<'a, T> From<&'a mut T> for Out<'a, T> {
fn from(ptr: &'a mut T) -> Self {
Self { ptr }
}
} edit: I know this isn't exactly what I put out for |
Users obtaining pointers to uninitialized memory casually because "it's faster", which they then convert to a raw pointer. Presently casually taking a ref to an uninit variable gets you an error; you have to work somewhat to get yourself some uninit memory. We worked hard to paint uninitialized memory into a very small corner in this language you have to do some ceremony to get at. Now it'll be a chapter heading in a book that's very easy to see, learn, get interested in, tinker with, use in your own libraries. Additional unclear risk: drop semantics. Drops execute top-down, so this has to not work on drop types. The proposal isn't entirely clear if it's going to try to allow that. Cognitive load I think is obvious: it's a whole new qualifier with a complex set of rules that people will naturally sprinkle all over codebases, especially if it's associated with speed. We worked super hard to minimize the number of qualifiers. It looks like it can all be done with a library type, so it should be. It's also a new state of initialized-ness for users to think about, for every memory cell. Currently the user's mental model does not include (unless they're off in unsafe land) the initialized-ness of the substructure of a struct. Now it will. |
PITs are supposed to put stricter requirements on this thing. a pointer to a PIT is valid, but is a pointer to a PIT, not to a full object, because PITs encode initialization state in the type. you can still access uninitialized or invalid memory by going through usize but we have that issue with many existing types anyway. |
I am not convinced by this argument, to do anything with with raw pointers you need let x: String = unsafe { std::mem::unintialized() };
let y: *mut _ = &mut x; // raw pointer to uninit memory But now that you have a raw pointer to uninit memory you need to do pretty much all the same work that I have laid out to make sure that you don't do anything unsound with it. This is cognitive load that could be shifted onto the compiler to have it validate our program. Which is what this proposal is trying to do
Currently this will be allowed, because it logically identical to storing the data in a bunch of temporaries, then constructing the
I agree that this is a large cognitive load, but I would like to make dealing with uninitialized code safe. Also dealing with uninitialized code is entirely local, with absolutely no global analysis, so it shouldn't be too bad. |
FWIW, my inclination would be to close, but I can live with "postpone" for now. @rfcbot reviewed |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
The final comment period, with a disposition to postpone, as per the review above, is now complete. By the power vested in me by Rust, I hereby postpone this RFC. |
Use
&uninit T
to initialize your memory directly!This can have performance improvements and makes dealing with uninitialized memory completely safe.
Rendered
this RFC has been edited This RFC used to also have
&out
, but I removed them due to insufficient motivation, and a relatively easy workaround to&out
.