Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an unsafe fn for creating a ZST #292

Closed
scottmcm opened this issue Nov 7, 2023 · 17 comments
Closed

Add an unsafe fn for creating a ZST #292

scottmcm opened this issue Nov 7, 2023 · 17 comments
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@scottmcm
Copy link
Member

scottmcm commented Nov 7, 2023

Proposal

Problem statement

People are often unsure about what the rules are for creating a ZST. See, for example, https://internals.rust-lang.org/t/equality-of-zst-values/19815/3?u=scottmcm, where the person made an unsound function. (Thankfully they made the thread to ask about it, rather than just assuming it was fine!)

There's lots of ways that you could make a ZST: mem::uninitialized::<Z>(), mem::zeroed::<Z>(), MaybeUninit::<Z>::uninit().assume_init(), *NonNull::<Z>::dangling().as_ptr(), mem::transmute_unchecked::<_, Z>(()), etc.

It would be nice to have one obvious API to point people at that would also serve as a place to put documentation about the safety and validity invariants of ZSTs, and thus the necessary preconditions for calling the method.

Motivating examples or use cases

unsafe code often has different paths for ZSTs where it wants to make items, such as
https://github.com/rust-lang/rust/blob/9bd71afb90c2a6e0348cdd4a2b10a3bf39908f19/library/alloc/src/vec/into_iter.rs#L196-L197

This can show up in const generic code making arrays, too, where in an if N == 0 { return []; } doesn't work, so a "make a ZST" API could be nice to use there too:
https://github.com/rust-lang/rust/blob/4654a910018cf0447db1edb2a46a0cae5f7dff8e/library/core/src/array/mod.rs#L801-L804

Solution sketch

mod mem {
    pub unsafe fn conjure_zst<T>() -> T;
}

The implementation of which would use (directly or indirectly) things like https://doc.rust-lang.org/nightly/std/intrinsics/fn.assert_inhabited.html to give good error messages for hitting conjure_zst::<!>(). And it would probably have a runtime trap (ud2) for it not actually being a ZST, since that's checkable at monomorphization time, and thus would be cheap in practice.

The docs would have text like https://github.com/rust-lang/rust/pull/95385/files#diff-02049689ae3bb7ac98af3418ad8fdca219bfa1227cb6cad31afc72bdbae30e27R681-R717

Alternatives

  • Enforce ZST-ness with a post-mono error, like a const { assert!(T::IS_ZST) }. I didn't pick that one because that would keep it from being used in if T::IS_ZST { ... } blocks, like the array example.
  • Do nothing, since there's already lots of ways to make ZSTs.
  • Also add reference versions to get &'static mut Z and such.

Links and related work

A few more threads I found:
https://internals.rust-lang.org/t/is-synthesizing-zero-sized-values-safe/11506?u=scottmcm
https://internals.rust-lang.org/t/static-mutable-references-to-zsts/18330?u=scottmcm

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
@scottmcm scottmcm added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Nov 7, 2023
@Mokuzzai
Copy link

Mokuzzai commented Nov 8, 2023

Also add reference versions to get &'static mut Z and such.

I don't think this is neccessary as one can just do Box::leak(Box::new(Z)) which is safe and does't allocate.

Edit: I guess you'd need it for no_std

@scottmcm
Copy link
Member Author

scottmcm commented Nov 8, 2023

no_std could hypothetically use Box::leak(Box::new_in(Z, AlwaysFailsAllocator)).

But that's still different because it needs an instance. I suppose one could Box::leak(Box::new(conjure_zst::<Z>())), but avoiding incantations like that are part of what I was hoping to do with this API.

(That said, I'm still considering the reference versions out of scope for this ACP, unless libs-api says otherwise.)

@kpreid
Copy link

kpreid commented Nov 8, 2023

The value of including reference-returning versions would be to enable creating ZST refs while communicating clearly to readers that no allocation is happening, and also discoverability; searching std docs won't show you the Box::leak() trick. Redundant-but-better-named functions are not novel in the standard library: std::iter::once() is almost the same as Option::into_iter(), and drop() doesn't need to exist at all.

(But perhaps most of the value would come from documenting these tricks in conjure_zst()'s function documentation.)

@jhpratt
Copy link
Member

jhpratt commented Nov 8, 2023

I'm skeptical that this is something we truly want. While I understand that the docs would certainly say that it has to uphold all safety invariants of the produced type, I feel that a nontrivial number of people will either overlook that requirement, ignore it, or not bother to check what the safety requirements actually are.

@Mokuzzai
Copy link

Mokuzzai commented Nov 8, 2023

a nontrivial number of people will either overlook that requirement, ignore it, or not bother to check what the safety requirements actually are.

I feel like this can be applied to almost every unsafe fn. I've seen a lot of code using mem::transmute incorrectly for example by transmuting between repr(Rust) types.

Having a well documented function gives us where to point when we spot UB.

@jhpratt
Copy link
Member

jhpratt commented Nov 8, 2023

mem::transmute is also described in its documentation as an "absolute last resort" and is something that is essentially required for a number of situations.

@ryanavella
Copy link

I feel that a nontrivial number of people will either overlook that requirement, ignore it, or not bother to check what the safety requirements actually are.

I suspect that anyone who writes unsafe { conjure_zst() } would also not hesitate to write unsafe { MaybeUninit::<Z>::uninit().assume_init() }. The latter's documentation only mentions being wary of "additional invariants," and some users (e.g. see the linked internals.rust-lang discussion) aren't sure whether ZSTs can have invariants at all since they can't have an invalid bit pattern.

@kpreid
Copy link

kpreid commented Nov 8, 2023

I've seen several cases of people asking about using transmute or other tricks to produce a ZST. The question they manage to think of is “Is this UB?”, which is a good thing to think about, but having to make that journey distracts them from the more important question — “is this correct for the ZST type?” Having an official function for conjuring ZSTs, whose documentation tells them about the actually important question, will be a usability improvement (as well as a readability improvement as discussed in my previous comment).

(On the other hand, I've never seen someone ask about conjuring a ZST where they were intending to do it for a third-party type that it'd be invalid for. So perhaps this isn't important in practice.)

@jhpratt
Copy link
Member

jhpratt commented Nov 8, 2023

I think I can mostly sum up my position in a single question. If a library wanted to permit downstream users to create a ZST out of thin air, why wouldn't they expose that behavior themself? Uses for within the same crate aren't a concern, given that they can just pub(crate) the ZST.

@scottmcm
Copy link
Member Author

scottmcm commented Nov 8, 2023

I feel that a nontrivial number of people will either overlook that requirement, ignore it, or not bother to check what the safety requirements actually are.

What I would hope is that the mere fact that it's unsafe helps people have the "wait, why isn't it just safe to do that?", rather than just reaching for mem::zeroed::<Z>() because they "know" that that's sound because they checked the size_of.

With highly-flexible unsafe methods it's easy to think "well of course it's unsafe because of other cases, but not in this one", whereas with a hyper-specific specific method if it's unsafe it's more obviously because the operation is fundamentally unsafe.

If a library wanted to permit downstream users to create a ZST out of thin air, why wouldn't they expose that behavior themself?

I wholeheartedly agree that Z::default() is better when possible.

However, I point to things like the linked vec::IntoIter example that are dealing in generic types and justifiably know that it's sound to return the Z instances, but also very much want to handle non-Default non-Copy generic types that aren't going to have any safe way to do this.

@dtolnay
Copy link
Member

dtolnay commented Nov 21, 2023

We discussed this ACP in last week's libs-API meeting. Overall, we were positive on an unsafe function in core::mem for obtaining a value of ZST type, and there were no objections to conjure_zst::<Z>() as a name for this.

We spent a while debating whether it would be more valuable as a wrapper around transmute::<(), Z>(()) (compile-time error if Z isn't an (inhabited??) ZST) vs around assert + zeroed::<Z>() (runtime panic).

My initial impression was that conjure_zst has commonality with the array_chunks precedent where the team decided that allowing array_chunks::<0> to compile in a not-taken branch was important (see rust-lang/rust#99471 (comment)). The same thing is also identified as being desirable in this ACP under the Alternatives.

if mem::size_of::<T>() == 0 {
    conjure_zst::<T>()
} else {
    // something else
}

Mara brought up that she has some cases in her codebase where conjure_zst would be nice to have, and all of them involve a specific zero-sized type, not conjuring an arbitrary generic type if it happens to be zero sized. I am familiar with one case like this in my work codebase involving C++ FFI.

#[derive(Copy, Clone)]
#[non_exhaustive]
pub struct DidInit;

// Call this once at program startup
pub fn init() -> DidInit {
    extern "C" {
        init(); // expensive
    }
    unsafe { init(); }
    DidInit
}

// Called often in hot code
pub fn do_interesting_thing(_proof: DidInit) {
    extern "C" {
        fn do_interesting_thing(); // SAFETY: requires init to have been called
    }
    // No runtime enforcement needed!
    unsafe { do_interesting_thing(); }
}

// Most callers will thread a DidInit through from main to all
// the places that need the proof that init has been called. But
// some places (such as, behind certain third-party traits) it's
// just hard to get to and so they want to unsafely assume that
// init has been called. `do_interesting_thing(conjure_zst())`

At this point I rethought and leaned toward having a conjure_zst that replaces transmute(()) would be more valuable than one that replaces zeroed(). The cases that have been doing if size_of::<T>() == 0 could continue to use zeroed::<T>(). That's less bad than making someone decide whether they want to write do_interesting_thing(conjure_zst::<DidInit>()) (to take advantage of the sweet new name, despite losing checking) vs do_interesting_thing(transmute(())) (in spite of the inferior spelling, to more likely detect problem if the Rust binding is ever refactored to a non-ZST design).

A conjure_zst that replaces transmute(()) is the one that benefits more from being in the standard library and leveraging transmute-like dedicated diagnostics as an intrinsic. Meanwhile a conjure_zst that replaces zeroed() (impl) serves just as a minor convenience. (And if a primary motivation for having conjure_zst in the first place is to have a place to document why people shouldn't assume that any ZST is always safe to conjure, both ways are equally suitable for that.)

We didn't end up committed to either approach yet. I think we'll bring it up again in another meeting.

After the meeting I read through all the related links:

and still not sure what to think. But supporting if size_of::<T>() == 0 { conjure_zst::<T>() } seems well-motivated, including in standard library code.

One novel alternative was brought up later on Zulip by @the8472.

// alloc::vec::IntoIter::next, for example

trait IntoIterImpl {
    fn do_next<T>(i: &mut IntoIter<T>) -> Option<T>;
}

struct IsZeroSized<const IS: bool>;

impl IntoIterImpl for IsZeroSized<true> {
    fn do_next<T>(i: &mut IntoIter<T>) -> Option<T> { ... }
}

impl IntoIterImpl for IsZeroSized<false> {
    fn do_next<T>(i: &mut IntoIter<T>) -> Option<T> { ... }
}

IsZeroSized::<{ mem::size_of::<T>() == 0 }>::do_next(self)

If trait impls for Struct<true> + Struct<false> together unify into one impl that's usable as if it were for <const B: bool> Struct<B>, that might be even better suitable for this stuff in the standard library than if size_of::<T>() == 0 { ... } would be. @scottmcm pointed out this would save so much MIR generation in things like the slice iterators.

Maybe this is "future work" more so than "alternative", because in that snippet, IsZeroSized<true>::do_next would most likely still end up containing a conjure_zst<T>() if it existed.

@kpreid
Copy link

kpreid commented Nov 21, 2023

Most callers will thread a DidInit through from main to all the places that need the proof that init has been called. But some places (such as, behind certain third-party traits) it's just hard to get to and so they want to unsafely assume that init has been called. do_interesting_thing(conjure_zst())

This particular case seems to me better served by the library adding an unsafe function DidInit::unchecked_did_init() than by suggesting the caller conjure the ZST. That way, the library can document reasonable usage, and if the invariants change such that particular attention needs to be drawn, choose to remove or rename the function as a breaking change.

if size_of::<T>() == 0 { conjure_zst::<T>() }

Idea: What if conjure_zst() was statically checked, but there was another function for this case?

/// If `T` has zero size, then return an instance of `T`.
///
/// This function always returns a fully initialized value and cannot
/// cause undefined behavior solely by being called, but it is still unsafe
/// because the type may have particular invariants. ...
pub unsafe fn conjure_if_zst<T>() -> Option<T> {
    if size_of::<T>() == 0 {
        Some(unsafe { zeroed::<T>() })
    } else {
        None
    }
}

This would serve the run-time-branching cases directly, giving them a useful documented function to use; conjure_if_zst().unwrap_or_else(more_expensive_construction). On the other hand, it might be less clear to read than the visible if condition.

@LunarLambda
Copy link

Has there been any development towards this since November? The issue/ACP still gets brought up every so often.

@scottmcm
Copy link
Member Author

scottmcm commented Dec 7, 2024

It's now been more than a year since

We didn't end up committed to either approach yet. I think we'll bring it up again in another meeting.

so I'm nominating it :)

@rustbot label: +I-libs-api-nominated


But supporting if size_of::<T>() == 0 { conjure_zst::<T>() } seems well-motivated, including in standard library code.

This is still my biggest thought here. For any case where the specific type is always known to be zero-sized by the compiler (in a path independent manner), other options like writing it transmute(()) or MyType::new_unchecked() exist. (Especially if we were to move transmute's check to being const { assert!(T::SIZE == U::SIZE) }; instead of the current limited compiler magic.

I suppose the other answer, though, would be to say that this should instead be handled with a const if feature with monomorphization and required-const integration that would let the if N == 0 { transmute::<(), [T; N]>() } work even with a post-mono check in transmute.

(Though even then I still kinda want a function on which to put the docs, but I guess that function could still exist just implemented with const { assert!(T::SIZE == 0); } transmute_unchecked(()) instead of the "runtime" check.)

One other nice thing we can do since I originally talked about this: we can use the UB-checks infra to give a nice error in debug while still allowing full optimizability in release, should be so choose.

@rustbot rustbot added the I-libs-api-nominated Indicates that an issue has been nominated for discussion during a team meeting. label Dec 7, 2024
@the8472
Copy link
Member

the8472 commented Dec 7, 2024

Relevant issues for using const asserts: rust-lang/rust#122301 and rust-lang/rfcs#3582

If that approach were sanctioned then a generic caller could, with current rust constructs, avoid triggering the assert with non-ZSTs. What it would not provide is a a path to restricting T to T where IsZeroSized<true> in the future because a generic caller must shed that obligation somewhere in the type system when calling a more specific method, which a const-if probably would not do.

Relevant const-generics issue: rust-lang/project-const-generics#26

So I think the ideal solution continues to be blocked on language features like generic const exprs. We could pick a less than ideal solution of course.

@dtolnay
Copy link
Member

dtolnay commented Dec 10, 2024

The libs-API team discussed this ACP in today's meeting. We are accepting the following variation of this proposal: compile-time checking if instantiated with a concrete type, and runtime checking if instantiated with a generic type.

mod concrete {
    pub(super) struct Concrete(());
}

fn demo<T>() {
    let _ = unsafe { mem::conjure_zst::<concrete::Concrete>() }; // okay

    let _ = unsafe { mem::conjure_zst::<std::string::String>() }; // compile error

    if mem::size_of::<T>() == 0 {
        let _ = unsafe { mem::conjure_zst::<T>() }; // okay
    } else {
        let _ = unsafe { mem::conjure_zst::<T>() }; // panic
    }
}

We anticipate that the implementation strategy would involve nominally being always runtime checked, but with something analogous to a deny-by-default lint to recognize the concrete case.

@dtolnay dtolnay added ACP-accepted API Change Proposal is accepted (seconded with no objections) and removed I-libs-api-nominated Indicates that an issue has been nominated for discussion during a team meeting. labels Dec 10, 2024
@dtolnay dtolnay closed this as completed Dec 10, 2024
@kennytm
Copy link
Member

kennytm commented Dec 11, 2024

with something analogous to a deny-by-default lint to recognize the concrete case.

This is beyond the scope of libs? It seems the implementation would just assert_eq!(size_of::<T>(), 0); and somehow the compiler can see through the code and issue the unconditional_panic lint as explained in rust-lang/rust#74985 (comment)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

10 participants