3519-arbitrary-self-types-v2.md

Summary

Allow types that implement the new trait Receiver<Target=Self> to be the receiver of a method.

Motivation

Today, methods can only be received by value, by reference, or by one of a few blessed smart pointer types from core, alloc and std (Arc<Self>, Box<Self>, Pin<P> and Rc<Self>).
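For reference, the receiver forms accepted on stable Rust today can be sketched as follows. `Counter` and its methods are illustrative names, not part of this RFC.

```rust
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;

struct Counter {
    value: u32,
}

impl Counter {
    // By value: consumes the counter.
    fn into_value(self) -> u32 {
        self.value
    }

    // By shared reference.
    fn get(&self) -> u32 {
        self.value
    }

    // By exclusive reference.
    fn increment(&mut self) {
        self.value += 1;
    }

    // By one of the blessed standard library smart pointers.
    fn from_box(self: Box<Self>) -> u32 {
        self.value
    }

    fn from_rc(self: Rc<Self>) -> u32 {
        self.value
    }

    fn from_arc(self: Arc<Self>) -> u32 {
        self.value
    }

    fn from_pin(self: Pin<&mut Self>) -> u32 {
        self.value
    }
}

fn main() {
    let mut c = Counter { value: 1 };
    c.increment();
    assert_eq!(c.get(), 2);
    assert_eq!(Pin::new(&mut c).from_pin(), 2);
    assert_eq!(c.into_value(), 2);

    assert_eq!(Box::new(Counter { value: 3 }).from_box(), 3);
    assert_eq!(Rc::new(Counter { value: 4 }).from_rc(), 4);
    assert_eq!(Arc::new(Counter { value: 5 }).from_arc(), 5);
}
```

Replacing any of these pointer types with a user-defined one is rejected on stable, which is the gap this RFC addresses.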

It's been assumed that this will eventually be generalized to support any smart pointer, such as a CustomPtr<Self>. Since late 2017, it has been available on nightly under the arbitrary_self_types feature for types that implement Deref<Target=T> and for raw pointers.

This RFC proposes some changes to the existing nightly feature based on the experience gained, with a view towards stabilizing the feature in the relatively near future.

Motivation for the arbitrary self types feature overall

The Rust async work identified a need to allow self types of Pin<&mut Self> (and similar). At that time, certain types - Pin, Rc, Box etc. - became hard coded in stable Rust as valid self types. That's been sufficient for many use-cases including async Rust, but this special power is currently restricted to these hard-coded types.

Since then, other use-cases have become clear where crates need to make their own smart pointer types with similar powers.

One use-case is cross-language interop (JavaScript, Python, C++). In many cases, automatic code generation tools need to represent foreign language pointers or references somehow in Rust, and often, we want to call methods on such types. But, other languages' references can’t guarantee the aliasing and exclusivity semantics required of a Rust reference. For example, the C++ this pointer can't be practically or safely represented as a Rust reference because C++ may retain other pointers to the data and it might mutate at any time.

What is a code generator to do? Its options in current stable Rust are poor:

  • It can represent foreign pointers/references as &T, with a virtual certainty of undefined behavior due to different guarantees in different languages
  • It can represent foreign pointers/references as *const T or *mut T but can't attach methods.
  • It can represent foreign pointers/references as a smart pointer type (CppRef<T> or CppPtr<T>) but can't attach methods.

With "arbitrary self types", smart pointer types can be created which obey foreign-language semantics and yet allow method calls:

#[repr(transparent)]
#[derive(Clone)]
/// A C++ reference. Obeys C++ reference semantics, not Rust reference semantics.
/// There is no exclusivity; the underlying data may mutate, etc.
/// (This is an abridged example: a real CppRef type would fully document invariants
/// here.)
pub struct CppRef<T: ?Sized> {
    ptr: *const T,
}

impl<T: ?Sized> Receiver for CppRef<T> {
    type Target = T;
}

// generated by bindings generator
struct ConcreteCppType {
    // ...
}

// all generated by bindings generator; mostly calls into C++
// In this example these are not marked "unsafe" because we do not directly use
// CppRef::ptr in Rust. This example assumes that the corresponding C++ functions
// do not themselves have unsafe behavior and thus can be presented to Rust as safe.
// Safety of FFI is orthogonal to this RFC.
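impl ConcreteCppType {
    // Bodies are elided with todo!(); the generated code would call into C++.
    fn some_cpp_method(self: CppRef<Self>) {}
    fn get_int_field(self: &CppRef<Self>) -> u32 { todo!() }
    fn get_more_complex_field(self: &CppRef<Self>) -> CppRef<FieldType> { todo!() }
    fn equals(self: &CppRef<Self>, other: &CppRef<Self>) -> bool { todo!() }
}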

// generated by bindings generator
fn get_cpp_reference() -> CppRef<ConcreteCppType> {
    // also calls into C++
}

fn main() {
    // Rust code manipulating C++ objects via C++-semantics references
    let cpp_obj_reference: CppRef<ConcreteCppType> = get_cpp_reference();
    // cpp_obj_reference does not obey Rust reference semantics. Other
    // "references" to the same data may exist in the Rust or C++ domain.
    // But it can effectively be used as an opaque token to pass safely
    // through Rust back into C++
    let some_value: u32 = cpp_obj_reference.get_int_field();
    let some_field = cpp_obj_reference.get_more_complex_field();
    cpp_obj_reference.equals(&get_cpp_reference());
}

(fuller example here, with various trait-based attempts to work around the lack of arbitrary self types.)

Another case is when the existence of a reference is, itself, semantically important — for example, reference counting, or if relayout of a UI should occur each time a mutable reference ceases to exist. In these cases it's not OK to allow a regular Rust reference to exist, and yet sometimes we still want to be able to call methods on a reference-like thing.

A third motivation is that taking smart pointer types as self parameters lets functions act on the smart pointer itself, not just the underlying data. For example, taking &Arc<T> allows a function both to clone the smart pointer (noting that the underlying T might not implement Clone) and to access the data inside it, which is useful for some methods. This also makes it ergonomic in more cases to make Arc<SomeType> explicit, rather than having SomeType contain an Arc internally and have Arc-like clone semantics. Also, changing a method from accepting &self to self: &Arc<Self> is mostly frictionless, whereas changing from &self to a static method accepting &Arc<Self> always requires some refactoring. These options are currently open only to Rust's built-in smart pointer types, not to custom smart pointer types.
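As a minimal sketch of this point using the standard library's Arc (which stable Rust already accepts), a method taking `self: &Arc<Self>` can hand out a new owning handle even though the pointee is not `Clone`. `Node` and `share` are illustrative names.

```rust
use std::sync::Arc;

// Deliberately not `Clone`.
struct Node {
    name: String,
}

impl Node {
    // The receiver is the smart pointer itself, so the method can clone the
    // handle rather than the data.
    fn share(self: &Arc<Self>) -> Arc<Self> {
        Arc::clone(self)
    }

    // Ordinary `&self` methods remain callable through the pointer.
    fn name(&self) -> &str {
        &self.name
    }
}

fn main() {
    let first = Arc::new(Node { name: String::from("root") });
    let second = first.share();

    // Both handles point at the same allocation.
    assert_eq!(Arc::strong_count(&first), 2);
    assert!(Arc::ptr_eq(&first, &second));
    assert_eq!(second.name(), "root");
}
```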

Finally, there's just a matter of symmetry with Rust's own smart pointer types. The Rust for Linux project, for instance, requires a custom Arc type. In theory, users can define their own smart pointers. In practice, they're second-class citizens compared to the smart pointers in Rust's standard library. A type T can accept method calls using smart pointers as the self type only if they're one of Rust's built-in smart pointers.

This RFC proposes to loosen this restriction to allow custom smart pointer types to be accepted as a self type just like for the standard library types.

See also this blog post, especially for a list of more specific use-cases.

Motivation for the v2 changes

Unstable Rust contains an implementation of arbitrary self types based around the Deref<Target=T> trait. Naturally, that trait also provides a means to create a &T. Example:

#![feature(arbitrary_self_types)]

use std::ops::Deref;

struct SmartPtr<T: ?Sized>(*const T);

impl<T: ?Sized> Deref for SmartPtr<T> {
    type Target = T;
    fn deref(&self) -> &Self::Target {
        // never called, but smart pointers need to implement this method
        // sometimes it's just not safe to create a reference to self.0
        unimplemented!()
    }
}

struct ConcreteType;

impl ConcreteType {
    fn some_method(self: SmartPtr<ConcreteType>) {

    }
}

fn main() {
    let concrete: SmartPtr<ConcreteType> = ...;
    concrete.some_method();
}

This works well for some smart pointer types where it's OK to create &T (but not necessarily &mut T). This includes Pin and the reference counted pointers. For that reason, the original arbitrary self types feature could be based around Deref. But in other smart pointer use-cases (especially those relating to foreign language semantics) it's not OK to create even &T.

The arbitrary self types feature should be enhanced so it works even when we can't allow &T. As noted above, that's most commonly because of semantic differences to pointers in other languages, but it might be because references have special meaning or behavior in some pure Rust domain. Either way, it may not be OK to create a Rust reference &T, yet we may want to allow methods to be called on some reference-like thing.

For this reason, implementing Deref::deref is problematic for many of the likely users of this "arbitrary self types" feature.

If you're implementing a smart pointer P<T>, and you need to allow impl T { fn method(self: P<T>) { ... }}, yet you can't allow a reference &T to exist, any option for implementing Deref::deref has drawbacks:

  • Specify Deref::Target=T and panic in Deref::deref. Not good.
  • Specify Deref::Target=*const T. This is only possible if your smart pointer type contains a *const T which you can reference - this isn't the case for (for instance) weak pointers or types containing NonNull.
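The first option can be illustrated on stable Rust. This is a sketch only: `ForeignPtr`, `Widget` and `call_through_deref_panics` are hypothetical names. It shows why a panicking `deref` is "not good": any ordinary `&self` method call through the pointer compiles and then fails at run time.

```rust
use std::ops::Deref;
use std::panic::{self, AssertUnwindSafe};

// Hypothetical pointer type for which a `&T` must never be created.
struct ForeignPtr<T>(*const T);

impl<T> Deref for ForeignPtr<T> {
    type Target = T;

    fn deref(&self) -> &T {
        // The "panic in deref" option: there is no sound reference to return.
        panic!("ForeignPtr must not be dereferenced to a Rust reference");
    }
}

struct Widget;

impl Widget {
    fn describe(&self) -> &'static str {
        "widget"
    }
}

// Returns true if calling a `&self` method through the pointer panics.
fn call_through_deref_panics() -> bool {
    let ptr = ForeignPtr(std::ptr::null::<Widget>());
    // Autoderef inserts a call to `deref`, so this ordinary-looking method
    // call panics at run time instead of being rejected at compile time.
    panic::catch_unwind(AssertUnwindSafe(|| {
        ptr.describe();
    }))
    .is_err()
}

fn main() {
    assert!(call_through_deref_panics());
}
```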

Therefore, the current Arbitrary Self Types v2 provides a separate Receiver trait, so that there's no need to provide an awkward Deref::deref implementation.

This v2 version has two other differences relative to the existing unstable arbitrary_self_types feature:

Aside from these differences, Arbitrary Self Types v2 is similar to the existing unstable arbitrary_self_types feature.

Guide-level explanation

When declaring a method, users can also declare the type of the self receiver to be any type T where T: Receiver<Target = Self>, in addition to using Self by value or reference.

The Receiver trait is simple and only requires specifying the Target type:

trait Receiver {
    type Target: ?Sized;
}

The Receiver trait is already implemented for many standard library types:

  • smart pointers in the standard library: Rc<Self>, Arc<Self>, Box<Self>, and Pin<SomeSmartPtr<Self>> (and in fact, any type which implements Deref)
  • references: &Self and &mut Self

Shorthand exists for references, so that self with no ascription is of type Self, &self is of type &Self and &mut self is of type &mut Self.

All of the following self types are valid:

impl Foo {
    fn by_value(self /* self: Self */) {}
    fn by_ref(&self /* self: &Self */) {}
    fn by_ref_mut(&mut self /* self: &mut Self */) {}
    fn by_box(self: Box<Self>) {}
    fn by_rc(self: Rc<Self>) {}
    fn by_custom_ptr(self: CustomPtr<Self>) {}
}

struct CustomPtr<T>(*const T);

impl<T> Receiver for CustomPtr<T> {
    type Target = T;
}

Recursive arbitrary receivers

Receivers are recursive and therefore allowed to be nested. If type T implements Receiver<Target=U>, and type U implements Receiver<Target=Self>, T is a valid receiver (and so on outward). This is the behavior for the current special-cased self types (Pin, Box etc.), so as we remove the special-casing, we need to retain this property.

For example, this self type is valid:

impl MyType {
     fn by_rc_to_box(self: Rc<Box<Self>>) { ... }
}

The Rust language doesn't provide a way for user code to use this recursive property in generics or iteration, so this trait is unlikely to be useful except to the compiler. Nevertheless, we don't intend to prevent use of the Receiver trait by user code: the same recursive property applies to Deref, yet it has occasionally been useful to introduce Deref bounds.
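Because Rc and Box are already special-cased, the nested receiver shown above can be exercised on stable Rust today. A minimal sketch, where the `id` field is added purely for illustration:

```rust
use std::rc::Rc;

struct MyType {
    id: u32,
}

impl MyType {
    // `Rc<Box<Self>>` is valid because `Rc<U>` is a receiver for `U`, and
    // `U = Box<Self>` is in turn a receiver for `Self`.
    fn by_rc_to_box(self: Rc<Box<Self>>) -> u32 {
        self.id
    }
}

fn main() {
    let nested: Rc<Box<MyType>> = Rc::new(Box::new(MyType { id: 7 }));
    assert_eq!(nested.by_rc_to_box(), 7);
}
```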

Implementing methods on smart pointers

If your smart pointer type implements Receiver, you should not add methods to that smart pointer type after its initial creation. As soon as anyone is using your smart pointer type outside of your crate, they may add methods on a contained type; for example:

impl SomeType {
    fn do_something(self: your_crate::SmartPointer<SomeType>) {}
}

If you then add SmartPointer::do_something, this is a conflict, and the compiler will produce an error. It's therefore considered to be a compatibility break to add additional methods to your_crate::SmartPointer. It's OK to add methods at the outset when you create SmartPointer, until the point at which other people start using it.

This principle has been followed for the types in Rust's standard library which implement Receiver; for instance, Box and Rc. Mostly they offer associated functions rather than methods.
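The associated-function style can be sketched as follows. `Handle` and `Message` are hypothetical types, and `Handle` implements Deref rather than Receiver only so that the sketch compiles on stable Rust. Because `Handle::label` takes no `self` receiver, method-call syntax never selects it, so it cannot collide with a method of the same name on the pointee.

```rust
use std::ops::Deref;

// Hypothetical smart pointer.
struct Handle<T>(Box<T>);

impl<T> Handle<T> {
    fn new(value: T) -> Self {
        Handle(Box::new(value))
    }

    // Associated function: no `self` receiver, so it is only callable as
    // `Handle::label(&h)` and never competes with methods of `T`.
    fn label(_this: &Self) -> &'static str {
        "handle"
    }
}

impl<T> Deref for Handle<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

struct Message;

impl Message {
    fn label(&self) -> &'static str {
        "message"
    }
}

fn main() {
    let h = Handle::new(Message);
    // Method-call syntax reaches the pointee's method.
    assert_eq!(h.label(), "message");
    // The pointer's own function needs the fully qualified form.
    assert_eq!(Handle::label(&h), "handle");
}
```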

In the future there might be a deshadowing algorithm that can relax this rule - see the method shadowing section below for discussion.

Reference-level explanation

core libs changes

The Receiver trait is made public (removing its #[doc(hidden)] attribute), exposing it under core::ops. It gains a Target associated type.

This trait marks types that can be used as receivers other than the Self type of an impl or trait definition.

pub trait Receiver {
    type Target: ?Sized;
}

A blanket implementation is provided for any type that implements Deref:

impl<P: ?Sized> Receiver for P
where
    P: Deref,
{
    type Target = <P as Deref>::Target;
}

(See alternatives for discussion of the tradeoffs here.)

It is also implemented for &T and &mut T.

Compiler changes: method probing

The existing Rust reference section for method calls describes the algorithm for assembling method call candidates, and there's more detail in the rustc dev guide.

The key part of the first page is this:

The first step is to build a list of candidate receiver types. Obtain these by repeatedly dereferencing the receiver expression's type, adding each type encountered to the list, then finally attempting an unsized coercion at the end, and adding the result type if that is successful. Then, for each candidate T, add &T and &mut T to the list immediately after T.

Then, for each candidate type T, search for a visible method with a receiver of that type in the following places:

  • T's inherent methods (methods implemented directly on T).
  • Any of the methods provided by a visible trait implemented by T.

We'll call this second list the candidate methods.
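The construction of the candidate receiver types quoted above can be modelled with a toy function over type names. This is only an illustration, not compiler code: `candidate_receiver_types` is a hypothetical helper, the dereference chain is supplied by the caller, and the final unsized-coercion step is omitted.

```rust
// Toy model: given the chain of types obtained by repeated dereferencing,
// insert `&T` and `&mut T` immediately after each `T`.
fn candidate_receiver_types(deref_chain: &[&str]) -> Vec<String> {
    let mut candidates = Vec::new();
    for ty in deref_chain {
        candidates.push(ty.to_string());
        candidates.push(format!("&{ty}"));
        candidates.push(format!("&mut {ty}"));
    }
    candidates
}

fn main() {
    // A receiver expression of type `Box<Foo>` dereferences to `Foo`.
    let candidates = candidate_receiver_types(&["Box<Foo>", "Foo"]);
    assert_eq!(
        candidates,
        vec!["Box<Foo>", "&Box<Foo>", "&mut Box<Foo>", "Foo", "&Foo", "&mut Foo"]
    );
}
```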

With this RFC, the candidate receiver types are assembled the same way - nothing changes. But, the candidate methods are assembled in a different way. Specifically, instead of iterating the candidate receiver types, we assemble a new list of types by following the chain of Receiver implementations. As Receiver is implemented for all types that implement Deref, this may be the same list or a longer list. Aside from following a different trait, the list is assembled the same way, including the insertion of equivalent reference types.

We then search each type for inherent methods or trait methods in the existing fashion - the only change is that we search a potentially longer list of types.

It's particularly important to emphasize also that the list of candidate receiver types does not change. But, a wider set of locations is searched for methods with those receiver types.

For instance, suppose SmartPtr<T> implements Receiver but not Deref. Imagine you have let t: SmartPtr<SomeStruct> = /* obtain */; t.some_method();. We will now search impl SomeStruct {} blocks for an implementation of fn some_method(self: SmartPtr<SomeStruct>), fn some_method(self: &SmartPtr<SomeStruct>), etc. The possible self types in the method call expression are unchanged - they're still obtained by searching the Deref chain for t - but we'll look in more places for methods with those valid self types.
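The split between the two chains can be modelled with a small table of type names. Everything here is a hypothetical simplification (`TypeInfo`, `chain`, `deref_step` and `receiver_step` are not compiler APIs): the Deref chain determines the possible self types, while the Receiver chain determines which types' impl blocks are searched.

```rust
use std::collections::HashMap;

// Toy description of a type: where `Deref` and an explicit `Receiver`
// implementation lead, if anywhere.
struct TypeInfo {
    deref_target: Option<&'static str>,
    receiver_target: Option<&'static str>,
}

fn deref_step(info: &TypeInfo) -> Option<&'static str> {
    info.deref_target
}

// Models the blanket impl: every `Deref` type is also a `Receiver`.
fn receiver_step(info: &TypeInfo) -> Option<&'static str> {
    info.receiver_target.or(info.deref_target)
}

// Follows one kind of step from `start` until no further target exists.
fn chain(
    types: &HashMap<&'static str, TypeInfo>,
    start: &'static str,
    step: fn(&TypeInfo) -> Option<&'static str>,
) -> Vec<&'static str> {
    let mut result = vec![start];
    let mut current = start;
    while let Some(next) = types.get(current).and_then(step) {
        result.push(next);
        current = next;
    }
    result
}

// `SmartPtr<SomeStruct>` implements `Receiver` but not `Deref`.
fn example_types() -> HashMap<&'static str, TypeInfo> {
    let mut types = HashMap::new();
    types.insert(
        "SmartPtr<SomeStruct>",
        TypeInfo { deref_target: None, receiver_target: Some("SomeStruct") },
    );
    types.insert("SomeStruct", TypeInfo { deref_target: None, receiver_target: None });
    types
}

fn main() {
    let types = example_types();
    // Possible self types come from the Deref chain only.
    assert_eq!(chain(&types, "SmartPtr<SomeStruct>", deref_step), vec!["SmartPtr<SomeStruct>"]);
    // Method search also visits `impl SomeStruct` blocks.
    assert_eq!(
        chain(&types, "SmartPtr<SomeStruct>", receiver_step),
        vec!["SmartPtr<SomeStruct>", "SomeStruct"]
    );
}
```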

Compiler changes: deshadowing

The major functional change to the compiler is described above, but a couple of extra adjustments are necessary to avoid future compatibility breaks by method shadowing.

Specifically, that page also states:

If this results in multiple possible candidates, then it is an error, and the receiver must be converted to an appropriate receiver type to make the method call.

With arbitrary self types v2, the compiler will actively search for additional conflicts in order to produce this error in more cases. Specifically, it will consider whether autoreffed candidates conflict with by-value candidates, in order to produce an error in situations like this:

struct Foo;
struct SmartPtr<T>(T); // implements Receiver

impl<T> SmartPtr<T> {
    fn a(&self) {}   // by reference
}

impl Foo {
    fn a(self: SmartPtr<Self>) {}  // by value
}

fn main() {
    let a = SmartPtr(Foo);
    a.a(); // produces an error
}

To be precise, the compiler will:

  • Search for the best by-value pick
  • Search for the best autoreffed pick
  • Search for the best autorefmut pick
  • For each pair from the above list, consider the first to be the 'shadowing' pick and the second to be the 'shadowed' pick. Show an error if:
    • The same number of autoderefs has been applied (confirming the self type is identical, aside from any autoreffing)
    • One is further along the chain of Receiver than another (confirms that it's arbitrary self types causing the conflict)
    • The shadowing pick is an inherent impl (we are concerned about the case that a smart pointer is adding inherent methods shadowing inner types, not cases where traits bring further methods into play)
    • The picks don't refer to the same resulting item (which could happen with things like blanket impls for any type)
  • Otherwise, choose the pick in order of by-value, autoreffed, autorefmut, or const ptr as it does now.

Aside from production of errors in more cases, there is no change to method picking here. That said, the production of errors requires us to interrogate more candidates to look for potential conflicts, so this could have a compile-time performance penalty which we should measure.
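The four error conditions listed above can be restated as a single predicate. This is a toy restatement, not the compiler's implementation: `Pick` and its fields are hypothetical summaries of a method pick, and the values in `main` are our reading of the `SmartPtr`/`Foo` example.

```rust
// Hypothetical, simplified summary of one method pick.
#[derive(Clone, Copy)]
struct Pick {
    // Number of autoderefs applied to reach the self type.
    autoderefs: usize,
    // How far along the Receiver chain the method was found.
    receiver_steps: usize,
    // Whether the method comes from an inherent impl.
    inherent: bool,
    // Stand-in identifier for the resolved item.
    item: u32,
}

fn is_shadowing_error(shadowing: Pick, shadowed: Pick) -> bool {
    shadowing.autoderefs == shadowed.autoderefs
        && shadowing.receiver_steps != shadowed.receiver_steps
        && shadowing.inherent
        && shadowing.item != shadowed.item
}

fn main() {
    // `Foo::a(self: SmartPtr<Self>)`: the by-value pick, found one step
    // along the Receiver chain.
    let by_value = Pick { autoderefs: 0, receiver_steps: 1, inherent: true, item: 1 };
    // `SmartPtr::a(&self)`: the autoreffed pick, found on the outer type.
    let autoref = Pick { autoderefs: 0, receiver_steps: 0, inherent: true, item: 2 };

    assert!(is_shadowing_error(by_value, autoref));
    // A pick never conflicts with itself.
    assert!(!is_shadowing_error(by_value, by_value));
}
```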

(The current reference doesn't describe it, but the current algorithm also searches for method receivers of type *const Self and handles them explicitly in case the receiver type was *mut Self. We do not check for cases where a new self: *mut Self method on an outer type might shadow an existing self: *const SomePtr<Self> method on an inner type. Although this is a theoretical risk, such compatibility breaks should be easy to avoid because self: *mut Self methods are rare. It's not readily possible to produce errors in these cases, because we already intentionally shadow *const::cast with *mut::cast.)

Object safety

Receivers are object safe if they implement the (unstable) core::ops::DispatchFromDyn trait.

Because some receivers may not want to permit object safety, or may be unable to support it, object safety should remain encoded in a trait separate from the Receiver trait proposed here, likely DispatchFromDyn.

This RFC does not propose any changes to DispatchFromDyn. Since DispatchFromDyn is unstable at the moment, object-safe receivers might be delayed until DispatchFromDyn is stabilized. Receiver is not blocked on further DispatchFromDyn work, since non-object-safe receivers already cover a big chunk of the use-cases.

It's been proposed that, instead of DispatchFromDyn, a #[derive(SmartPointer)] mechanism may be stabilized instead. Again, this doesn't block our work on Receiver. There are some use cases for Receiver that won't suit either DispatchFromDyn or #[derive(SmartPointer)], most notably the Rust for Linux Wrapper type described here.

Lifetime elision

Arbitrary self parameters may involve lifetimes.

Even in existing stable Rust, there are bugs in lifetime elision for complex Self types such as &Box<Self>. We're aiming to fix them whether or not this RFC is accepted. The net rules will be:

  • If a parameter is the first parameter, and
  • Called self, and
  • Its type involves Self anywhere, and
  • Its type contains exactly one lifetime anywhere

then that lifetime may be used to elide lifetimes on return types, and will take precedence over any lifetimes in other parameters.
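The simplest instance of these rules is the familiar `&self` case, sketched below with illustrative names (`Registry`, `label`). The elided return lifetime is taken from `self` even though another reference parameter is present, so the result may outlive that other argument.

```rust
struct Registry {
    name: String,
}

impl Registry {
    // Elided form: the returned `&str` borrows from `self`, not from `_hint`.
    fn label(&self, _hint: &str) -> &str {
        &self.name
    }

    // The same signature with the lifetimes written out.
    fn label_explicit<'a, 'b>(&'a self, _hint: &'b str) -> &'a str {
        &self.name
    }
}

fn main() {
    let registry = Registry { name: String::from("core") };
    let label = {
        let hint = String::from("temporary");
        registry.label(&hint)
    }; // `hint` is dropped here, yet `label` remains valid.
    assert_eq!(label, "core");
    assert_eq!(registry.label_explicit("other"), "core");
}
```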

If this seems wrong, please discuss this over on the linked bug rather than here in this RFC, because none of that should change with this RFC (though it does make it more likely users will run into the current inconsistencies). We'll try to keep this RFC up to date with the outcome of those discussions.

Diagnostics

The existing branches in the compiler for "arbitrary self types" already emit excellent diagnostics. We will largely re-use them, with the following improvements:

  • In the case where a self type is invalid because it doesn't implement Receiver, the existing excellent error message will be updated.
  • An easy mistake is to implement Receiver for P<T>, forgetting to specify T: ?Sized. P<Self> then only works as a self parameter in traits where Self: Sized, an unusual stipulation. It's not obvious that Sizedness is the problem here, so we will identify this case specifically and produce an error giving that hint.
  • There are certain types which feel like they "should" implement Receiver but do not: Weak and NonNull. If these are encountered as a self type, we should produce a specific diagnostic explaining that they do not implement Receiver and suggesting that they could be wrapped in a newtype wrapper if method calls are important. We hope this can be achieved with diagnostic items.
  • The current unstable arbitrary self types feature allows generic receivers. For instance,
    impl Foo {
      fn a<R: Deref<Target=Self>>(self: R) { }
    }
    We don't know a use-case for this. There are several cases where this can result in misleading diagnostics (for instance, if such a method is called with an incorrect type, for example smart_ptr.a::<&Foo>() instead of smart_ptr.a::<Foo>()). We could attempt to find and fix all those cases. However, we feel that generic receiver types might risk subtle interactions with method resolutions and other parts of the language. We think it is a safer choice to generate an error on any declaration of a generic self type.
  • As noted in the section about compiler changes for deshadowing we will produce a "multiple method candidates" error if a method in an inner type is chosen in preference to a method in an outer type ("inner" = further along the Receiver chain) and the inner type is either self: &T or self: &mut T and we're choosing it in preference to self: T or self: &T in the outer type.

Drawbacks

Why should we not do this?

  • Deref coercions can already be confusing and unexpected. Adding a new Receiver trait could cause similar confusion.
  • Custom smart pointers are a niche use case (but they're very important for cross-language interoperability.)

Method shadowing

For a smart pointer P<T> that implements Deref<Target = T>, a method call p.m() might call a method P::m on the smart pointer type itself, or it might call T::m. If both methods are declared, this results in an error.

Rust standard library smart pointers are designed with this shadowing behavior in mind:

  • Box, Pin, Rc and Arc heavily use associated functions rather than methods.
  • Where they use methods, it's often with the intention of shadowing a method in the inner type (e.g. Arc::clone).

Furthermore, the Deref trait itself documents this possible compatibility hazard, and the Rust API Guidelines has a guideline about avoiding inherent methods on smart pointers.
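The Arc::clone case can be observed on stable Rust. In this sketch (with the illustrative names `Config`, `clone_handle` and `clone_contents`), method-call syntax on an `Arc<Config>` selects the clone implementation of the Arc itself, and the pointee's clone has to be requested explicitly.

```rust
use std::sync::Arc;

#[derive(Clone)]
struct Config {
    retries: u32,
}

// `shared.clone()` resolves to the outer type's `clone`, producing another
// handle to the same allocation.
fn clone_handle(shared: &Arc<Config>) -> Arc<Config> {
    shared.clone()
}

// Naming the inner type explicitly reaches the shadowed method and copies
// the data instead.
fn clone_contents(shared: &Arc<Config>) -> Config {
    Config::clone(shared)
}

fn main() {
    let shared = Arc::new(Config { retries: 3 });

    let handle = clone_handle(&shared);
    assert_eq!(Arc::strong_count(&shared), 2);

    let copy = clone_contents(&shared);
    assert_eq!(copy.retries, 3);
    // Copying the contents does not touch the reference count.
    assert_eq!(Arc::strong_count(&shared), 2);

    drop(handle);
    assert_eq!(Arc::strong_count(&shared), 1);
}
```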

This RFC does not make things worse for types that implement Deref.

However, this RFC allows types to implement Receiver. This would run the risk of breakage:

struct Concrete;

impl Concrete {
    fn wardrobe(self: SmartPointerWhichImplementsReceiver<Self>) { }
}

fn main() {
    let concrete: SmartPointerWhichImplementsReceiver<Concrete> = /* obtain */;
    concrete.wardrobe()
}

If SmartPointerWhichImplementsReceiver now adds SmartPointerWhichImplementsReceiver::wardrobe(self), the above valid code would start to error.

The same would apply in this slightly different circumstance:

struct Concrete;

impl Concrete {
    fn wardrobe(self: &SmartPointerWhichImplementsReceiver<Self>) { } // this is now a reference
}

fn main() {
    let concrete: SmartPointerWhichImplementsReceiver<Concrete> = /* obtain */;
    concrete.wardrobe()
}

If Rust added SmartPointerWhichImplementsReceiver::wardrobe(&self) we would start to produce an error here. If SmartPointerWhichImplementsReceiver added SmartPointerWhichImplementsReceiver::wardrobe(self) then it would be even worse - code would start to call SmartPointerWhichImplementsReceiver::wardrobe where it had previously called Concrete::wardrobe.

The deshadowing section of the compiler changes describes how we avoid this. The compiler will take pains to identify any such ambiguities and it will show an error.

We have (extensively) considered algorithms to pick the intended method instead - see picking the shadowed method, below.

Rationale and alternatives

As this feature has been cooking since 2017, many alternative implementations have been discussed.

Deref-based

As noted in the rationale section, the current nightly implementation implements arbitrary self types using the Deref trait.

No blanket implementation for Deref

Another major approach previously discussed is to have a Receiver trait, as proposed in this RFC, but without a blanket implementation for T: Deref. Blanket implementations are unusual for core Rust traits, but the authors of this RFC believe it's necessary in this case.

Specifically, this RFC proposes that the existing method search algorithm is modified to search the Receiver chain instead of the Deref chain.

It's therefore a major compatibility break if existing Deref implementors cease to be usable as self parameters. Just in the standard library, we'd have to add Receiver implementations for Cow, Ref, ManuallyDrop and possibly many other existing implementors of Deref: third party libraries would have to do the same. Without that, method calls on these types would not be possible:

fn main() {
    let ref_cell = RefCell::new(/* something cloneable */);
    ref_cell.borrow().clone(); // no longer possible if:
        // 1) we cease to explore Deref in identifying method candidates
        // 2) Ref doesn't implement Receiver.
}

This doesn't just break people previously using the unstable Rust arbitrary_self_types feature; it breaks stable Rust usages as well. Obviously this is not acceptable, so we believe the blanket implementation is necessary.

In any case, we think a blanket implementation is desirable:

  • It prevents Deref and Receiver having different Targets. That could possibly lead to confusion if it prompted the compiler to explore different chains for these two different purposes.
  • If smart pointer type P<T> is in a crate, users who create P<MyConcreteType> will be able to use it as a self type for MyConcreteType without waiting for a new release of the P crate.

We found that some crates use Deref to express an is-a not a has-a relationship and so, ideally, might have preferred the option of setting up Deref and self candidacy separately. But, on discussion, we concluded that traits would be a better way to model those relationships.

Explore both Receiver and Deref chains while identifying method candidates

We could modify the method search algorithm to explore both Deref and Receiver targets when identifying method candidates. This would avoid breaking compatibility, yet would give the desired flexibility for folks who wish to implement Receiver but not Deref.

We don't think this is such a good option because:

  • It's more confusing for users;
  • It could lead to a worst-case O(n^2) number of method candidates to explore (though possibly this could be limited to O(2n) if we added restrictions);
  • It's a more invasive change to the compiler;
  • We don't know of any use-cases which the Receiver<Target=T> and blanket implementation for Deref do not allow.

If some use-case presents itself where a type must implement Deref but not Receiver; or a use-case presents itself where Deref and Receiver must have different Targets then we will have to consider this more complex option.

Generic parameter

Change the trait definition to have a generic parameter instead of an associated type. There might be permutations here which could allow a single smart pointer type to dispatch method calls to multiple possible receivers - but this would add complexity, no known use case exists, and it might cause worst-case O(n^2) performance on method lookup.

Enable for raw pointers (or Weak or NonNull)

This RFC, unlike the original Arbitrary Self Types nightly feature, does not allow raw pointer self types. We are led to believe that raw pointer receivers are quite important for the future of safe Rust, because stacked borrows makes it illegal to materialize references in many positions, and there are a lot of operations (like going from a raw pointer to a raw pointer to a field) where users don't need to or want to do that.

On the other hand, we don't want to encourage the use of raw pointers, and would prefer rather that raw pointers are wrapped in a custom smart pointer that encodes and documents the invariants.

The main problem, though, is that raw pointers have methods and Rust wants to add more methods to them in future - especially around pointer provenance. As noted in the deshadowing section, we would start to generate errors in arbitrary crates if ever we added such additional methods to raw pointers. That's clearly not OK. So, to add support for raw pointers as self types, we'd need to use a cleverer deshadowing algorithm. This is discussed in the next section, but overall has been judged to be too complicated for now.

Instead, this version of Arbitrary Self Types is as conservative as possible, such that we ought to be able to adopt such an algorithm in a future enhancement.

Pick shadowed methods instead of erroring

As explained in the deshadowing section, the Rust compiler will generate errors in case of a conflict between a method on a smart pointer and an inner type. For example:

struct Foo;
struct SmartPtr<T>(T); // implements Receiver

impl<T> SmartPtr<T> {
    fn a(self) {}
}

impl Foo {
    fn a(self: SmartPtr<Self>) {}
}

fn main() {
    let a = SmartPtr(Foo);
    a.a(); // produces an error
}

There has been extensive discussion (and prototyping) about cleverer "deshadowing" algorithms here. The current leading contender is to:

  • If there are conflicts,
    • Always pick the "inner" method;
    • Show a warning, and ask the user to disambiguate using UFC syntax (or future alternatives).

The rationale is that the author of the "inner" method is always aware of pre-existing methods on the "outer" (smart pointer) type. If a conflict arises, this means that the new method was added to the outer type, and therefore Rust can maintain existing behavior by picking the method on the inner type. (This logic falls down in the case of race conditions as crates are published, but it's broadly true.) This logic is believed to be sound, but it's counterintuitive: in all other circumstances Rust method probing works outside-in. This algorithm is also quite complex, and there's a risk of unknown unknowns.

There has also been some discussion about broader changes to method resolution in future, for example a crate-by-crate approach or even a name-resolution.lock file.

The decision has been taken, then, to restrict the current RFC to the most conservative possible version - one which errors on any conflicts, and firmly advises the creators of smart pointers to avoid adding new methods. This gives us maximum flexibility in future to allow more possibilities by relaxing some of those errors to warnings. This is a high priority primarily because of the desire to allow method calls on raw pointers (see the previous section).

Not do it

As always there is the option to not do this. But this feature already kind of half-exists (we are talking about Box, Pin etc.) and it makes a lot of sense to also take the last step and therefore enable non-libstd types to be used as self types.

There is the option of using traits to fill a similar role, e.g.

trait ForeignLanguageRef {
    type Pointee;
    fn read(&self) -> *const Self::Pointee;
    fn write(&mut self, value: *const Self::Pointee);
}

// --------------------------------------------------------

struct ConcreteForeignLanguageRef<T>(T);

impl<T> ForeignLanguageRef for ConcreteForeignLanguageRef<T> {
    type Pointee = T;

    fn read(&self) -> *const Self::Pointee {
        todo!()
    }

    fn write(&mut self, _value: *const Self::Pointee) {
        todo!()
    }
}

// --------------------------------------------------------

struct SomeForeignLanguageType;

impl ConcreteForeignLanguageRef<SomeForeignLanguageType> {
    fn m(&self) {
        todo!()
    }
}

trait Tr {
    type RustType;

    fn tm(self)
    where
        Self: ForeignLanguageRef<Pointee = Self::RustType>;
}

impl Tr for ConcreteForeignLanguageRef<SomeForeignLanguageType> {
    type RustType = SomeForeignLanguageType;
    fn tm(self) {}
}

fn main() {
    let a = ConcreteForeignLanguageRef(SomeForeignLanguageType);
    a.m();
    a.tm();
}

This successfully allows method calls to m() and even tm() without a reference to a SomeForeignLanguageType ever existing. However, due to the orphan rule, this forces every crate to have its own equivalent of ConcreteForeignLanguageRef. This workaround has been used by some interop tools, but use across multiple crates requires many generic parameters (impl ForeignLanguageRef<Pointee=SomeForeignLanguageType>).
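To make the cost of that workaround concrete, here is a minimal sketch under some assumptions: the two modules crate_a and crate_b stand in for separate crates, RefA, RefB and is_null are hypothetical names, and the trait is reduced to a single method. Because each crate has its own wrapper type, shared code cannot name one concrete type and has to be generic over the trait.

```rust
// The trait from the example above, reduced to one method.
trait ForeignLanguageRef {
    type Pointee;
    fn read(&self) -> *const Self::Pointee;
}

struct SomeForeignLanguageType;

// Stand-ins for two crates, each forced to define its own wrapper.
mod crate_a {
    pub struct RefA<T>(pub *const T);

    impl<T> super::ForeignLanguageRef for RefA<T> {
        type Pointee = T;
        fn read(&self) -> *const T {
            self.0
        }
    }
}

mod crate_b {
    pub struct RefB<T>(pub *const T);

    impl<T> super::ForeignLanguageRef for RefB<T> {
        type Pointee = T;
        fn read(&self) -> *const T {
            self.0
        }
    }
}

// Shared code cannot name a single wrapper type, so every such
// function carries a generic parameter like this one.
fn is_null(r: &impl ForeignLanguageRef<Pointee = SomeForeignLanguageType>) -> bool {
    r.read().is_null()
}

fn main() {
    let value = SomeForeignLanguageType;
    let a = crate_a::RefA(&value as *const SomeForeignLanguageType);
    let b = crate_b::RefB(std::ptr::null::<SomeForeignLanguageType>());
    assert!(!is_null(&a));
    assert!(is_null(&b));
}
```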

Always use unsafe when interacting with other languages

One main motivation here is cross-language interoperability. As noted in the rationale, C++ references can't be safely represented by Rust references. Many would say that all C++ interop is intrinsically unsafe and that unsafe blocks are required. Maybe true: but that just moves the problem - an unsafe block requires a human to assert preconditions are met, e.g. that there are no other C++ pointers to the same data. But those preconditions are almost never true, because other languages don't have those rules. This means that a C++ reference can never be a Rust reference, because neither human nor computer can promise things that aren't true.

Only in the very simplest interop scenarios can we claim that a human could audit all the C++ code to eliminate the risk of other pointers existing. In complex projects, that's not possible.

However, a C++ reference can be passed through Rust safely as an opaque token such that method calls can be performed on it. Those method calls actually happen back in the C++ domain where aliasing and concurrent modification are permitted.

For instance,

struct ForeignLanguageRef<T>(std::marker::PhantomData<T>); // opaque token; never holds a &T

fn main() {
    let some_foreign_language_reference: ForeignLanguageRef<_> = CallSomeForeignLanguageFunctionToGetAReference();
    // There may be other foreign language references to the referent, with concurrent
    // modification, so some_foreign_language_reference can't be a &T
    // But we still want to be able to do this
    some_foreign_language_reference.SomeForeignLanguageMethod(); // executes in the foreign language. Data is not
        // dereferenced at all in Rust.
}
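The snippet above is schematic. A version that compiles on stable Rust might look like the following sketch, in which everything is an assumption made for illustration: Widget is a hypothetical referent type, the integer handle stands in for a foreign pointer, and the two foreign_* functions simulate the foreign runtime in plain Rust rather than calling real foreign code.

```rust
use std::marker::PhantomData;

// Hypothetical referent type; its data lives on the foreign side.
struct Widget;

// Opaque token: carries only a handle, never a `&T` or `&mut T`.
struct ForeignLanguageRef<T> {
    handle: usize,
    _referent: PhantomData<*const T>,
}

// Stand-in for a foreign function that hands out a reference.
fn foreign_get_reference() -> ForeignLanguageRef<Widget> {
    ForeignLanguageRef {
        handle: 7,
        _referent: PhantomData,
    }
}

// Stand-in for a foreign method; it works purely from the handle.
fn foreign_widget_area(handle: usize) -> u32 {
    match handle {
        7 => 42,
        _ => 0,
    }
}

impl ForeignLanguageRef<Widget> {
    // Forwards to the foreign side; nothing is dereferenced in Rust.
    fn area(&self) -> u32 {
        foreign_widget_area(self.handle)
    }
}

fn main() {
    // Two tokens may alias the same referent; since neither is a Rust
    // reference, no aliasing or exclusivity guarantee is violated.
    let first = foreign_get_reference();
    let second = foreign_get_reference();
    assert_eq!(first.area(), 42);
    assert_eq!(first.area(), second.area());
}
```

On stable Rust the method has to live in an impl block for the wrapper type. With the feature proposed here, it could instead be declared in an impl block for Widget, taking self: ForeignLanguageRef<Self>.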

Even if the reader takes the view that all calls into foreign languages are intrinsically unsafe and must be marked as such, hopefully the reader would support building abstractions using the Rust type system to minimize the practical risk of undefined behavior. That's what this RFC aims to enable.

Prior art

A previous PR based on the Deref alternative, #2362, was proposed and then postponed with the expectation that the lang team would get back to arbitrary_self_types eventually.

Future work

As discussed above we anticipate a future version which will relax some errors into warnings, and thus allow us to add support for raw pointers, Weak and NonNull as self types.

Thereafter, we could consider implementing Receiver for other types, e.g. std::cell types, std::sync types, std::cmp::Reverse, std::num::Wrapping, std::mem::MaybeUninit, std::task::Poll, and so on - possibly even for arrays, etc.

There seems to be no disadvantage to doing this - taking Cell as an example, it would only have any effect on the behavior of code if somebody implemented a method taking Cell<T> as a receiver. On the other hand, it's hard to imagine use-cases for some of these. For now, though, we should clearly restrict Receiver to those types for which there's a demonstrated need.

Feature gates

This RFC is in an unusual position regarding feature gates. There are two existing gates:

  • arbitrary_self_types enables, roughly, the semantics we're proposing, albeit in a different way. It has been used by various projects.
  • receiver_trait enables the specific trait we propose to use, albeit without the Target associated type. It has only been used within the Rust standard library, as far as we know.

Although we presumably have no obligation to maintain compatibility for users of the unstable arbitrary_self_types feature, we should consider the least disruptive way to introduce this feature.

The plan is:

  • the receiver_trait gate continues to control the existing Receiver trait used solely within the standard library, which is renamed to something like LegacyReceiver or FixedReceiver (and will be removed, assuming we stabilize this feature)
  • arbitrary_self_types comes to control the new behavior, with a new Receiver trait containing a Target associated type. As noted, this does not include raw pointers, though we hope to find a way to stabilize this in a future RFC.
  • Add a new arbitrary_self_types_pointers feature gate which retains support for raw pointers.

Summary

This RFC is an example of replacing special casing (a.k.a. compiler magic) with clear and transparent definitions. We believe this is a good thing and should be done whenever possible.