Expanding ref support #7555

Merged: 4 commits merged on Oct 30, 2023
Changes from 2 commits
23 changes: 13 additions & 10 deletions proposals/csharp-11.0/low-level-struct-improvements.md
@@ -1,5 +1,5 @@
Low Level Struct Improvements
=====
====

[!INCLUDE[Specletdisclaimer](../speclet-disclaimer.md)]

@@ -904,6 +904,12 @@ Lifetime variables when defined on types can be invariant or covariant. These ar
ref struct S<out $this, $a>
```

The lifetime parameter `$this` on type definitions is _not_ predefined but it does have a few rules associated with it when it is defined:
- It must be the first lifetime parameter.
- It must be covariant: `out $this`.
- The lifetime of `ref` fields must be convertible to `$this`.
- The `$this` lifetime of all non-ref fields must be `$heap` or `$this`.

The lifetime of a ref is expressed by providing a lifetime argument to the ref. For example a `ref` that refers to the heap is expressed as `ref<$heap>`.

When defining a constructor in the model the name `new` will be used for the method. It is necessary to have a parameter list for the returned value as well as the constructor arguments. This is necessary to express the relationship between constructor inputs and the constructed value. Rather than having `Span<$a><$ro>` the model will use `Span<$a> new<$ro>` instead. The type of `this` in the constructor, including lifetimes, will be the defined return value.
@@ -912,10 +918,12 @@ The basic rules for the lifetime are defined as:

- All lifetimes are expressed syntactically as generic arguments, coming before type arguments. This is true for predefined lifetimes except `$heap` and `$local`.
- All types `T` that are not a `ref struct` implicitly have lifetime of `T<$heap>`. This is implicit, there is no need to write `int<$heap>` in every sample.
- For a ref field defined as `ref T<$l1, $l2, ... $ln>` all lifetimes `$l1` through `$ln` must be invariant.
- For a ref defined as `ref<$a> T<$b, ...>`, `$b` must convertible to `$a`
- The `ref` of a variable has a lifetime of the
- For a ref local, parameter, field or return of type `ref<$a> T` the lifetime is `$a`
- For a `ref` field defined as `ref<$l0> T<$l1, $l2, ... $ln>`:
  - All lifetimes `$l1` through `$ln` must be invariant.
Contributor:

Is this true even for a ref readonly field?

Member Author:

It's possible that readonly could allow us to relax the restriction a bit here. At the same time it's something we'd have to really map out to make sure we weren't missing anything.

  - The lifetime of `$l0` must be convertible to `$this`.
- For a `ref` defined as `ref<$a> T<$b, ...>`, `$b` must be convertible to `$a`.
- The `ref` of a variable has a lifetime defined by:
  - For a `ref` local, parameter, field or return of type `ref<$a> T` the lifetime is `$a`
  - `$heap` for all reference types and fields of reference types
  - `$local` for everything else
- An assignment or return is legal when the underlying type conversion is legal
@@ -927,11 +935,6 @@ For the purpose of lifetime rules a `ref` is considered part of the type of the

Next let's define the rules that allow us to map C# syntax to the underlying model.

The lifetime parameter `$this` on type definitions is _not_ predefined but it does have a few rules associated with it when it is defined:
- It must be the first lifetime parameter.
- It must be covariant: `out $this`.
- The lifetime parameters of all non-ref fields, and the ref lifetime of ref fields, must be `$this`

For brevity's sake, a type which has no explicit lifetime parameters is treated as if there is `out $this` defined and applied to all fields of the type. A type with a `ref` field must define explicit lifetime parameters.

These rules exist to support our existing invariant that `T` can be assigned to `scoped T` for all types. That maps down to `T<$a, ...>` being assignable to `T<$local, ...>` for all lifetimes known to be convertible to `$local`. Further, this supports other items like being able to assign `Span<T>` from the heap to those on the stack. This does exclude types where fields have differing lifetimes for non-ref values, but that is the reality of C# today. Changing that would require a significant change of C# rules that would need to be mapped out.
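The invariant that `T` can be assigned to `scoped T` is already observable in C# 11. As a minimal sketch (the class `ScopedDemo` and method `Sum` are hypothetical names; it assumes C# 11 or later, e.g. .NET 7), a heap-backed `Span<int>` flows into a `scoped` local, which can then also hold stack memory:

```csharp
using System;

static class ScopedDemo
{
    // Hypothetical helper illustrating T -> scoped T under C# 11 rules.
    public static int Sum()
    {
        // Heap-backed span: its safe-to-escape is the calling method.
        Span<int> fromHeap = new int[] { 1, 2, 3 };

        // Assigning T to scoped T is always allowed.
        scoped Span<int> local = fromHeap;

        // A scoped local is limited to the current method, so stack memory fits too.
        local = stackalloc int[] { 4, 5, 6 };

        int total = 0;
        foreach (int value in local)
        {
            total += value;
        }
        return total; // sums the stackalloc contents: 4 + 5 + 6
    }
}
```

Without `scoped`, `local` would take the safe-to-escape of its heap initializer and the later `stackalloc` assignment would be rejected.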
294 changes: 294 additions & 0 deletions proposals/expand-ref.md
@@ -0,0 +1,294 @@
Expanding ref support
===

## Summary
This proposal expands the capabilities of `ref` and `scoped` in the language. The goal is to leverage the existing rules in the model to allow `ref struct` usage in more locations and to provide more lifetime expressiveness for APIs.

## Motivation
There are still a number of scenarios around `ref` which cannot be safely expressed in the language. These generally arise when using multiple mutable `ref struct` parameters, many of which are passed by `ref`, or when trying to use a `ref struct` in `ref` fields.

To _fully_ satisfy all of these scenarios would require us to introduce explicit lifetime parameters and relationships into the language. That is a _huge_ investment that is not yet motivated by need. Instead this proposal takes our existing lifetime annotation, `scoped`, and sees how much further `ref` safety can be taken without introducing any other annotations or keywords.

This doesn't solve all scenarios but does remove several known friction points in the language. It also serves to show us exactly where the limits are without introducing explicit lifetime parameters.

## Detailed Design
The rules for `ref struct` safety are defined in the following documents:

- [ref safety proposal](https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/span-safety.md).
Contributor:

I think it's hard to get use out of this document now because you have to have the other document at hand to cross-reference everything that has changed in later versions of C#.

Member Author:

At some point I do need to put together a unified doc that we can reference.

- [ref fields proposal](https://github.com/dotnet/csharplang/blob/main/proposals/csharp-11.0/low-level-struct-improvements.md)

This proposal will be building on top of those previous ones.

The detailed rules below use the [annotation syntax](https://github.com/dotnet/csharplang/blob/main/proposals/csharp-11.0/low-level-struct-improvements.md#annotations), which is the most direct way to discuss how syntax behaves in the greater model. Readers interested in the very low level details should familiarize themselves with that syntax before digesting this proposal.

### ref scoped parameters
The language will allow for parameters to be declared as `ref scoped`. This will serve to constrain the _safe-to-escape_ of the value such that it cannot be returned from the current method.

```csharp
Span<int> M(Span<int> p1, ref scoped Span<int> p2)
{
    // Error: cannot return a scoped value
    return p2;

    // Error: the safe-to-escape of p1 is not convertible to p2.
    p2 = p1;

    // Okay: heap can always be assigned
    p2 = default;

    // Okay
    p2[0] = 42;
}
```

This capability will help cases where multiple `ref struct` values with different lifetimes are passed by `ref`. Having `ref scoped` allows developers to note which values do not escape and that allows for more call site flexibility.

```csharp
ref struct Data { ... }
void Copy1(ref Data source, ref Data dest) { ... }
void Copy2(ref Data source, ref scoped Data dest) { ... }

void Use(ref Data data)
{
    // STE: current method
    var local = new Data(stackalloc int[42]);

    // Error: compiler has to assume local copied to data
    Copy1(ref data, ref local);

    // Okay: compiler knows lifetime only flows data -> local
    Copy2(ref data, ref local);
}
```
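The call-site difference between `Copy1` and `Copy2` can be sketched as a simplified version of the method-arguments-must-match check. Everything here (`Scope`, `Argument`, `ArgumentsMatch`) is a hypothetical model rather than the compiler's implementation, and it assumes only two escape scopes, which happen to be totally ordered:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model of the "method arguments must match" check.
// Only two escape scopes are modeled, and these two are totally ordered.
enum Scope { CurrentMethod = 0, CallingMethod = 1 }

// One argument at a call site. RefToRefStruct marks a `ref` argument of a
// `ref struct` type; RefScoped marks an argument bound to a `ref scoped` parameter.
record Argument(string Name, Scope SafeToEscape, bool RefToRefStruct, bool RefScoped);

static class ArgumentRules
{
    public static bool ArgumentsMatch(IReadOnlyList<Argument> args)
    {
        // A plain `ref` argument of a ref struct type may be assigned any other
        // argument's value inside the callee, so each such value must escape at
        // least as far as the target does.
        foreach (var target in args.Where(a => a.RefToRefStruct && !a.RefScoped))
        {
            foreach (var source in args)
            {
                // Values passed to `ref scoped` parameters cannot flow anywhere.
                if (ReferenceEquals(source, target) || source.RefScoped) continue;
                if (source.SafeToEscape < target.SafeToEscape) return false;
            }
        }
        return true;
    }
}
```

With `data` at calling method and `local` at current method, the `Copy1` shape fails because `local` could flow into `data`, while the `Copy2` shape passes because `local` is bound to a `ref scoped` parameter.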

This is accomplished by giving every `ref scoped` parameter a new escape scope named _current parameter N_ where _N_ is the numeric order of the parameter. For example the first parameter has a _safe-to-escape_ of _current parameter 1_. An escape scope of _current parameter N_ can be converted to _current method_ but has no other defined relationship. That serves to restrict their usage to the current method.

It's important to note that each parameter has a different _current parameter N_ scope. That means they cannot be assigned to each other. This is necessary to prevent `ref scoped` parameters from returning each other's data.
Contributor:

I think this is a critical concept. The implementation currently, by representing lifetimes as uint, requires every lifetime to be comparable to every other lifetime. It's equivalent to it being impossible to declare void M<T, U>() {} without also saying where T : U or where U : T.

Contributor:

Is there a need to do something similar for value-scoped parameters, once we permit RTRS fields? Since it feels like it will be possible to declare a similar signature, just using void Swap(RefSpanWrapper<int> p1, RefSpanWrapper<int> p2).

Member Author (@jaredpar, Sep 29, 2023):

I agree our current implementation does not lend itself to this type of validation. This was one of the motivations I had in wanting us to refactor the implementation significantly in C# 12.0 to better match the existing spec. That would've made the transition to this much easier. As is though we effectively need to do that work before we can do this feature.


```csharp
void Swap(ref scoped Span<int> p1, ref scoped Span<int> p2)
{
    // Error: can't assign current parameter 1 to current parameter 2
    p2 = p1;

    // Error: can't assign current parameter 2 to current parameter 1
    p1 = p2;

    // Okay: current parameter 1 and 2 can both be converted to current method
    scoped Span<int> local1 = p1;
    scoped Span<int> local2 = p2;

    // Okay: however the safe-to-escape here is current parameter X, not
    // current method, so this could cause a bit of confusion later on
    Span<int> local3 = p1;
    Span<int> local4 = p2;

    // Okay: the safe-to-escape of the value is inferred in this case as it is
    // done for ref locals today.
    ref Span<int> refLocal1 = ref p1;
    ref Span<int> refLocal2 = ref p2;
}
```
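The scopes described above form a partial order rather than a ranking. A minimal sketch of that order, assuming only the relationships stated in this section (heap converts to anything; calling method and each _current parameter N_ convert to _current method_; nothing else), could look like this. `ScopeKind`, `EscapeScope` and `EscapeRules` are hypothetical names, not compiler types:

```csharp
// Hypothetical model of escape scopes once `ref scoped` parameters exist.
// Names are illustrative; this is not the compiler's implementation.
enum ScopeKind { Heap, CallingMethod, CurrentMethod, CurrentParameter }

// N is the numeric order of the parameter (unused for the other kinds).
readonly record struct EscapeScope(ScopeKind Kind, int N = 0)
{
    public static readonly EscapeScope Heap = new(ScopeKind.Heap);
    public static readonly EscapeScope CallingMethod = new(ScopeKind.CallingMethod);
    public static readonly EscapeScope CurrentMethod = new(ScopeKind.CurrentMethod);
    public static EscapeScope Parameter(int n) => new(ScopeKind.CurrentParameter, n);
}

static class EscapeRules
{
    // True when a value whose safe-to-escape is `from` may be assigned to a
    // location whose safe-to-escape is `to`.
    public static bool IsConvertible(EscapeScope from, EscapeScope to)
    {
        // Reflexive, and heap data can be stored anywhere.
        if (from == to || from.Kind == ScopeKind.Heap) return true;

        // Calling method and every current parameter N narrow to current method.
        if (to.Kind == ScopeKind.CurrentMethod)
            return from.Kind is ScopeKind.CallingMethod or ScopeKind.CurrentParameter;

        // Nothing else is related: in particular two different parameters are
        // incomparable, which a single integer ranking cannot express.
        return false;
    }
}
```

In this model `p2 = p1` in `Swap` corresponds to `IsConvertible(Parameter(1), Parameter(2))`, which is false, while `scoped Span<int> local1 = p1` corresponds to `IsConvertible(Parameter(1), CurrentMethod)`, which is true.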

A `ref scoped` parameter is also implicitly `scoped ref`. That means neither the value nor its `ref` can be returned from the method. Both `ref` and `in` parameters can have their values modified with `scoped`. An `out` parameter cannot have its value modified with `scoped`, as such a declaration is nonsensical.

```csharp
void M(
    ref scoped Span<int> p1, // Okay
    in scoped Span<int> p2,  // Okay
    out scoped Span<int> p3  // Error
)
```

The [method arguments must match](https://github.com/dotnet/csharplang/blob/main/proposals/csharp-11.0/low-level-struct-improvements.md#rules-method-arguments-must-match) rules will be updated to take `ref scoped` into account. Values passed to such parameters do not need to be considered when calculating the return scopes.

Detailed notes:
- A `ref scoped` parameter is implicitly `scoped ref`
- An `out scoped` parameter declaration is an error

### ref field to ref struct
The language will allow for `ref struct` to appear as `ref scoped` fields. This `scoped` will serve to ensure the values cannot be escaped outside the containing instance but can be read and manipulated within it.

```csharp
ref struct Deserializer
{
    ref scoped Utf8JsonReader reader;

    ReadOnlySpan<byte> M1()
    {
        // Okay: implicitly scoped to current method
        var span = reader.ValueSpan;

        // Okay
        reader.Skip();

        // Error: can't escape the ref data the ref scoped field refers to
        return reader.ValueSpan;
    }
}
```

This is accomplished by giving every `ref scoped` field two new escape scopes named _current field N_ and _current ref field N_, where _N_ is the numeric order of the field. For example, the first field has a _safe-to-escape_ of _current field 1_ and a _ref-safe-to-escape_ of _current ref field 1_. Both escape scopes can be converted to _current method_, and _current field N_ can be converted to _current ref field N_, but no other relationships are defined. That restricts their usage to the current method where the containing value is used. This escape scope applies to both
Contributor:

I think that all lifetimes $current_field_N and $current_ref_field_N must also be convertible to lifetime $this. In the absence of this constraint, it would be possible for the this variable to hold references to fields with narrower lifetimes.

Contributor:

Or, depending on how you look at it, it simply wouldn't be possible to declare any 'ref scoped' fields in ref structs, because the field's lifetime would not be convertible to the lifetime of the containing 'this' variable.

Member Author:

I disagree. Logically, yes, I agree that the lifetimes must be convertible to $this. The moment you allow that in all contexts, though, it's no longer scoped: it's returnable.

struct S<$this, $refField1, $field1>
  where $refField1 : $this
  where $field1: $this, $refField1 {
  ref<$refField1> Span<$field1, byte> span;
}

Span<$cm> M<$cm, $l1, $l2>(S<$cm, $l1, $l2> s) =>
  // works because $l2 is convertible to $cm
  s.span;

The way this works is that at construction time the rules ensure that the lifetimes of both the ref and the value are convertible to $this. That just falls out from our existing method args rules. Given that the fields' lifetimes are invariant, there is no need to keep that relationship alive for correctness after that. The only thing we have to worry about is how they are interpreted in the context of a method.

Note: if these were unscoped ref fields, then you are 100% correct; we'd put that relationship in place at the declaration of the type.


Below are a few examples of these rules in action:

```csharp
ref struct NestedRefStruct { }
ref struct RefStruct
{
    public NestedRefStruct NestedField;
}

ref struct S
{
    ref scoped RefStruct field;

    RefStruct M1(RefStruct s)
    {
        // Okay
        field = new();

        // Error: calling-method is not convertible to current-field-1 as they have
        // no relationship
        field = s;

        // Error: safe-to-escape is current-field-1 which isn't returnable
        return field;
    }

    NestedRefStruct M2()
    {
        // Error: safe-to-escape is current-field-1 which isn't returnable
        return field.NestedField;
    }

    ref RefStruct M3()
    {
        // Error: ref-safe-to-escape is current-ref-field-1 which isn't returnable
        return ref field;
    }
}
```
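The field scopes behave like the parameter scopes with one extra edge, from _current field N_ to _current ref field N_. A minimal sketch of those relationships, assuming only what this section states (the names `FieldScopeKind`, `FieldScope` and `FieldScopeRules` are hypothetical), reproduces the errors in `M1`, `M2` and `M3`:

```csharp
// Hypothetical model of the escape scopes given to `ref scoped` fields.
// Names are illustrative; this is not the compiler's implementation.
enum FieldScopeKind { Heap, CallingMethod, CurrentMethod, CurrentField, CurrentRefField }

// N is the numeric order of the field (unused for the other kinds).
readonly record struct FieldScope(FieldScopeKind Kind, int N = 0)
{
    public static readonly FieldScope Heap = new(FieldScopeKind.Heap);
    public static readonly FieldScope CallingMethod = new(FieldScopeKind.CallingMethod);
    public static readonly FieldScope CurrentMethod = new(FieldScopeKind.CurrentMethod);
    public static FieldScope Field(int n) => new(FieldScopeKind.CurrentField, n);
    public static FieldScope RefField(int n) => new(FieldScopeKind.CurrentRefField, n);
}

static class FieldScopeRules
{
    // True when a value with scope `from` may be stored where `to` is required.
    public static bool IsConvertible(FieldScope from, FieldScope to)
    {
        // Reflexive, and heap data can be stored anywhere.
        if (from == to || from.Kind == FieldScopeKind.Heap) return true;

        // current field N converts to current ref field N (same field only).
        if (from.Kind == FieldScopeKind.CurrentField && to.Kind == FieldScopeKind.CurrentRefField)
            return from.N == to.N;

        // Calling method and both field scopes narrow to current method.
        if (to.Kind == FieldScopeKind.CurrentMethod)
            return from.Kind is FieldScopeKind.CallingMethod
                or FieldScopeKind.CurrentField
                or FieldScopeKind.CurrentRefField;

        // No other relationship is defined.
        return false;
    }

    // A scope is returnable when it converts to the calling method.
    public static bool IsReturnable(FieldScope scope) =>
        IsConvertible(scope, FieldScope.CallingMethod);
}
```

Here `field = new()` maps to `IsConvertible(Heap, Field(1))` (allowed), `field = s` maps to `IsConvertible(CallingMethod, Field(1))` (rejected), and neither `Field(1)` nor `RefField(1)` is returnable.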

The [method arguments must match](https://github.com/dotnet/csharplang/blob/main/proposals/csharp-11.0/low-level-struct-improvements.md#rules-method-arguments-must-match) rules do not need to be updated here as they already account for `ref` parameters being captured as `ref` fields. Even though a `ref` to a `ref struct` was not directly returnable before, it could be returned indirectly through a `ref` to a `struct` field of the value.

The language will also allow `ref` fields to be declared as `scoped ref`. There are fewer use cases for this, but `ref scoped` implies `scoped ref`, so the rules must account for it anyway. The syntax is exposed because, although the use cases are few, the infrastructure already exists. The _ref-safe-to-escape_ of such fields follows the logic above for `ref scoped` fields.

Detailed notes:
- A `ref` field where the type is a `ref struct` must be `ref scoped`
- A `ref` field may be marked `scoped ref`

### Sunset restricted types
The ability for any type to be the target of a `ref` field allows us to fully sunset the notion of restricted types. The compiler has a concept of a set of _restricted types_ which is largely undocumented. These types were given a special status because in C# 1.0 there was no general purpose way to express their behavior, most notably the fact that they can contain references to the execution stack. Instead the compiler had special knowledge of them and restricted their use to ways that would always be safe: disallowed returns, no use as array elements, no use in generics, etc.

Once `ref` fields are available and extended to support `ref struct` these types can be fully rationalized within those rules. As such the compiler will no longer have the notion of restricted types when using a language version that supports `ref` fields of `ref struct`.

To support this our `ref` safety rules will be updated as follows:
Member:

Can we assume updates to the following don't indicate anything other than clarity on their semantics rather than changing anything about their "officially supported" state.

Member Author:

Correct. This is just about getting clarity on the semantics. Particularly I know the runtime team has wanted to leverage TypedReference more and this would provide clarity on what can and cannot be done safely.


- `__makeref(e)` will be logically treated as a method with the signature `static TypedReference __makeref(ref T value)` where `T` is the type of `e`.
- `__refvalue(e, T)`:
  - When `T` is a `ref struct`: will be treated as accessing a field declared as `ref scoped T` inside `e`.
  - Otherwise: will be treated as accessing a field declared as `ref T` inside `e`.
- `__arglist` as a parameter will be implicitly `scoped`.
- `__arglist(...)` as an expression will have a *ref-safe-to-escape* and *safe-to-escape* of *current method*.

Conforming runtimes will ensure that `TypedReference`, `RuntimeArgumentHandle` and `ArgIterator` are defined as `ref struct`. Further `TypedReference` must be viewed as having a `ref` field to a `ref struct` for any possible type (it can store any value). That combined with the above rules will ensure references to the stack do not escape beyond their lifetime.

Note: strictly speaking this is a compiler implementation detail vs. part of the language. But given the relationship with `ref` fields it is being included in the language proposal for simplicity.
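The existing behavior these rules describe can be observed with the undocumented `__makeref`, `__refvalue` and `__reftype` keywords that the current C# compiler accepts. The sketch below (the class `TypedReferenceDemo` is hypothetical) only shows today's semantics, not the new `ref scoped` rules: `__refvalue` acts like a `ref T` field inside the `TypedReference`, so writing through it updates the original local.

```csharp
using System;

// Hypothetical demo class; __makeref, __refvalue and __reftype are
// undocumented keywords supported by the C# compiler.
static class TypedReferenceDemo
{
    public static int WriteThroughTypedReference()
    {
        int x = 42;

        // Behaves like calling TypedReference __makeref(ref int value):
        // the result refers to the local x.
        TypedReference tr = __makeref(x);

        // __refvalue(tr, int) behaves like accessing a `ref int` field inside tr,
        // so assigning through it updates x.
        __refvalue(tr, int) = 7;
        return x;
    }

    public static Type ReferencedType()
    {
        string s = "hello";
        TypedReference tr = __makeref(s);

        // __reftype reports the static type captured by __makeref.
        return __reftype(tr);
    }
}
```

`WriteThroughTypedReference()` returns 7, and `ReferencedType()` returns `typeof(string)`. Because `tr` refers to a local, it must not outlive the method, which is exactly what treating `TypedReference` as a `ref struct` enforces.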

### Annotation Definition

<a name="annotations-param"></a>
At an annotation level every parameter marked `ref scoped` will have a new lifetime parameter defined. The name will be `$paramN` where _N_ is the numerical order of the parameter. That lifetime will only have the relationship `where $paramN : $local`.

```csharp
ref struct S { }
void M(ref scoped S s) { }

// maps to

void M<$param1>(ref<$local> S<$param1> s)
    where $param1 : $local
{ }
```

This definition prevents the value from escaping from the method, as the lifetime is not returnable. It also prevents local data from escaping from the current method through the parameter, as the lifetime is wider than `$local` but not equivalent.

```csharp
void M<$param1>(ref<$local> S<$param1> p)
    where $param1 : $local
{
    S<$local> s = new S<$local>(stackalloc int[42]);

    // error: cannot convert S<$local> to S<$param1>
    p = s;
}
```
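The `where` clauses in these mappings can be read as edges in a graph: `where $a : $b` says `$a` is convertible to `$b`. A minimal sketch of that reading, assuming convertibility is the reflexive and transitive closure of the declared clauses plus the rule that `$heap` converts to every lifetime (the `LifetimeConstraints` type is hypothetical), reproduces the error in the example above:

```csharp
using System.Collections.Generic;

// Hypothetical checker for the annotation model: each `where $a : $b`
// clause declares that lifetime $a is convertible to lifetime $b.
sealed class LifetimeConstraints
{
    private readonly Dictionary<string, List<string>> _edges = new();

    // Records the clause `where from : to`.
    public LifetimeConstraints Where(string from, string to)
    {
        if (!_edges.TryGetValue(from, out var targets))
            _edges[from] = targets = new List<string>();
        targets.Add(to);
        return this;
    }

    // Depth-first search over the declared clauses.
    public bool IsConvertible(string from, string to)
    {
        if (from == "$heap" || from == to) return true;  // $heap converts to every lifetime
        var seen = new HashSet<string> { from };
        var pending = new Stack<string>();
        pending.Push(from);
        while (pending.Count > 0)
        {
            var current = pending.Pop();
            if (!_edges.TryGetValue(current, out var targets)) continue;
            foreach (var next in targets)
            {
                if (next == to) return true;
                if (seen.Add(next)) pending.Push(next);
            }
        }
        return false;
    }
}
```

With only `where $param1 : $local` declared, `$local` is not convertible to `$param1`, which is why `p = s` is rejected; `$heap` data can still be assigned to `p`.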

<a name="annotations-field"></a>
At an annotation level every field marked `scoped ref` (explicitly or implicitly via `ref scoped`) will have a new lifetime parameter defined. The name will be `$refFieldN` where _N_ is the numerical order of the field. That lifetime will have the relationship `where $refFieldN : $local` in all methods that use the type.

```csharp
ref struct S
{
    scoped ref int i;
}
S M(S p) { }

// maps to
ref struct S<out $this, $refField1>
{
    ref<$refField1> int i;
}

S<$cm, $l1> M<$cm, $l1>(S<$cm, $l1> p)
    where $l1 : $local
{
}
```

Every field marked as `ref scoped` will have a new lifetime parameter defined. The name will be `$fieldN` where _N_ is the numerical order of the field. That lifetime will have the relationship `where $fieldN : $refFieldN` defined on the type. It will also have the relationship `where $fieldN : $local` in all methods that use the type.

```csharp
ref struct S1 { }
ref struct S2
{
    ref scoped S1 field;
}

S2 M(S2 p) { }

// maps to
ref struct S1<out $this> { }
ref struct S2<out $this, $refField1, $field1>
    where $field1 : $refField1
{
    ref<$refField1> S1<$field1> field;
}

S2<$cm, $l1, $l2> M<$cm, $l1, $l2>(S2<$cm, $l1, $l2> p)
    where $l2 : $l1
    where $l1 : $local
{
}
```

These definitions prevent the values (`ref` or value) from escaping, as their lifetimes are never returnable. They do allow the values to be manipulated and adjusted, though. Non-`ref` data, or data known to have `$heap` lifetime, can be assigned into such fields.

## Open Issues

### Ability to mark this as ref scoped
The proposal does not provide any way to mark `this` as `ref scoped` for a given method. At this time the author can see no significant benefits to this. If such scenarios do come along then an attribute such as `[RefScoped]` could be introduced similar to how `[UnscopedRef]` works.

### Requiring ref fields to ref struct to be scoped
Certain readers are likely to be disappointed that a `ref` field to a `ref struct` must be `ref scoped`. That limits the number of scenarios which can assign `ref` data into such fields.

This is unfortunately necessary given the constraints of the design. Having a plain `ref` effectively requires that explicit lifetime annotations exist in the language. There is no other way to safely express the relationship between the value and the container.