
Proposal: custom nullable structs #1981

Closed
YairHalberstadt opened this issue Nov 3, 2018 · 25 comments

@YairHalberstadt
Contributor

YairHalberstadt commented Nov 3, 2018

Motivation

Structs are often more performant than classes, as they improve data locality and reduce stress on the garbage collector. Indeed, in high-performance code it is advised to use structs wherever possible.

In general I try to replace classes with structs if the struct would be small enough that copying won't be an issue.

The biggest issue with doing so is that you can neither define nor prevent the default constructor, and you cannot provide default values for fields or properties. As such it is impossible to make sure that a struct is initialised to a valid value before it is used, which is a major limitation for many of the cases where I try to replace a class with a struct. Indeed, I would say it is the single most common reason why I do not replace a class with a struct.

There is a good reason for this. When default(T) is called, where T is a type of struct, a zeroed out struct is allocated. This is necessary for performance reasons, as the garbage collector automatically zeroes out segments it acquires.

It would be very strange for default(T) to return a valid struct which is different to the struct returned by a call to new().
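To make the point concrete, here is a minimal sketch of today's behaviour, assuming a hypothetical `Wrapper` struct that is not part of the proposal. It only illustrates that `default(T)`, `new T()` and fresh array elements all yield the same zeroed value, so reference-type fields start out null.

```csharp
using System;

// Hypothetical struct used only for illustration.
public struct Wrapper
{
    // Reference-type field: zeroed memory means this is null in default(Wrapper).
    public int[] Items;
    public int Count;
}

public static class DefaultDemo
{
    // The zeroed struct has a null array and a zero count.
    public static bool DefaultHasNullField() => default(Wrapper).Items == null;

    // Without a user-defined parameterless constructor, new Wrapper()
    // produces the same zeroed value as default(Wrapper).
    public static bool NewMatchesDefault()
    {
        var viaNew = new Wrapper();
        var viaDefault = default(Wrapper);
        return viaNew.Items == viaDefault.Items && viaNew.Count == viaDefault.Count;
    }

    public static void Main()
    {
        Console.WriteLine(DefaultHasNullField()); // True
        Console.WriteLine(NewMatchesDefault());   // True

        // Array elements are zeroed in the same way.
        var array = new Wrapper[3];
        Console.WriteLine(array[0].Items == null); // True
    }
}
```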

Proposed Solution

This isn't an issue for classes, since a call to default(T) where T is a class returns null, and it is understood that you have to check a class for null before accessing it.

We also have the Nullable<T> struct type, whose instances you have to check for a value before using them. However, you can't specify that a given struct type should itself be nullable.

So I propose we allow defining custom nullable structs with the following features:

For a custom nullable struct type T

  1. T has an extra hidden bool field which indicates whether the instance is null.
  2. A call to default(T) returns a null instance of T.
  3. It is possible to define a default constructor for T.
  4. It is possible to define default values for properties and fields of T.
  5. A call on a member of a null instance of T results in a NullReferenceException.
  6. A T[] is initialised with null Ts by default.
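For comparison, the semantics listed above can be roughly emulated by hand in current C#. The following is only a sketch under stated assumptions: the `_hasValue` flag, the `IsNull` property, the `Create` factory and the `ThrowIfNull` helper are all hypothetical names standing in for what the compiler would generate, and nothing here is the proposed syntax.

```csharp
using System;

public struct ArrayWrapper<T>
{
    // Stands in for the hidden flag from point 1; false in default(ArrayWrapper<T>).
    private readonly bool _hasValue;
    private readonly T[] _array;

    private ArrayWrapper(int length)
    {
        _array = new T[length];
        _hasValue = true;
    }

    // Hypothetical factory standing in for the default constructor from point 3.
    public static ArrayWrapper<T> Create() => new ArrayWrapper<T>(1);

    public bool IsNull => !_hasValue;

    public T this[int index]
    {
        get { ThrowIfNull(); return _array[index]; }
        set { ThrowIfNull(); _array[index] = value; }
    }

    // Point 5: member access on a "null" instance throws.
    private void ThrowIfNull()
    {
        if (!_hasValue) throw new NullReferenceException("ArrayWrapper<T> is null");
    }
}

public static class EmulationDemo
{
    public static void Main()
    {
        var missing = default(ArrayWrapper<int>);
        Console.WriteLine(missing.IsNull); // True (point 2)

        var wrapper = ArrayWrapper<int>.Create();
        wrapper[0] = 5;
        Console.WriteLine(wrapper[0]); // 5

        var array = new ArrayWrapper<int>[1];
        Console.WriteLine(array[0].IsNull); // True (point 6)
    }
}
```

The cost of doing this manually is that every member has to remember to call the check, which is the boilerplate the proposal aims to remove.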

Suggested Syntax and Example

public struct ArrayWrapper<T>?
{
    private readonly T[] _array = new T[1]; // can define default value for this field
    public T this[int index] {get => _array[index]; set => _array[index] = value;}
}

// or equivalently:

public struct ArrayWrapper<T>?
{
    private readonly T[] _array;
    public T this[int index] {get => _array[index]; set => _array[index] = value;}

    public ArrayWrapper<T>?() => _array = new T[1]; //can define default constructor
}

...

var arrayWrapper = default(ArrayWrapper<int>?);

Console.WriteLine(arrayWrapper[0]); //throws NullReferenceException. Should also be a compiler warning

arrayWrapper = new ArrayWrapper<int>?();
Console.WriteLine(arrayWrapper[0]); //prints 0

var array = new ArrayWrapper<int>?[1]; // one element, initialised to null
arrayWrapper = array[0];

Console.WriteLine(arrayWrapper[0]); //throws NullReferenceException. Should also be a compiler warning

Motivating use case

As just one example of a real life use case, consider this code I was trying to write:

public class Class
{
    public int PropertyOne {get; private set;}
    public string PropertyTwo {get; private set;}

    private Class(){}

    public struct Builder
    {
        private Class _class; // not readonly: reassigned in Initialise() and Build()
        public void Initialise() => _class = new Class();
        public int PropertyOne{get => _class.PropertyOne; set => _class.PropertyOne = value;}
        public string PropertyTwo{get => _class.PropertyTwo; set => _class.PropertyTwo = value;}
        public Class Build()
        {
            var @class = _class;
            _class = new Class();
            return @class;
        }
    }
}

The problem with this pattern is that you are required to Initialise the Builder before usage, which is prone to errors. This also means you can't use object initialisers with the builder.

The alternative would be using a class, but this adds a new object to the heap for no reason, and doubles the cost of creating a new Class.

With this proposal we could write:

public class Class
{
    public int PropertyOne {get; private set;}
    public string PropertyTwo {get; private set;}

    private Class(){}

    public struct Builder?
    {
        private Class _class = new Class(); // default value for the field, as this proposal would allow
        public int PropertyOne{get => _class.PropertyOne; set => _class.PropertyOne = value;}
        public string PropertyTwo{get => _class.PropertyTwo; set => _class.PropertyTwo = value;}
        public Class Build()
        {
            var @class = _class;
            _class = new Class();
            return @class;
        }
    }
}

...

var @class = new Class.Builder?
{
    PropertyOne = 17,
    PropertyTwo = "this is a string",
}.Build();

Semantics relating to nullable reference types and Nullable<T>

#1865 discusses adding nullable reference type features to nullable value types. Presumably similar features could be added to custom nullable value types.

Open questions would include whether you could refer to the custom nullable type without the ? suffix when you know the value is not null.

So, in the example given above, could you declare a new Class.Builder(), or only a new Class.Builder?()? Could you say Class.Builder builder = new Class.Builder?(), or only Class.Builder? builder = new Class.Builder?()?

@svick
Contributor

svick commented Nov 3, 2018

I don't see how this proposal would be worthwhile, since its only purpose is to make performance optimizations easier.

And it doesn't even make them that much easier: Instead of having an Initialize() method, you could instead access the struct field through a private property that takes care of initialization:

public struct Builder
{
    private Class _class;
    private Class Class => _class ?? (_class = new Class());
    public int PropertyOne{get => Class.PropertyOne; set => Class.PropertyOne = value;}
    public string PropertyTwo{get => Class.PropertyTwo; set => Class.PropertyTwo = value;}
    public Class Build()
    {
        var @class = Class; // read through the property so Build() never returns null
        _class = null;
        return @class;
    }
}

@YairHalberstadt
Contributor Author

@svick.

In that case you've managed to work around it, but it remains that there is no general way to ensure that a struct is always in a valid state.

@svick
Contributor

svick commented Nov 3, 2018

@YairHalberstadt What I'm saying is that it's always possible to work around it. And since we're talking about performance-critical code, some verbosity is probably acceptable.

@yaakov-h
Member

yaakov-h commented Nov 3, 2018

The problem with this pattern is that you are required to Initialise the Builder before usage, which is prone to errors.

System.Collections.Immutable and so on seem to be managing this just fine. If you're concerned about accidents slipping into your code, have you considered writing a Roslyn Analyser?

it remains that there is no general way to ensure that a struct is always in a valid state.

You can't ensure this for a class either when Reflection and FormatterServices.GetUninitializedObject exist.
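As a rough illustration of that point, the sketch below creates a class instance without running its constructor. It assumes a hypothetical `Account` class and uses `RuntimeHelpers.GetUninitializedObject`, the counterpart of `FormatterServices.GetUninitializedObject` available on .NET Core 2.0 and later; on .NET Framework you would need the `FormatterServices` version instead.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical class whose constructor tries to guarantee a non-null Owner.
public class Account
{
    public string Owner { get; }

    public Account(string owner) =>
        Owner = owner ?? throw new ArgumentNullException(nameof(owner));
}

public static class UninitializedDemo
{
    // Allocates a zeroed Account; no constructor or field initializer runs.
    public static Account CreateWithoutConstructor() =>
        (Account)RuntimeHelpers.GetUninitializedObject(typeof(Account));

    public static void Main()
    {
        var account = CreateWithoutConstructor();
        Console.WriteLine(account.Owner == null); // True: the invariant was bypassed
    }
}
```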

@Joe4evr
Contributor

Joe4evr commented Nov 3, 2018

For a custom nullable struct type T [...]

  • it is possible to define a default constructor for T
  • It is possible to define default values for properties and fields of T

These things are on the table to come to all structs anyway: #99.

@HaloFour
Contributor

HaloFour commented Nov 3, 2018

@Joe4evr

  • It is possible to define default values for properties and fields of T

These things are on the table to come to all structs anyway: #99.

That proposal doesn't define a "default value" for a struct. It only allows the default constructor to assign fields of that struct.

public struct Foo {
    public int Value;
    public Foo() {
        Value = 123;
    }
}

public class Bar {
    public Foo X;
    public Foo Y = new Foo();
}

var bar = new Bar();
Console.WriteLine(bar.X.Value); // prints 0
Console.WriteLine(bar.Y.Value); // prints 123

@YairHalberstadt
Contributor Author

@Joe4evr
That proposal created a lot of controversy. This is an attempt to deal with some of its issues.

@Joe4evr
Contributor

Joe4evr commented Nov 3, 2018

That proposal doesn't define a "default value" for a struct. It only allows the default constructor to assign fields of that struct.

I took that as exactly what he meant. ¯\_(ツ)_/¯

@YairHalberstadt
Contributor Author

@yaakov
Yes. With reflection nothing is safe. But the point here is to create a pit of success rather than a pit of failure.

We want to make the default thing you try to do correct, not to protect you from doing something.

A code analyser is not a general solution, but it could work in specific cases.

@YairHalberstadt
Contributor Author

@Joe4evr @HaloFour
The point is you wouldn't be able to have an instance of Foo in your class if it wasn't initialized. It would have to be Foo?.

@HaloFour
Contributor

HaloFour commented Nov 3, 2018

@Joe4evr

I took that as exactly what he meant. ¯\_(ツ)_/¯

Maybe I misinterpreted then. I took it to mean that the fields of the struct would always be assigned.

@YairHalberstadt
Contributor Author

By default the struct is null, and the fields are unassigned. But the struct can't be used unless it is initialised via the constructor; otherwise a null reference exception would be thrown.

@YairHalberstadt
Contributor Author

@svick
The point is precisely that we don't want to have to limit structs to high performance code.

Currently the limitations surrounding structs mean that you only use them when you know you need performance. However, by the time you know you need performance, it's often too late, as a little bit of garbage is created in every single part of the program, rather than a single function being the issue.

The idea here is to make structs an almost drop-in replacement for classes. You can switch the default from "create a class, unless a struct is necessary" to "create a struct, unless a class is necessary".

Currently C++ and Go are struct based OO languages, and they enjoy a significant performance boost as a result.

@HaloFour
Contributor

HaloFour commented Nov 3, 2018

@YairHalberstadt

The idea here is to make structs an almost drop-in replacement for classes.

Structs will never be a drop-in replacement for classes specifically because the defining characteristic of a struct is their copy-by-value nature, which can quickly and easily eliminate any performance benefit you might get from not allocating on the heap. You can't use them interchangeably in C++ any more than you can in C#.
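A small sketch of the copy-by-value behaviour in question, assuming a hypothetical mutable `Counter` struct: passing it to a method copies it, so the caller's value is untouched unless it is passed by `ref`.

```csharp
using System;

// Hypothetical mutable struct used only for illustration.
public struct Counter
{
    public int Value;
    public void Increment() => Value++;
}

public static class CopyDemo
{
    // Receives a copy; the caller's Counter is not modified.
    public static void IncrementCopy(Counter c) => c.Increment();

    // Receives a reference to the caller's storage.
    public static void IncrementInPlace(ref Counter c) => c.Increment();

    public static int AfterByValue()
    {
        var c = new Counter();
        IncrementCopy(c);
        return c.Value;
    }

    public static int AfterByRef()
    {
        var c = new Counter();
        IncrementInPlace(ref c);
        return c.Value;
    }

    public static void Main()
    {
        Console.WriteLine(AfterByValue()); // 0
        Console.WriteLine(AfterByRef());   // 1
    }
}
```

With a class, both calls would observe the same object, which is why swapping `class` for `struct` can silently change behaviour as well as performance.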

@ufcpp

ufcpp commented Nov 4, 2018

#147

@theunrepentantgeek

theunrepentantgeek commented Nov 4, 2018

Currently the limitations surrounding structs mean that you only use them when you know you need performance.

@YairHalberstadt, you might only write structs when performance is needed - other developers have other habits. Like me.

I write a lot of structs for the clarity that semantic types bring to the code - these are typically immutable value types where a struct provides a simple, clean implementation.

As such it is impossible to make sure that a struct is initialised to a valid value before it is used.

The approach I take to this is to ensure the default value for a struct is sensible - either it's a legitimate value, or it will throw if misused. (As an aside - having a separate Initialize() method is something I'd call out in any code review - it makes the type harder to use correctly.)

For the first case, check out this Specifiable<T> class that a colleague of mine wrote to facilitate some testing. The default value of a Specifiable<string> is unspecified; you can specify a value using a factory method: Specify.As("Foo");.

For the latter case, the approach I'd take involves a couple of steps.

First, define a private EnsureInitialized method that throws if the struct is invalid:

private readonly bool _initialized;

[Conditional("DEBUG")]
private void EnsureInitialized() 
{
    if (!_initialized)
    {
        throw new InvalidOperationException("Use of uninitialized struct Foo");
    }
}

Next, call the method at the start of every public member and you'll soon pick up any context where you're using the struct inappropriately.

The Conditional attribute ensures the code is removed from a release build, so you get full performance.
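Put together, the pattern might look roughly like the sketch below. The `Percentage` type and its range check are hypothetical, chosen only to give the struct an invariant worth guarding; the default-value check only fires when the `DEBUG` symbol is defined.

```csharp
using System;
using System.Diagnostics;

// Hypothetical semantic type used only for illustration.
public readonly struct Percentage
{
    private readonly bool _initialized; // false in default(Percentage)
    private readonly int _value;

    public Percentage(int value)
    {
        if (value < 0 || value > 100)
            throw new ArgumentOutOfRangeException(nameof(value));
        _value = value;
        _initialized = true;
    }

    public int Value
    {
        get
        {
            EnsureInitialized(); // call removed by the compiler unless DEBUG is defined
            return _value;
        }
    }

    [Conditional("DEBUG")]
    private void EnsureInitialized()
    {
        if (!_initialized)
        {
            throw new InvalidOperationException("Use of uninitialized struct Percentage");
        }
    }
}

public static class EnsureInitializedDemo
{
    public static void Main()
    {
        Console.WriteLine(new Percentage(42).Value); // 42

        // default(Percentage).Value throws InvalidOperationException in a
        // DEBUG build, and returns the zeroed field in a release build.
    }
}
```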

@ufcpp

ufcpp commented Nov 4, 2018

see also: Roles

@YairHalberstadt
Contributor Author

@ufcpp.

It would be nice to merge #147 and this proposal, so that you could define a default constructor for a nullable-like type.

@benaadams
Member

Nullable structs are already a thing? https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/nullable-types

@Austin-bryan

Currently C++ and Go are struct based OO languages, and they enjoy a significant performance boost as a result.

C++'s performance boost has nothing to do with being a struct-based language; it comes from the fact that it gives the dev a lot more tools for controlling what the code does. Even if you used mostly classes in C++, a good C++ programmer's code would be faster than C# for many reasons. You can't limit it to just one, and especially not to it being "struct based" (a term I couldn't find anywhere, and I've never heard anyone describe it like that).

So if I'm not mistaken, this has nothing to do with nullable types? Because you can do this:

struct MyStruct { }

and then MyStruct? exists by default. You seem to be asking to create MyStruct? such that there is no non-null MyStruct, and they have to use the nullable version. If that's the case, I think you should really just use a class at this point. This is pretty intense micro-optimizing.
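For reference, a minimal sketch of what the built-in `MyStruct?` (that is, `Nullable<MyStruct>`) already gives you. The `Id` field and the `IdOrFallback` helper are hypothetical additions for the example.

```csharp
using System;

public struct MyStruct
{
    public int Id; // hypothetical field so there is something to read
}

public static class NullableDemo
{
    // Reads the wrapped value only after checking HasValue.
    public static int IdOrFallback(MyStruct? s, int fallback) =>
        s.HasValue ? s.Value.Id : fallback;

    public static void Main()
    {
        MyStruct? none = null;
        MyStruct? some = new MyStruct { Id = 7 };

        Console.WriteLine(none.HasValue);          // False
        Console.WriteLine(IdOrFallback(some, -1)); // 7
        Console.WriteLine(IdOrFallback(none, -1)); // -1

        // Accessing none.Value would throw InvalidOperationException,
        // not NullReferenceException.
    }
}
```

Note that the non-nullable `MyStruct` remains usable on its own, and `default(MyStruct)` is still a zeroed instance, which is the gap the proposal is trying to close.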

@YairHalberstadt
Contributor Author

YairHalberstadt commented Nov 5, 2018

@Willard720
Classes (in the C# sense) don't exist in C++. All objects are value types, and you have to explicitly take a pointer to an object.

This makes a huge impact on performance.

Firstly, short-lived objects are almost never created on the heap, but live on the stack. So much so that Go doesn't bother with a generational garbage collector, as there are very few gen0 objects to collect.

From https://blog.golang.org/ismmkeynote

It isn't that the generational hypothesis isn't true for Go, it's just that the young objects live and die young on the stack. The result is that generational collection is much less effective than you might find in other managed runtime languages.

Secondly, it means heap-allocated arrays of objects have much better data locality. See http://joeduffyblog.com/2010/10/31/dense-and-pointerfree/

See also this quote from http://joeduffyblog.com/2010/09/06/the-premature-optimization-is-evil-myth/

If I could do it all over again, I would make some changes to C#. I would try to keep pointers, and merely not allow you to free them. Indirections would be explicit. The reference type versus value type distinction would not exist; whether something was a reference would work like C++, i.e. you get to decide. Things get tricky when you start allocating things on the stack, because of the lifetime issues, so we’d probably only support stack allocation for a subset of primitive types. (Or we’d employ conservative escape analysis.) Anyway, the point here is to illustrate that in such a world, you’d be more conscious about how data is laid out in memory, encouraging dense representations over sparse and pointer rich data structures silently spreading all over the place. We don’t live in this world, so pretend as though we do; each time you see a reference to an object, think “indirection!” to yourself and react as you would in C++ when you see a pointer dereference.

@HaloFour
Contributor

HaloFour commented Nov 5, 2018

@YairHalberstadt
Contributor Author

YairHalberstadt commented Nov 5, 2018

Thanks @HaloFour for those links. Very interesting.

Nonetheless, that does not invalidate the advantages of creating structs on the stack at compile time, compared to creating objects which may possibly be stack allocated by the JIT if the analysis is simple enough to do so.

@HaloFour
Contributor

HaloFour commented Nov 5, 2018

@YairHalberstadt

Nonetheless, that does not invalidate the advantages of creating structs on the stack at compile time, compared to creating objects which may possibly be stack allocated by the JIT if the analysis is simple enough to do so.

Nothing prevents you from doing so. But you inherit with that all of the disadvantages of structs, including no default value, no mechanism to enforce initialization, limited lifetime, copy semantics and very strict reference semantics. Structs aren't some magic silver bullet. They can't replace classes in the majority of cases simply because of the lifetime requirements.

Even in the blog post you cited, stack allocation was called out as being tricky, to the extent that his reimagined C# wouldn't allow it at all except for primitive types or with escape analysis.

@YairHalberstadt
Contributor Author

Closed in favour of #2019


9 participants