C# Design Notes for Apr 1 and 8, 2015 #2119

Closed
MadsTorgersen opened this issue Apr 20, 2015 · 5 comments
@MadsTorgersen

C# Design Meeting Notes for Apr 1 and Apr 8, 2015

Agenda

Matt Warren wrote a Roslyn analyzer as a low-cost way to experiment with nullability semantics. In these two meetings we looked at evolving versions of this analyzer, and what they imply for language design.

The analyzer is here: https://github.com/mattwar/nullaby.

Flow-based nullability checking

At the design review on Mar 25 (#1921), there was strong support for adding nullability support to the language, but also the advice to make it easy to transition into using nullability checking by recognizing current patterns for null checking.

This suggests a flow-based approach, where the "null state" of variables is tracked by the compiler, and may be different in different blocks of code. Comparisons of the variable, and assignments to the variable, would all change its null state within the scope of effect of those operations.

An inherent danger with flow-based checks like that is that a variable may change in an untracked way. The risk of that for parameters and local variables is limited: the variable would have to be captured and modified by a lambda, and that lambda would have to be executed elsewhere during the running of the current function. Given that any null-checking machinery we build would have to be somewhat approximate anyway, we can probably live with this risk.
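As a hedged sketch of that risk (CouldBeNull is the analyzer's attribute; RunSomewhere is a hypothetical method that may invoke its argument immediately), a captured local can be reset between the check and the use:

void Risky([CouldBeNull] string s)
{
    if (s != null)
    {
        RunSomewhere(() => s = null); // the lambda captures s and may set it back to null
        var l = s.Length;             // flow analysis still considers s non-null here
    }
}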

It gets gradually worse if we try to track fields of this or other objects, or even properties or array elements. In all likelihood, tracking just parameters and local variables would deliver the bulk of the value, but at least "dotted chains" would certainly also be useful.
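For example (a sketch with a hypothetical Widget type, not from the notes), state tracked for a field can be invalidated by any intervening call:

void UseLabel(Widget w)
{
    if (w.Label != null)
    {
        w.Refresh();            // may assign null to w.Label behind the analysis's back
        var n = w.Label.Length; // the earlier check no longer guarantees safety
    }
}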

Attributes versus syntax

The analyzer makes use of attributes to denote when a variable is "nullable" ([CouldBeNull]) or "non-nullable" ([ShouldNotBeNull]). Compared to a built-in syntax, this has several disadvantages:

  • It's less syntactically convenient, of course
  • The attribute cannot be applied to local variables
  • The attribute cannot be applied to a type argument or an array element type

These limitations are inherent to the nature of the experiment, but we know how to counter them if we add language syntax, even if that syntax is encoded with the use of attributes. (We learned all the tricks when we introduced dynamic.)
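To illustrate the gap (the ? syntax below is hypothetical and shown only in comments; it is not something the analyzer supports):

// Attributes can annotate parameters, fields and return values...
void WithAttribute([CouldBeNull] string s) { }

// ...but not locals, type arguments or array element types. Built-in syntax could:
// string? local = null;
// List<string?> names;
// string?[] items;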

Analyzers versus built-in rules

Providing nullability diagnostics by means of an analyzer comes with a number of pros and cons compared to having those diagnostics built into the language.

  • Language rules need to be clearly specified and reasonable to explain. An analyzer can employ more heuristics.
  • Language-based diagnostics need to consider back compat for the language, whereas analyzers can introduce warnings on currently valid code.
  • For the same reason, analyzers can evolve over time.
  • With an analyzer, individual rules can be turned on and off. Some may want a harsher medicine than others. With the language it has to be one size fits all.

On the other hand:

  • Rules can probably be implemented more efficiently in the compiler itself (though we might be able to come up with tricks to deal with that for analyzers too)
  • The language has an opportunity to standardize what exactly is allowed
  • The rules would still apply in contexts where analyzers aren't run
  • It would be odd to add ! and ? syntax to the language without adding the accompanying semantics
  • We could avoid many issues with back compat if we adopt the notion of "warning waves" (Warning Waves #1580), where later versions of the language can add new warnings.

The analyzer has several shortcomings due to the lack of a public flow analysis API in Roslyn. That would be a great addition, regardless of what we do for nullability checking.

With a cruder/simpler analysis built into the language, you can imagine folks building enhancement analyzers on top of it. Those may not just add new diagnostics, but might want to turn off compiler warnings where more heuristics can determine that they are in fact not warranted. The analyzer infrastructure doesn't currently support this.

Taking the analyzer for a spin

Given the following declaration

void Foo([ShouldNotBeNull] string s) { }

The following statements would yield warnings because s is declared to be nullable, but is used in a way that requires it not to be:

void Bad([CouldBeNull] string s)
{
    Foo(s);           // Warning!
    var l = s.Length; // Warning!
}

However, all the following methods are ok, because the flow analysis can determine that s is not null at the point where it is used:

void Ok1([CouldBeNull] string s)
{
    s = "Not null";
    Foo(s); // Ok
}
void Ok2([CouldBeNull] string s)
{
    if (s != null)
    {
        Foo(s); // Ok
    }
}
void Ok3([CouldBeNull] string s)
{
    if (s == null)
    {
        throw new ArgumentNullException();
    }
    Foo(s); // Ok
}
void Ok4([CouldBeNull] string s)
{
    if (s == null)
    {
        s = "NotNull";
    }
    Foo(s); // Ok
}
void Ok5([CouldBeNull] string s)
{
    if (s != null && s.Length > 0) // Ok
    {
    }
}
void Ok6([CouldBeNull] string s)
{
    if (s == null || s.Length > 0) // Ok
    {
    }
}

This seems hugely useful, because current code will just continue to work in the vast majority of cases.

It is a change of thinking from a model where nullability is firmly part of the type of a variable and is established at declaration time. In that paradigm, establishing that a given variable is not null doesn't help you; you have to capture its value in a new, more strongly typed variable, which is more cumbersome, and which would require existing code to be extensively rewritten to match new patterns.
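For contrast, a hedged sketch of that declaration-time paradigm (the ? and ! annotations, and this version of Foo requiring a non-null argument, are hypothetical syntax):

void TypeBased(string? s)
{
    if (s != null)
    {
        string! t = s; // the check doesn't change the type of s; the value must be copied
        Foo(t);        // into a new, non-nullable variable before it can be used as such
    }
}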

Conclusion

Matt's experiment is great, and we are very interested in the nullability tracking approach, because it has the potential to make the bulk of existing code work in the face of new annotations.

@paulomorgado

Would this be useful beyond simple cases like this?

If the parameter/variable has a deeper structure, only the "root" can be guaranteed to be null or not null.
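For instance (Order and Customer are hypothetical types, not from the comment):

void Process([CouldBeNull] Order order)
{
    if (order != null)
    {
        var id = order.Id;              // ok: the root is known to be non-null
        var name = order.Customer.Name; // unverified: nothing is tracked about order.Customer
    }
}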

@Sebazzz

Sebazzz commented Apr 22, 2015

In (Microsoft) code bases we often see code like this:

Check.NotNull("myParameter", myParameter);

How would this work with flow analysis? Would it follow through methods? Across assembly boundaries? Into precompiled/binary code? Or would you support a contract annotation like ReSharper's StringAnnotation?
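One hedged possibility (EnsuresNotNull is a hypothetical attribute, not something the analyzer defines) would be to annotate such helpers so the flow analysis can learn from the call:

public static class Check
{
    // Hypothetical contract: 'value' is non-null on any normal return from this method.
    public static void NotNull(string parameterName, [EnsuresNotNull] object value)
    {
        if (value == null)
            throw new ArgumentNullException(parameterName);
    }
}

void Consume([CouldBeNull] string s)
{
    Check.NotNull("s", s);
    var l = s.Length; // could be treated as safe if the annotation is honored
}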

@controlflow

but at least "dotted chains" would certainly also be useful

Every person working on program analysis should know that dataflow analysis is orthogonal to state analysis. Pointer aliasing complicates things a lot; aliasing problems can only be solved via some kind of heap modelling, like "shape analysis". Even ref/out parameters are problematic to analyze in a sound way because of possible aliasing. Impurity requires inter-procedural analysis to track state mutations.

In ReSharper, the analysis of "dotted chains" is handled via a lot of heuristics (like dropping knowledge of dotted chains if some method is called). It kind of works, but it sometimes produces false positives.
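A sketch of the kind of heuristic described (person and Log are hypothetical):

if (person.Name != null)
{
    Log();                      // unknown call: knowledge about person.Name is dropped
    var n = person.Name.Length; // so this is treated as unverified again, even if Log is harmless
}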

@dmitriyse

I have spent some time researching non-nullable reference types, and I remember the ugly CodeContracts implementation. I have found a more general variant of a solution (maybe just a way to think about it) for both problems: 1) nullability and 2) contracts.
Imagine that we have a new compiler-only construct:

public constraint class NonZeroLengthString: String
{
    public bool InvariantConstraints()
    {
         return this != null && this.Length > 0; // Analyzed by compile-time inference engine..
    }
}

Then we can define a C# 7.0 signature:

public delegate NonZeroLengthString DoSomething(object obj);

It is compiled to CLR code:

[return: CS7Constraint("NonZeroLengthString")]
public delegate string DoSomething(object obj);

More examples:

public constraint class NonNullReference<T>: class T
{
    public bool InvariantConstraints()
    {
            return this != null;
    }
}

public class Foo
{
      public Foo(NonNullReference<string> arg1)
      {
      }
      // The same signature, but with C# 7.0 syntax sugar.
      public void Bar(string! arg1)
      {
      }
}

// C# 7.0 strict mode (all reference types are non nullable by default)
strict public class Foo
{
      public Foo(string arg1, string? canBeNullVariable)
      {
      }
}

This approach could be a first step toward injecting powerful static code checks and a built-in language inference engine. All variables could be enriched with additional semantics, and the inference engine could help protect developers from their mistakes.

In short, this approach could be called: variable semantic markup.

@gafter gafter modified the milestone: C# 7 and VB 15 Nov 21, 2015
@gafter

gafter commented Apr 25, 2016

Design notes have been archived at https://github.com/dotnet/roslyn/blob/future/docs/designNotes/2015-04-01%2C8%20C%23%20Design%20Meeting.md but discussion can continue here.

@gafter gafter closed this as completed Apr 25, 2016