Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse lambda expressions with explicit return type #53827

Merged
merged 6 commits into from
Jun 10, 2021

Conversation

cston
Copy link
Member

@cston cston commented Jun 2, 2021

Proposal: lambda-improvements.md
Test plan: #52192

As a perf test, parsed the 1600 C# files beneath src/Compilers/CSharp, with and without this change. Reviewing in perfview, there seems to be no real difference in time spent in ParseText(), and the difference in total allocations within ParseText() was 0.1%.

if (this.IsPossibleLambdaExpression(precedence))
{
return this.ParseLambdaExpression();
}
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting ambiguity. if we have ref x () => is that a ref to this lambda, or a lambda that returns by-ref. Given that the former seems fairly meaningless semantically (correct me if i'm wrong), it seems safe to presume this is the latter. #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add comment on this what case this is matching?

if (foundParameterModifier)
{
return true;
}
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happened here? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case was previously used to treat (ref x, as the start of an explicitly typed lambda. If we also allow those cases when there is an explicit return type, we'd treat F(ref x, as the start of a lambda (with return type F) rather than a call to F(). The simplest change seemed to be look ahead for ) => in all cases as we were doing for cases with parameters with no modifiers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above, i'd put a comment to this effect in this method to help make it clearer for maintenance.

this.Reset(ref resetPoint);
this.Release(ref resetPoint);

return isAsync;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i skimmed. i def need to review this in depth later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewed in depth. i think this is correct. however, it is definitely on the knife's edge of being really hard to maintain and get right. this approach does have the virtue of being in line with how we did it before, with very little deviation, sot hat's a plus.

However, we could consider tearing this down (now or int he future) and just syaing: this code will presume that what it is seeing is a lambda, but will require the => to follow. And, if it doesn't have that, it's not a lambda. That would likely make this a ton simpler, and would allow us to not have to do all the contortions around identifying and and considering all the ambiguity cases. In a scannign world that was necessary. In a parsing world it is not (though we might generate more garbage in common cases) :-/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could consider tearing this down ...

Added comment to #53976 for now.

}
}
EOF();
}
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add test for thsi case: RT LambdaName(int arg) => { ... }. It's not legal, but people will likely try to write this out.

Note that we might want to make this legal at some point int eh future so we can have recursive lambdas. But that's obviously a future issue. The purpose of the test here is to juts validate that we fail and that hopefully we don't go too off the rails. #Resolved

@cston cston force-pushed the return-type branch 3 times, most recently from 542c0dd to 9c480b1 Compare June 8, 2021 05:16
@cston cston changed the base branch from main to features/lambdas June 8, 2021 16:28
@cston cston marked this pull request as ready for review June 8, 2021 16:29
@cston cston requested a review from a team as a code owner June 8, 2021 16:29
@jcouv
Copy link
Member

jcouv commented Jun 8, 2021

        string source = modifiers + " delegate*<void> () => default";

Consider adding a more complicated function pointer scenario (with calling convention and such)


In reply to: 856977525


In reply to: 856977525


In reply to: 856977525


Refers to: src/Compilers/CSharp/Test/Syntax/Parsing/LambdaReturnTypeParsingTests.cs:433 in 52dbe07. [](commit_id = 52dbe07, deletion_comment = False)

@jcouv
Copy link
Member

jcouv commented Jun 8, 2021

        verify();

nit: LangVer will be a semantic check?


In reply to: 856977602


In reply to: 856977602


Refers to: src/Compilers/CSharp/Test/Syntax/Parsing/LambdaReturnTypeParsingTests.cs:43 in 52dbe07. [](commit_id = 52dbe07, deletion_comment = False)

@jcouv
Copy link
Member

jcouv commented Jun 8, 2021

            N(SyntaxKind.AsyncKeyword);

Should the second async be an identifier?


In reply to: 856977697


In reply to: 856977697


In reply to: 856977697


Refers to: src/Compilers/CSharp/Test/Syntax/Parsing/LambdaReturnTypeParsingTests.cs:2990 in 52dbe07. [](commit_id = 52dbe07, deletion_comment = False)

break;
}

if (this.CurrentToken.Kind == SyntaxKind.EndOfFileToken)
Copy link
Member

@333fred 333fred Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this bailout? #Closed

Copy link
Member Author

@cston cston Jun 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we were tracking whether we saw a ref, out, or in modifier to return true early without scanning for a =>, and for this case when the partially complete parameter list with a modifier was at the end of the file. To handle any of these cases now, we'd need the caller to pass in whether there was an explicit return type because F (ref x, may be either a lambda with return type F or an invocation of method F. It didn't seem worth the complexity to keep the early return optimization, and the case here, of a lambda with parameters that have modifiers at the end of the file, seemed unlikely.

if (this.CurrentToken.Kind == SyntaxKind.EqualsGreaterThanToken)
{
// 1. `static =>`
// 2. `async static =>`
Copy link
Member

@333fred 333fred Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a case for static async #Closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see you're handling it below. Consider mentioning that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've only checked for static and async static above so static async is not applicable here. Will leave as is since the comment and the preceding checks were essentially unchanged.

@@ -11231,14 +11227,14 @@ private ExpressionSyntax ParseCastOrParenExpressionOrLambdaOrTuple(Precedence pr
{
Debug.Assert(this.CurrentToken.Kind == SyntaxKind.OpenParenToken);

if (IsPossibleLambdaExpression(precedence) && this.TryParseLambdaExpression() is { } lambda)
Copy link
Member

@jcouv jcouv Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Consider moving this check to the place that calls ParseCastOrParenExpressionOrLambdaOrTuple. Seems more regular with other cases there. #Closed

@jcouv
Copy link
Member

jcouv commented Jun 8, 2021

        string source = "F(a, b) => { }";

Consider adding a test for F(a, b) => and F(a, b) => {. Is the fat arrow sufficient to tip over the interpretation?


In reply to: 856993252


In reply to: 856993252


In reply to: 856993252


Refers to: src/Compilers/CSharp/Test/Syntax/Parsing/LambdaReturnTypeParsingTests.cs:2616 in 52dbe07. [](commit_id = 52dbe07, deletion_comment = False)

@@ -9768,7 +9772,7 @@ private bool IsPossibleExpression(bool allowBinaryExpressions, bool allowAssignm
// expression (whether it is used as an identifier or a keyword).
return this.IsTrueIdentifier() || (this.CurrentToken.ContextualKind == SyntaxKind.FromKeyword);
default:
return (IsPredefinedType(tk) && tk != SyntaxKind.VoidKeyword)
return IsPredefinedType(tk)
Copy link
Member

@jcouv jcouv Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: what's a scenario affected by this change? #Closed

Copy link
Member Author

@cston cston Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change allows void at the start of an expression - for instance in the argument value to [A(void () => { })] class C { }.

// something lambda-like like:
//
// async a => ... // or
// async (a) => ...
Copy link
Member

@333fred 333fred Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what contexts are either of these not a lambda? #Closed

Copy link
Member Author

@cston cston Jun 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what contexts are either of these not a lambda?

It seems the existing comment is saying these are more likely to be lambdas than the example below.

Copy link
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with review pass (iteration 3)

//
// Or we could have something that isn't a lambda like:
//
// async ();
Copy link
Member

@333fred 333fred Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is bad code, right? Why would we assume that it isn't a lambda with a missing body for error recovery? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

async (); could be a method call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. and this is a case where @jaredpar is correct. if we can presume async is a keyword, this becomes like 10x simpler minimum. it's really this ambiguity which forces the insanity here.

note: a potentially cleaner approach here (since we already ahve the reset point) is to simply assume that async is the keyword, and just ensrue we get to the =>. if so, it's a lmbda. if not, we bail out.

case SyntaxKind.IdentifierToken:
case SyntaxKind.StaticKeyword:
case SyntaxKind.RefKeyword:
case SyntaxKind.DelegateKeyword:
Copy link
Member

@333fred 333fred Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true? What if it's followed by a *? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is just checking if async is in a location where it should be considered an async keyword in a lambda expression rather than an identifier. async delegate*... is a such a case.

}
}
}
}
M(SyntaxKind.SemicolonToken);
}
N(SyntaxKind.LocalDeclarationStatement);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I anticipate this is going to cause all sorts of hell for completion. Not sure what to do about it, but it won't be fun.

@333fred
Copy link
Member

333fred commented Jun 8, 2021

        string source = "T[]? () => x : y";

Consider a version of this with a value in the indexer


In reply to: 857023605


In reply to: 857023605


Refers to: src/Compilers/CSharp/Test/Syntax/Parsing/LambdaReturnTypeParsingTests.cs:922 in 52dbe07. [](commit_id = 52dbe07, deletion_comment = False)

@@ -6422,7 +6423,8 @@ private ScanTypeFlags ScanType(ParseTypeMode mode, out SyntaxToken lastTokenOfTy
{
case SyntaxKind.QuestionToken
when lastTokenOfType.Kind != SyntaxKind.QuestionToken && // don't allow `Type??`
lastTokenOfType.Kind != SyntaxKind.AsteriskToken: // don't allow `Type*?`
lastTokenOfType.Kind != SyntaxKind.AsteriskToken && // don't allow `Type*?`
!isFunctionPointer: // don't allow `delegate*<...>?`
Copy link
Member

@333fred 333fred Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// don't allow delegate*<...>?

Why not? It's unambiguous what it means. It's bad, true, but unambiguous. I think it would make error recovery better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior of ScanType() is different today from ParseType() for this case, making it difficult to recover from cases such as delegate*<void>? () => x.

Why not? It's unambiguous what it means. It's bad, true, but unambiguous. I think it would make error recovery better.

Is delegate*<...>? different in that regard from Type?? or Type*? which we disallow here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is delegate*<...>? different in that regard from Type?? or Type*? which we disallow here?

I think so, yes. In either of those cases, Type is not guaranteed to definitely be a type. delegate* is 100% a type and cannot be anything else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior of ScanType() is different today from ParseType() for this case, making it difficult to recover from cases such as delegate*? () => x.

I'm not opposed to making the behavior of ParseType be the same here, parsing into an invalid nullable type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree with fred here. i'd far prefer that we parse these out as types (unless htat would be bad because it woudl actually break some level ternary-expression case). And then later on have binding/whatever error and say that a nullable func-ptr is not legal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@333fred
Copy link
Member

333fred commented Jun 8, 2021

}

Can you add some tests for switch expression arms? Particularly something like a switch { static T? () => {} => 1 }


In reply to: 857041650


Refers to: src/Compilers/CSharp/Test/Syntax/Parsing/LambdaReturnTypeParsingTests.cs:3396 in 52dbe07. [](commit_id = 52dbe07, deletion_comment = False)

@333fred
Copy link
Member

333fred commented Jun 8, 2021

Done review pass (commit 3)


In reply to: 857043793

@cston
Copy link
Member Author

cston commented Jun 9, 2021

        verify();

Yes, language version will be checked in binding rather than parsing.


In reply to: 856977602


Refers to: src/Compilers/CSharp/Test/Syntax/Parsing/LambdaReturnTypeParsingTests.cs:43 in 52dbe07. [](commit_id = 52dbe07, deletion_comment = False)

@cston
Copy link
Member Author

cston commented Jun 9, 2021

            N(SyntaxKind.AsyncKeyword);

This is a duplicate async modifier. (The error would be reported in binding rather than parsing.) An identifier would need to be escaped.


In reply to: 856977697


Refers to: src/Compilers/CSharp/Test/Syntax/Parsing/LambdaReturnTypeParsingTests.cs:2990 in 52dbe07. [](commit_id = 52dbe07, deletion_comment = False)

return this.ParseCastOrParenExpressionOrLambdaOrTuple(precedence);
if (IsPossibleLambdaExpression(precedence))
{
if (this.TryParseLambdaExpression() is { } lambda)
Copy link
Member

@jcouv jcouv Jun 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe merge two if's? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge two if's?

Unfortunately, that would mean the lambda declaration is in scope for the entire switch which conflicts with an earlier declaration. The options were merge the two ifs and wrap the if in { } or nest the ifs. Neither option seemed better than the other.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i personally like the merging plus surrounding {}. but i agree it is all the same anyways. so no bggie.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine to leave as-is then. Thanks for the explanation

Copy link
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks (iteration 5)

Copy link
Member

@333fred 333fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (commit 5), with a bug to follow up on the function pointer parsing.

{
Debug.Assert(this.CurrentToken.Kind == SyntaxKind.OpenParenToken);

var resetPoint = this.GetResetPoint();
try
{
if (ScanParenthesizedImplicitlyTypedLambda(precedence))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider having this in hte code, but maybe with a debug.assert that this is not the case and that the caller should have checked for this and parsed out the lambda itself?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still probably a reasonable idea. but #nonblocking.

{
return true;
}

continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrlate to this PR, but man do i dislike this scanner. the core thing it does which i think is broken is that it's just a loop that accepts certain forms in any order. in other words, you could have type, then attributes, then comma, then attributes, the type, then ref/out, then etc. It makes is nigh impossible to gauge if it's accepting things it shouldn't accept. this is a case where just parsing the real shape and checking it (and following tokesn) for acceptance is much more sensible.

@@ -12420,6 +12423,22 @@ LambdaExpressionSyntax parseLambdaExpressionWorker()
this.IsInAsync = true;
}
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi Jun 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is github just rendering this wrong. i see:

image

i legit don't know how this is even legal c# code. it looks like we parse the attribute... say we're in async... and then the method body ends?! i'm assuming github is just broken here maybe? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParseLambdaExpression() contains several statements including a call to the parseLambdaExpressionWorker() local function where the actual changes are.

finally
{
this.Release(ref resetPoint);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this shoudl be a TryParseLambdaReturnType() helper.

Copy link
Member

@CyrusNajmabadi CyrusNajmabadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only checked the parser. I'm feeling pretty confident about it. Most of my feedback was around potential small bits of reorg, a couple of questions on us missing cases, and a few potential thoughts on how to make this simpler in the future.

I'm only 50% confident on the X ? Y() => z : w case though... but i can't come up with any cases that break this or that it goes the wrong path on... so let's go with this for now.

That said. can we throw a huge corpus against this and see if we generate a new tree in any cases of legal code before? that woudl raise my confidence level.

@cston
Copy link
Member Author

cston commented Jun 10, 2021

can we throw a huge corpus against this and see if we generate a new tree in any cases of legal code before?

Based on your suggestion, I've tested parsing with and without this change on dotnet/roslyn and dotnet/runtime, and for the C# files that parsed without errors previously, there were no changes to SyntaxKind or FullSpan in the resulting trees.

@cston cston merged commit 954ef84 into dotnet:features/lambdas Jun 10, 2021
@cston cston deleted the return-type branch June 10, 2021 05:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants