-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement re-lexing logic for better error recovery #11845
Conversation
3 | from x import (a, b, | ||
| ^ Syntax Error: Expected ')', found newline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woah, this is so much better. Nice!
c8a0997
to
55163c7
Compare
a43422d
to
9e56495
Compare
|
2294170
to
d550b3b
Compare
What happens if we have something like
In this case, the parser should recover from the unclosed |
Is the code snippet up-to date as you've mentioned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this. It's so exciting to see how the hard work is paying off and we now get much better error messages.
I've left some open questions before approving but this is definetely looking good!
Do you know how our messages compare with CPython or Pyright? Do you see any significant difference?
/// and the caller is responsible for updating it's state accordingly. | ||
/// | ||
/// This method is a no-op if the lexer isn't in a parenthesized context. | ||
pub(crate) fn re_lex_logical_token(&mut self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's your reasoning for returning a bool
over the re-lexed kind of the token?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think initially I kept it as Option<TokenKind>
but the current usage doesn't really require the token itself. I don't think there's any other particular reason to use bool
apart from that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant was to return the re-lexed token kind or the current token kind if no re-lexing happens, so that it's just a TokenKind
(and in line with lex_token
). But if all we need is a bool, than that's fine.
/// previous current token. This also means that the current position of the lexer has changed | ||
/// and the caller is responsible for updating it's state accordingly. | ||
/// | ||
/// This method is a no-op if the lexer isn't in a parenthesized context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expand the comment with some explanation for which tokens this is relevant. Like which tokens may change in this situation or what kind of different tokens could be emitted.
@@ -108,19 +108,5 @@ Module( | |||
1 | class Foo[T1, *T2(a, b): | |||
| ^ Syntax Error: Expected ']', found '(' | |||
2 | pass | |||
3 | x = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is pretty cool! Nice to see how all the work accumulated to now having a single, accurate error message.
|
||
|
||
| | ||
3 | def foo(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, no more EOF parse errors!
2 | def foo(): | ||
| ^^^ Syntax Error: Expected an indented block after function definition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
| | ||
28 | # The lexer is nested with multiple levels of parentheses | ||
29 | if call(foo, [a, b | ||
| ^ Syntax Error: Expected ']', found NonLogicalNewline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we map NonLogicalNewline
to newline? It seems an odd error message to show to users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is odd. I could map it to "newline" in the Display
implementation.
Uhm, not sure what happened here (a, [b,
c
) |
Ok, so this is a bit problematic because when the lexer emitted the |
Ok, I've fixed this bug and expanded the documentation and inline comments. |
The CPython parser higlights two kinds of errors which is probably done by the lexer:
While Pyright gets really confused: |
958b587
to
90ca067
Compare
CodSpeed Performance ReportMerging #11845 will not alter performanceComparing Summary
|
2fdf062
to
14fcbb3
Compare
14fcbb3
to
1989946
Compare
## Summary This PR is a follow-up on #11845 to add the re-lexing logic for normal list parsing. A normal list parsing is basically parsing elements without any separator in between i.e., there can only be trivia tokens in between the two elements. Currently, this is only being used for parsing **assignment statement** and **f-string elements**. Assignment statements cannot be in a parenthesized context, but f-string can have curly braces so this PR is specifically for them. I don't think this is an ideal recovery but the problem is that both lexer and parser could add an error for f-strings. If the lexer adds an error it'll emit an `Unknown` token instead while the parser adds the error directly. I think we'd need to move all f-string errors to be emitted by the parser instead. This way the parser can correctly inform the lexer that it's out of an f-string and then the lexer can pop the current f-string context out of the stack. ## Test Plan Add test cases, update the snapshots, and run the fuzzer.
Summary
This PR implements the re-lexing logic in the parser.
This logic is only applied when recovering from an error during list parsing. The logic is as follows:
Newline
token instead ofNonLogicalNewline
.It turns out that the list parsing isn't that happy with the results so it requires some re-arranging such that the following two errors are raised correctly:
For (1), the following scenarios needs to be considered:
For (2), the following scenarios needs to be considered:
eat
call above. And, the parser doesn't take the re-lexing route on a comma token.resolves: #11640
Test Plan