-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace LALRPOP parser with hand-written parser #10036
Conversation
3209b85
to
d247242
Compare
d247242
to
a695672
Compare
CodSpeed Performance ReportMerging #10036 will improve performances by ×2.4Comparing Summary
Benchmarks breakdown
|
96addbf
to
b5cc384
Compare
b07c008
to
70731cb
Compare
|
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
E999 | 2 | 1 | 1 | 0 | 0 |
Linter (preview)
ℹ️ ecosystem check detected linter changes. (+1 -1 violations, +0 -0 fixes in 1 projects; 43 projects unchanged)
demisto/content (+1 -1 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview
+ Packs/ThreatQ/Integrations/ThreatQ/ThreatQ.py:106:12: E999 SyntaxError: Multiple exception types must be parenthesized - Packs/ThreatQ/Integrations/ThreatQ/ThreatQ.py:106:21: E999 SyntaxError: Unexpected token ','
Changes by rule (1 rules affected)
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
E999 | 2 | 1 | 1 | 0 | 0 |
Formatter (stable)
✅ ecosystem check detected no format changes.
Formatter (preview)
✅ ecosystem check detected no format changes.
cf89468
to
b803d4f
Compare
81e8050
to
4f5604f
Compare
236aa4a
to
c16f7b7
Compare
48ce841
to
6d26ec6
Compare
## Summary This PR reviews the logic for IPython escape command parsing and makes the required changes to almost match the old parsing logic. ~One change I've not incorporated is the `unparse_expr`. The new parser can use `src_text` directly but this means that any whitespace is preserved~ Migrated the `unparse_expr` logic ## Test plan Use the new parser for testing escape commands and verify that there are no snapshot changes.
## Summary This PR updates the `ruff_new_parser_equiv` fuzzer target to reject the corpus which produces an empty range for the AST from LALRPOP. If there are no nodes, then LALRPOP produces an empty range while the new parser uses the correct range of the source itself.
## Summary This PR updates the LALRPOP to use the complete source range for module by looking at the tokens before the parsing begins. This means the module range now corresponds to the actual source even if it only contains trivia tokens. This is mainly so that the fuzzer doesn't crash because the AST doesn't match.
> `simple` is a boolean integer set to `True` for a [Name](https://docs.python.org/3/library/ast.html#ast.Name) node in `target` that do not appear in between parenthesis and are hence pure names and not expressions. Reference: https://docs.python.org/3/library/ast.html#ast.AnnAssign
## Summary Dotted names aren't allowed in `from ... import` statement. They're only allowed in `import` statement. ### Alternative Another solution would be to parse the dotted name, check if there's a `.` in the parsed string and raise an error. I choose not to do this because it didn't make sense to do `contains` for every import name. ## Test Plan Add invalid syntax test cases to verify the logic.
## Summary This PR fixes a bug in parenthesized with-items parsing where the parser should expect a comma but it didn't. This is the case in the speculative parsing loop of parenthesized with-items. Once the parser finds any token other than comma, it breaks out of the loop. Then, once the parser has determined the with item kind, it needs to expect a comma if it's not the end of parenthesized with-items. ## Test Plan Add inline test cases, verify the snapshots.
## Summary This PR avoids the panic when trying to recover from an invalid pattern. The panic occurs because there's no way to represent `x as y` syntax as a Python expression. There were few cases which were not accounted for when working on this. The cases where it would panic are the ones where only a subset of pattern is valid in a position but all patterns are valid in a nested position of the mentioned top-level pattern. For example, consider a mapping pattern and the following points: * Key position can only be a `MatchValue` and `MatchSingleton` * `MatchClass` can contain a `MatchAs` as an argument * `MatchAs` pattern with both pattern and name (`a as b`) cannot be converted to a valid expression Considering this, if a mapping pattern has a class pattern with an `as` pattern as one of it's argument, the parser would panic. ```py match subject: case {Foo(a as b): 1}: ... # ^^^^^^^^^^^^^^^^ mapping pattern # ^^^^^^^^^^^ mapping pattern key, class pattern # ^^^^^^ as pattern with both pattern and name ``` The same case is for complex literal patterns as well. I'm not sure what the recovery should look like but for now it's better to avoid the panic and create an invalid expression node in it's place. ## Test Plan Add new test cases and update the snapshots.
## Summary This fixes a bug in the parser when `Mode::Expression` is used where the parser didn't recover from unexpected token after the expression. The current solution isn't ideal as the parser will drop all the remaining tokens after the initial expression has been parsed. This is to avoid panicking especially in the context of forward annotation like: ```python x: Literal["foo", "bar baz"] ``` Here, the parser is being given the `bar baz` as the source to be parsed using `Mode::Expression`. The parser will parse `bar` as a name expression and then stop because the expression ends there. But, it seems like there's more tokens remaining. This specific case could potentially be recovered by using a `ExprTuple` and raising a "missing comma" error but I've avoided that complexity for now. ## Test Plan Add new test cases for `Mode::Expression` and verified the snapshots.
## Summary This PR fixes the bug where the parser would fail to parse the following code: ```python for (x in y)[0] in z: ... ``` This is because when parsing the target of a `for` statement, the context flag for the parser is set to `FOR_TARGET` and this is used by binary expression parsing to stop the expression parsing when it sees the `in` keyword. But, in parenthesized context, this is allowed. The solution implemented in this PR is to reset the parser context when parsing a parenthesized expression and set it back to the original context later. An alternative solution would be to introduce [`ExpressionContext`](https://github.com/biomejs/biome/blob/838ccb442370e72197e625a7d0e4456e6e77e498/crates/biome_js_parser/src/syntax/expr.rs#L42-L43) but that would require a lot of updates as almost all of expression parsing function would need to be updated to take this context. A benefit of this would be that both `AllowStarredExpression` and `AllowNamedExpression` can be merged in the context and everything around `ParserCtxFlags` can be removed. It could possibly also simplify certain parsing functions. ## Test Plan Add test cases and verified the snapshots.
## Summary This PR removes a test case from the formatter which uses a named expression as `del` target. This is an invalid syntax in the new parser.
## Summary This fixes a bug for code like: ```python match subject: case a, b if expr: ... ``` Previously, the `:` was the only terminator token so the list parsing wouldn't stop after `b`. The `if` token has been added to the terminator list to fix this. ## Test Plan Add test cases and verified the snapshots.
## Summary There have been some grammar changes in [PEP 646] which were not accounted for in the old parser. The new parser has been updated with the correct AST. This is the case when there's a starred expression inside a subscript expression like the following example: ```python data[*x] ``` This gives us the AST where the slice element is actually a tuple expression with one element (starred expression) instead of just a starred expression. Now, the formatter's current behavior is to always add a trailing comma in a tuple with a single element. This PR updates the formatter to use the "preserve" behavior in trailing comma as well. So, trailing comma will not be added in the above example and if there's a trailing comma in the above example, it'll be preserved. This retains the current behavior without the AST change. [PEP 646]: https://peps.python.org/pep-0646/#change-1-star-expressions-in-indexes ## Test Plan Run `cargo insta test -p ruff_python_formatter` and make sure there are no changes.
e7b19dd
to
924ed5e
Compare
One obligatory comment :P Excellent work. Looking forward to my next git pull |
🚀 Let me know if I can help with further testing. |
(Supersedes #9152, authored by @LaBatata101)
Summary
This PR replaces the current parser generated from LALRPOP to a hand-written recursive descent parser.
It also updates the grammar for PEP 646 so that the parser outputs the correct AST. For example, in
data[*x]
, the index expression is now a tuple with a single starred expression instead of just a starred expression.Beyond the performance improvements, the parser is also error resilient and can provide better error messages. The behavior as seen by any downstream tools isn't changed. That is, the linter and formatter can still assume that the parser will stop at the first syntax error. This will be updated in the following months.
For more details about the change here, refer to the PR corresponding to the individual commits and the release blog post.
Test Plan
Write lots and lots of tests for both valid and invalid syntax and verify the output.
Acknowledgements