Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass ParserOptions to the parser #16220

Merged
merged 16 commits into from
Feb 19, 2025
Merged

Pass ParserOptions to the parser #16220

merged 16 commits into from
Feb 19, 2025

Conversation

ntBre
Copy link
Contributor

@ntBre ntBre commented Feb 17, 2025

Summary

This is part of the preparation for detecting syntax errors in the parser from #16090. As suggested in this comment, I started working on a ParseOptions struct that could be stored in the parser. For this initial refactor, I only made it hold the existing Mode option, but for syntax errors, we will also need it to have a PythonVersion. For that use case, I'm picturing something like a ParseOptions::with_python_version method, so you can extend the current calls to something like

ParseOptions::from(mode).with_python_version(settings.target_version)

But I thought it was worth adding ParseOptions alone without changing any other behavior first.

Most of the diff is just updating call sites taking Mode to take ParseOptions::from(Mode) or those taking PySourceTypes to take ParseOptions::from(PySourceType). The interesting changes are in the new parser/options.rs file and smaller parts of parser/mod.rs and ruff_python_parser/src/lib.rs.

Outdated implementation details for future reference

NOTE: as the details summary says, this does not correspond to the current implementation, which preserves the original signature of parse_unchecked_source and uses a much simpler representation of ParseOptions. This is preserved just for historical purposes.

The ParserOptions implementation is complicated a bit by wanting to preserve the SAFETY comment in parse_unchecked_source:

pub fn parse_unchecked_source(source: &str, source_type: PySourceType) -> Parsed<ModModule> {
// SAFETY: Safe because `PySourceType` always parses to a `ModModule`
Parser::new(source, source_type.as_mode())
.parse()
.try_into_module()
.unwrap()
}

The only way I could see to enforce this while passing a ParserOptions instead of a PySourceType was by adding a generic type state to ParserOptions, so that parse_unchecked_source would take a ParserOptions<KnownSource> constructed by ParserOptions::from_source_type, as opposed to the ParserOptions<UnknownSource> produced by ParserOptions::from_mode.

On top of that, I didn't want to propagate the generics from ParserOptions to all uses of Parser, so there's another indirection through the AsParserOptions trait, which is used by Parser to hold a Box<dyn AsParserOptions> instead of a ParserOptions<S: SourceType>.

Finally, other public functions in the parser crate could be updated to take ParserOptions if we want, but parse_unchecked_source is the variant that ruff and red-knot call where I want to integrate the syntax error checks, so it was my main priority.

Test Plan

Existing tests, this should not change any behavior.

@ntBre ntBre added the internal An internal refactor or improvement label Feb 17, 2025
Copy link
Contributor

github-actions bot commented Feb 17, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@dhruvmanila
Copy link
Member

The only way I could see to enforce this while passing a ParserOptions instead of a PySourceType was by adding a generic type state to ParserOptions, so that parse_unchecked_source would take a ParserOptions<KnownSource> constructed by ParserOptions::from_source_type, as opposed to the ParserOptions<UnknownSource> produced by ParserOptions::from_mode.

What's the motivation for changing the function signature for parse_unchecked_source? I think we should keep it as is and accept PySourceType which is then internally converted into a ParserOptions. This way we can maintain the invariant that PySourceType always parses into a ModModule. This will then avoid the need for KnownSource.


I've some additional thoughts that are related to this:

  • I'm wondering if we should abstract away the Mode parameter i.e., replace mode: Mode with options: ParserOptions in parse and parse_unchecked functions
  • We could possibly remove Mode::Ipython and use PySourceType::Ipynb instead because the ParserOptions will contain it and it seems a bit redundant that both Mode and PySourceType, which the parser and lexer have access to, contains an indicator that we're in a notebook context.
  • I think we should also pass the options to the Lexer. I don't think the TokenSource needs any info from it but even if it did, it can just ask the lexer.

Curious to hear @MichaReiser thoughts on this.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @dhruvmanila that I wouldn't make ParserOptions generic or add ParserOptions to parse_unchecked_source. It unnecessarily complicates things internally.

We could consider adding a ParseSourceOptions struct which is ParserOptions with the source type prop omitted, but I'm not sure if it's worth exposing the options on parse_unchecked_source. Instead, advanced use cases should use pare_unchecked and then map/unwrap on the call site.

I think it could make sense to expose the ParserOptions to the Lexer, or at least a subset of the options but I think this is something we can tackle separately.

@AlexWaygood
Copy link
Member

I'll leave this one to Micha/Dhruv :-)

@AlexWaygood AlexWaygood removed their request for review February 18, 2025 11:40
@ntBre
Copy link
Contributor Author

ntBre commented Feb 18, 2025

The motivation for changing the signature of parse_unchecked_source in particular is that it's called by ruff and by red-knot where I wanted to inject the Python version for syntax error detection. I wanted to avoid moving the unwrap call and accompanying SAFETY comments to those call sites, but that could certainly be preferable to the complication of the generics here.

And the Box<dyn AsParserOptions> follows from that to erase the generic on ParserOptions to avoid making every Parser a Parser<T>. That would obviously go away if ParserOptions were not generic.

ntBre added 11 commits February 18, 2025 15:15
The motivation for this is that `parse_unchecked_source` relies on the fact that
it's called with a `PySourceType` to know that it's safe to unwrap
`try_into_module`. If we change the signature from

```rust
pub fn parse_unchecked_source(source: &str, source_type: PySourceType) -> Parsed<ModModule> {
```

directly to

```rust
pub fn parse_unchecked_source(source: &str, options: ParserOptions) -> Parsed<ModModule> {
```

we will lose this safety. Instead, to enforce this in the type system, I want to
have to call `parse_unchecked_source` with a `ParserOptions<KnownSource>`
and *not* with a `ParserOptions<UnknownSource>`.

This is slightly more awkward because I also don't want to propagate the generic
on `ParserOptions` to every `Parser` site, so I introduced the separate
`AsParserOptions` trait so `Parser` could hold a `Box<dyn AsParserOptions>`
instead of needing `SourceType` generics everywhere.
@ntBre ntBre force-pushed the brent/pyfilesource branch from f9977cd to f486e76 Compare February 18, 2025 20:23
@ntBre
Copy link
Contributor Author

ntBre commented Feb 18, 2025

I've reverted the generic stuff and put source_type back in parse_unchecked_source. I can just change ruff and red-knot to call and unwrap parse_unchecked when I actually integrate the syntax error checks.

I also changed parse and parse_unchecked to take ParserOptions instead of Mode, as suggested. That will work nicely with the point above.

We could possibly remove Mode::Ipython and use PySourceType::Ipynb instead because the ParserOptions will contain it and it seems a bit redundant that both Mode and PySourceType, which the parser and
lexer have access to, contains an indicator that we're in a notebook context.

I looked into this briefly, but I don't think the lexer currently has access to a PySourceType, so this might make more sense to combine with passing other options to the lexer too.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet. I recommend giving @dhruvmanila some time to review the PR too.

Comment on lines 532 to 533
let parsed = parse(source, ParserOptions::from_mode(source_type.as_mode()))
.expect("Expect source to be valid Python");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We could implement from_source_type (or From<SourceType> for ParserOptions)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I was thinking about that too. Should I implement From<Mode> for ParseOptions too? I guess I was thinking we might have multiple from_* methods, but I'm realizing now that they won't actually conflict with each other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gives me more to put in the separate options.rs file anyway. I thought it looked a bit sparse before 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a love-hate relationship with From and I've been very inconsistent in the past with having explicit from_ methods vs From implementations...

I do like that from_mode is explicit but it also is a bit verbose... 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know what you mean. I already switched to From immediately after commenting, but I'm happy to go back to the explicit versions if you and/or Dhruv prefer. I think I tend toward from_ methods usually, so I can really go either way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defer to you and @dhruvmanila

Copy link
Member

@dhruvmanila dhruvmanila Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think From is fine. In the future, when there are more options and if it turns out a bit difficult to read, we could switch to from_ methods. Another option would be to use a builder pattern with the default impl so that it reads as ParseOptions::default().with_mode(mode).with_python_version(version).

Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I'd recommend updating the PR description to reflect the current state of the PR for future readers. We can keep the "Details" content as a collapsed section for historical purposes.

@dhruvmanila dhruvmanila added the parser Related to the parser label Feb 19, 2025
@ntBre ntBre merged commit 97d0659 into main Feb 19, 2025
21 checks passed
@ntBre ntBre deleted the brent/pyfilesource branch February 19, 2025 15:50
@ntBre
Copy link
Contributor Author

ntBre commented Feb 19, 2025

I forgot to update the ParserOptions in the title 🤦 but I did update the description!

ntBre added a commit that referenced this pull request Feb 26, 2025
## Summary

This PR builds on the changes in #16220 to pass a target Python version
to the parser. It also adds the `Parser::unsupported_syntax_errors` field, which
collects version-related syntax errors while parsing. These syntax
errors are then turned into `Message`s in ruff (in preview mode).

This PR only detects one syntax error (`match` statement before Python
3.10), but it has been pretty quick to extend to several other simple
errors (see #16308 for example).

## Test Plan

The current tests are CLI tests in the linter crate, but these could be
supplemented with inline parser tests after #16357.

I also tested the display of these syntax errors in VS Code:


![image](https://github.com/user-attachments/assets/062b4441-740e-46c3-887c-a954049ef26e)

![image](https://github.com/user-attachments/assets/101f55b8-146c-4d59-b6b0-922f19bcd0fa)

---------

Co-authored-by: Alex Waygood <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement parser Related to the parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants