Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AST Improvement: Store dotted name parts #7760

Open
1 of 3 tasks
konstin opened this issue Oct 2, 2023 · 5 comments
Open
1 of 3 tasks

AST Improvement: Store dotted name parts #7760

konstin opened this issue Oct 2, 2023 · 5 comments
Labels
parser Related to the parser

Comments

@konstin
Copy link
Member

konstin commented Oct 2, 2023

Currently, the path of imports is not formatted, e.g.

import a . b

remains as-is. This is due to a bug in our AST:

/// A representation of an individual name imported via any import statement.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AnyImport<'a> {
Import(Import<'a>),
ImportFrom(ImportFrom<'a>),
}
/// A representation of an individual name imported via an `import` statement.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Import<'a> {
pub name: Alias<'a>,
}
/// A representation of an individual name imported via a `from ... import` statement.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ImportFrom<'a> {
pub module: Option<&'a str>,
pub name: Alias<'a>,
pub level: Option<u32>,
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Alias<'a> {
pub name: &'a str,
pub as_name: Option<&'a str>,
}

The entire path is represented as a single string, even though it should be dot-separated identifier (the parser calls it DottedName, but then emits an identifier), especially since identifier can not contain dots.

  • Fix the AST so the import path is a Vec<Identifier> or something similar
  • Format import paths by removing whitespace (we can't insert parentheses like we do for dotted expressions)
  • Check that Identifier is only used for strings matching the rules
@konstin konstin added the formatter Related to the formatter label Oct 2, 2023
@charliermarsh charliermarsh added parser Related to the parser and removed formatter Related to the formatter labels Oct 2, 2023
@charliermarsh
Copy link
Member

I think the counterargument is that CPython uses this representation:

>>> print(ast.dump(ast.parse(s)))
Module(body=[Import(names=[alias(name='foo.bar')])], type_ignores=[])

And the ASDL uses the same identifier symbol as in other nodes: https://docs.python.org/3/library/ast.html.

But I care more about the ergonomics and performance than I do exact compatibility with CPython on these decisions. Would need to see what this makes easier or harder.

konstin added a commit that referenced this issue Oct 9, 2023
**Summary** Remove spaces from import statements such as

```python
import tqdm .  tqdm
from tqdm .    auto import tqdm
```

See also #7760 for a better solution.

**Test Plan** New fixtures
konstin added a commit that referenced this issue Oct 11, 2023
**Summary** Remove spaces from import statements such as

```python
import tqdm .  tqdm
from tqdm .    auto import tqdm
```

See also #7760 for a better solution.

**Test Plan** New fixtures
konstin added a commit that referenced this issue Oct 11, 2023
**Summary** Remove spaces from import statements such as 

```python
import tqdm .  tqdm
from tqdm .    auto import tqdm
```

See also #7760 for a better solution.

**Test Plan** New fixtures
@MichaReiser
Copy link
Member

@charliermarsh Is my understanding correct that CPython normalizes the identifier name (removes the whitespace)?

I was a bit surprised when I saw this representation first because it's somewhat uncommon (at least in the languages that I have used thus far)—especially considering that it re-joins the identifier tokens that the lexer identified.

Are there any upsides in the semantic model to have a single string? I would expect it to be easier to have the individual parts when e.g. resolving imports. The alternative is that we implement a components method similar to Rust's Path::components that returns the individual parts (splitting by string).

@MichaReiser MichaReiser added this to the Formatter: Stable milestone Oct 16, 2023
@charliermarsh
Copy link
Member

Yeah it's probably an improvement to store a list of dot-separated segments. There is likely no upside in the semantic model since we always decompose into segments.

@MichaReiser MichaReiser changed the title Fix the AST representation of import paths and format imports Format import paths Oct 27, 2023
@MichaReiser MichaReiser added formatter Related to the formatter and removed parser Related to the parser labels Oct 27, 2023
@MichaReiser
Copy link
Member

Let us fix this, regardless of how it gets implemented. Splitting the names in the formatter would be silly, but something we could do.

I think we're at least lucky that the following is not valid

import (a # comment
   .b)

@charliermarsh
Copy link
Member

I think this actually was fixed, we format it correctly: https://play.ruff.rs/86d1a181-4d27-4bbe-ad13-0c425e1976c0. I think this issue was about changing the AST to better reflect the real structure.

@MichaReiser MichaReiser changed the title Format import paths AST Improvement: Store dotted name parts Oct 27, 2023
@MichaReiser MichaReiser added formatter Related to the formatter parser Related to the parser and removed formatter Related to the formatter labels Oct 27, 2023
@MichaReiser MichaReiser removed this from the Formatter: Stable milestone Oct 27, 2023
@MichaReiser MichaReiser removed the formatter Related to the formatter label Oct 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Related to the parser
Projects
None yet
Development

No branches or pull requests

3 participants