Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flake8-pyi] Implement PYI059 (generic-not-last-base-class) #11233

Merged
merged 18 commits into from
May 7, 2024

Conversation

tusharsadhwani
Copy link
Contributor

@tusharsadhwani tusharsadhwani commented May 1, 2024

Summary

Implements Y059 from flake8-pyi: typing.Generic[] should be the last base in a class' bases list.

If Multiple Generic[] classes are found, we don't generate a fix for it.

Test Plan

cargo test and cargo insta review.

@tusharsadhwani
Copy link
Contributor Author

Is it okay to add the autofix in the same PR?

@zanieb
Copy link
Member

zanieb commented May 1, 2024

Just wondering, what happens if a class has two Generic[] bases?

@zanieb
Copy link
Member

zanieb commented May 1, 2024

Is it okay to add the autofix in the same PR?

Yep!

@tusharsadhwani
Copy link
Contributor Author

what happens if a class has two Generic[] bases?

It's a runtime error:

>>> import typing
>>> T = typing.TypeVar('T')
>>> U = typing.TypeVar('U')
>>> class C(typing.Generic[T], typing.Generic[U]):
...   ...
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Cellar/[email protected]/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/typing.py", line 1102, in _generic_init_subclass
    raise TypeError(
TypeError: Cannot inherit from Generic[...] multiple times.

@zanieb
Copy link
Member

zanieb commented May 1, 2024

Good to know. We should make sure the rule doesn't behave poorly on that case regardless, i.e. it could cause an infinite fix loop.

Copy link
Contributor

github-actions bot commented May 1, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+2 -0 violations, +0 -0 fixes in 1 projects; 43 projects unchanged)

rotki/rotki (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ rotkehlchen/db/filtering.py:577:25: PYI059 [*] `Generic[]` should always be the last base class
+ rotkehlchen/types.py:977:34: PYI059 [*] `Generic[]` should always be the last base class

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
PYI059 2 2 0 0 0

@tusharsadhwani
Copy link
Contributor Author

We should make sure the rule doesn't behave poorly on that case regardless

Should multiple Generic[] case not raise the issue at all then?

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you! Overall this looks good. Left some inline comments below :-)

@AlexWaygood
Copy link
Member

AlexWaygood commented May 3, 2024

Should multiple Generic[] case not raise the issue at all then?

I'd say probably not. The behaviour we implemented at flake8-pyi is to ignore this rule if Generic[] appeared multiple times in the bases tuple, and I think it makes sense to do the same thing here. If you have something like class Foo(int, Generic[T], Generic[S]): pass, I think it's honestly just confusing for a user if they get a message about Generic[] "not being the last base class". There's a much more significant issue in their code (you can't have more than one Generic[] in the bases tuple), and if they solve the more significant issue, then Generic[] not being the last base class would be naturally solved as a side effect of that.

(We shouldn't flag multiple Generic[]s in this check; that could possibly be a separate check.)

Comment on lines 122 to 139
fn generate_fix(bases: &Arguments, generic_base_index: usize, locator: &Locator) -> Fix {
let last_base = bases.args.last().expect("Last base should always exist");
let generic_base = bases
.args
.get(generic_base_index)
.expect("Generic base should always exist");
let next_base = bases
.args
.get(generic_base_index + 1)
.expect("Generic base should never be the last base during auto-fix");

let deletion = Edit::deletion(generic_base.start(), next_base.start());
let insertion = Edit::insertion(
format!(", {}", locator.slice(generic_base.range())),
last_base.end(),
);
Fix::safe_edits(insertion, [deletion])
}
Copy link
Member

@AlexWaygood AlexWaygood May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens for pathological cases like

class Foo(  # comment about the bracket
    # Part 1 of multiline comment 1
    # Part 2 of multiline comment 1
    Generic[T]  # comment about Generic[T]
    # another comment?
    ,  # comment about the comma?
    # part 1 of multiline comment 2
    # part 2 of multiline comment 2
    int,  # comment about int
    # yet another comment?
):  # and another one for good measure
    pass

Copy link
Contributor Author

@tusharsadhwani tusharsadhwani May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will technically work, but comments before int and after Generic[T] get eaten up. That is fixable, but the fix is subjective.

What's ruff's general stance in such cases? I know libcst has capabilities to try and identify which comment groups belong to which expression. But the easier thing would be to avoid moving comments at all, and just preserve comments.

Copy link
Member

@AlexWaygood AlexWaygood May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different rules take different stances; we don't have a unified policy yet. Several of our autofixes will currently happily delete your comments (e.g. #9779); I think we all agree that this is bad, but we haven't yet decided how bad it is (see #9790). Other rules approach comments with a perhaps-excessive level of caution, either using the raw token stream for their autofix (RUF022, RUF023), or using libcst. Both are much slower than simple text replacements, I think libcst especially.

the fix is subjective.

absolutely! But I think that's always the case for something like this. I don't mind that much if some comments end up in the "wrong place", but I do think we should try to avoid deleting any comments if it's not too hard or costly to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to figure out how to get the start and end ranges of the comma after the Generic base. Tried using libcst to obtain the Comma itself, but does it have position information?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably just read one character at a time after the end of the generic base until you see a comma — does that make sense here? I'd avoid LibCST unless it's absolutely necessary (it's slow).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably just read one character at a time after the end of the generic base until you see a comma — does that make sense here? I'd avoid LibCST unless it's absolutely necessary (it's slow).

Agree -- memchr is probably the best tool for that (you can grep in the crates/ruff_linter directory for existing uses). Other possibilities are using libCST (but as Zanie says, it's really slow), or using the raw tokens (also can be slow, and probably more complicated here than using memchr).

If you look at https://play.ruff.rs/b2b834e8-0247-47fe-995a-c0ade0e6b240, with the following snippet:

class Foo(int, str): pass

On the AST tab on the right, we can see:

  • The ExprName node for the int base has range 10..13
  • The ExprName node for the str base has range 15..18

On the tokens tab on the right, we can see that:

  • The comma in between the two nodes has range 13..14

Copy link
Contributor Author

@tusharsadhwani tusharsadhwani May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First try at this, I've used .find() (which I suppose can be swapped with memchr) and .position() (which I'm not sure if it can be replaced)

Comment on lines 113 to 117
let last_whitespace = (comma_after_generic_base + 1)
+ locator.contents()[comma_after_generic_base + 1..]
.bytes()
.position(|b| !b.is_ascii_whitespace())
.expect("Non whitespace character must always exist after Generic[]");
Copy link
Member

@AlexWaygood AlexWaygood May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand what's going on here, I'm afraid :(

Is this the best name for this variable? Would you mind adding some comments to this function to explain how it works (preferably with some examples)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to find the first index in the source code after Generic[...],, which doesn't contain a whitespace.

Basically, deleting just Generic[T] and the comma after it would leave a trailing whitespace around (usually a space).

Without the whitespace detection, this code:

class Foo(Generic[T], Bar):

would be autofixed into:

class Foo( Bar, Generic[T]):

i.e. The space before Bar was not deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly not sure if this is the best way to write this, please feel free to refactor or comment on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think deleting a positional argument from a function call during autofix is common enough that once we finalize on the way it should be done this can be lifted to make a helper instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... in fact...

/// Generic function to remove arguments or keyword arguments in function
/// calls and class definitions. (For classes `args` should be considered
/// `bases`)
///
/// Supports the removal of parentheses when this is the only (kw)arg left.
/// For this behavior, set `remove_parentheses` to `true`.
pub(crate) fn remove_argument<T: Ranged>(
argument: &T,
arguments: &Arguments,
parentheses: Parentheses,
source: &str,
) -> Result<Edit> {
// Partition into arguments before and after the argument to remove.
let (before, after): (Vec<_>, Vec<_>) = arguments
.arguments_source_order()
.map(|arg| arg.range())
.filter(|range| argument.range() != *range)
.partition(|range| range.start() < argument.start());
if !after.is_empty() {
// Case 1: argument or keyword is _not_ the last node, so delete from the start of the
// argument to the end of the subsequent comma.
let mut tokenizer = SimpleTokenizer::starts_at(argument.end(), source);
// Find the trailing comma.
tokenizer
.find(|token| token.kind == SimpleTokenKind::Comma)
.context("Unable to find trailing comma")?;
// Find the next non-whitespace token.
let next = tokenizer
.find(|token| {
token.kind != SimpleTokenKind::Whitespace && token.kind != SimpleTokenKind::Newline
})
.context("Unable to find next token")?;
Ok(Edit::deletion(argument.start(), next.start()))
} else if let Some(previous) = before.iter().map(Ranged::end).max() {
// Case 2: argument or keyword is the last node, so delete from the start of the
// previous comma to the end of the argument.
let mut tokenizer = SimpleTokenizer::starts_at(previous, source);
// Find the trailing comma.
let comma = tokenizer
.find(|token| token.kind == SimpleTokenKind::Comma)
.context("Unable to find trailing comma")?;
Ok(Edit::deletion(comma.start(), argument.end()))
} else {
// Case 3: argument or keyword is the only node, so delete the arguments (but preserve
// parentheses, if needed).
Ok(match parentheses {
Parentheses::Remove => Edit::range_deletion(arguments.range()),
Parentheses::Preserve => Edit::range_replacement("()".to_string(), arguments.range()),
})
}
}
/// Generic function to add arguments or keyword arguments to function calls.
pub(crate) fn add_argument(
argument: &str,
arguments: &Arguments,
comment_ranges: &CommentRanges,
source: &str,
) -> Edit {
if let Some(last) = arguments.arguments_source_order().last() {
// Case 1: existing arguments, so append after the last argument.
let last = parenthesized_range(
match last {
ArgOrKeyword::Arg(arg) => arg.into(),
ArgOrKeyword::Keyword(keyword) => (&keyword.value).into(),
},
arguments.into(),
comment_ranges,
source,
)
.unwrap_or(last.range());
Edit::insertion(format!(", {argument}"), last.end())
} else {
// Case 2: no arguments. Add argument, without any trailing comma.
Edit::insertion(argument.to_string(), arguments.start() + TextSize::from(1))
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm so sorry, I completely forgot that we had those helpers!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also didn't find them, so partly on me :P it uses the Tokenizer which is pretty clever

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go ahead and refactor it.

@AlexWaygood
Copy link
Member

(Everything looks great to me now except the autofix code, which I'm still not quite sure about -- both in terms of code readability and in terms of the number of .expect() and .unwrap() calls in it :-)

@AlexWaygood
Copy link
Member

@tusharsadhwani, I've just pushed 6fce0fd, which gets rid of the remaining .expect() and .unwrap() calls from generate_fix(). Instead, the errors are bubbled up by returning an anyhow::Result<Fix> from generate_fix(). In combination with the Diagnostic::try_set_fix() method, that means that, rather than panicking, we'll gracefully log the error and move on if we fail to construct a Fix. That's usually preferable for this kind of thing, since it's not 100% essential that a Fix is constructed, but panicking would be pretty bad.

@tusharsadhwani
Copy link
Contributor Author

Okay, this is much cleaner.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! :D

@AlexWaygood AlexWaygood changed the title [flake8-pyi] Implement PYI059 [flake8-pyi] Implement PYI059 (generic-not-last-base-class) May 7, 2024
@AlexWaygood AlexWaygood enabled auto-merge (squash) May 7, 2024 10:02
@AlexWaygood AlexWaygood merged commit bc3f4fa into astral-sh:main May 7, 2024
19 checks passed
@tusharsadhwani tusharsadhwani deleted the pyi-059 branch May 7, 2024 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants