Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Initial tests for instance attributes #15474

Merged
merged 18 commits into from
Jan 15, 2025
Merged

Conversation

sharkdp
Copy link
Contributor

@sharkdp sharkdp commented Jan 14, 2025

Summary

Adds some initial tests for class and instance attributes, mostly to document (and discuss) what we want to support eventually. These tests are not exhaustive yet. The idea is to specify the coarse-grained behavior first.

Things that we'll eventually want to test:

  • Interplay with inheritance
  • Support Final in addition to ClassVar
  • Specific tests for ClassVar, like making sure that we support things like x: Annotated[ClassVar[int], "metadata"]
  • … or making sure that we raise an error here:
    class Foo:
        def __init__(self):
            self.x: ClassVar[str] = "x"
  • Add tests for __new__ in addition to the tests for __init__
  • Add tests that show that we use the union of types if multiple methods define the symbol with different types
  • Make sure that diagnostics are raised if, e.g., the inferred type of an assignment within a method does not match the declared type in the class body.
  • [red-knot] Initial tests for instance attributes #15474 (comment)
  • Method calls are completely left out for now.
  • Same for @property
  • … and the descriptor protocol

Test Plan

New Markdown tests

@sharkdp sharkdp added the red-knot Multi-file analysis & type inference label Jan 14, 2025

This comment was marked as off-topic.

@sharkdp sharkdp added the testing Related to testing Ruff itself label Jan 15, 2025
@sharkdp sharkdp force-pushed the david/instance-attributes branch from 3702bf9 to 462c6ab Compare January 15, 2025 12:21
@sharkdp sharkdp marked this pull request as ready for review January 15, 2025 12:21
@sharkdp sharkdp force-pushed the david/instance-attributes branch from 462c6ab to bcb7937 Compare January 15, 2025 12:23
@sharkdp sharkdp force-pushed the david/instance-attributes branch 3 times, most recently from 85cc979 to 92e5f00 Compare January 15, 2025 12:31
@sharkdp sharkdp force-pushed the david/instance-attributes branch from 92e5f00 to a7cd7e2 Compare January 15, 2025 12:34

#### Variable declared in class body and defined in unrelated method

We also recognize pure instance variables if they are defined in a method that is not `__init__`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not saying we shouldn't but this will require traversing the entire class body. This could be somewhat expensive (at least it's more expensive than only doing so for __init__).

  • Does that include accesses inside of nested classes, functions or lambdas?
  • Does this include accesses where the variable isn't self
class C:
	pure_instance_variable: str
	
	def test(self, other: Self, value: str):
		other.pure_instance_variable = value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy and pyright both support this, so I think we have to, expensive though it may be. Users will expect their type checker to support this and will complain if it doesn't. It's also unfortunately reasonably common in Python to do things like

class Foo:
    def __init__(self):
        self.initialize_x_variable()

    def initialize_x_variable(self):
        # many lines of complicated logic
        self.x = 42
  • Does that include accesses inside of nested classes, functions or lambdas?

  • Does this include accesses where the variable isn't self

I think we probably don't need to support these, certainly not initially

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also experiment with varying degrees of laziness here. For first-party code that we're actually checking, I think we'll probably need to know all instance attributes of every class. But for third-party/stdlib code that our first-party code is interacting with, it often probably isn't necessary to know what instance attributes the class has and, even if it is, it might not be necessary to materialize the full set of instance attributes the class has. For something like the following, we could plausibly short-circuit if we find the foo attribute defined in __init__ or __new__, and not bother analyzing the other methods the SomeClass defines:

from third_party import SomeClass

x = SomeClass()
print(x.foo + 5)  # all we need to know here is whether `SomeClass` has a `foo` attribute
                  # and what its type is

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a full review yet, but gotta go for lunch!


#### Variable declared in class body and defined in unrelated method

We also recognize pure instance variables if they are defined in a method that is not `__init__`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy and pyright both support this, so I think we have to, expensive though it may be. Users will expect their type checker to support this and will complain if it doesn't. It's also unfortunately reasonably common in Python to do things like

class Foo:
    def __init__(self):
        self.initialize_x_variable()

    def initialize_x_variable(self):
        # many lines of complicated logic
        self.x = 42
  • Does that include accesses inside of nested classes, functions or lambdas?

  • Does this include accesses where the variable isn't self

I think we probably don't need to support these, certainly not initially

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, this is fantastic! Good call on leaving out Final for now.

Two notable things that you haven't covered are that we need to infer an instance variable for any declaration in the class body, even if there is no binding for the variable in any method at all. We could have a disabled-by-default rule that warns if you try to access the variable when it's not bound in any method, but pyright's experience with such a rule has been very mixed; it's hard to get it right without lots of false positives:

class Foo:
    x: int

and we need to infer an instance variable for a variable that is defined in a non-__init__ variable but not declared in a class body:

class Foo:
    def __init__(self):
        self.initialize_x()

    def initialize_x(self):
        self.x = 42  # TODO: should the attribute have `Unknown`, `int`, `Literal[42]` or `Unknown | Literal[42]`
                     # from the perspective of code from other scopes?

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great -- though I'd still personally add the cases I mentioned in #15474 (review) prior to landing

@AlexWaygood
Copy link
Member

Depending on how exhaustive you're looking to be for ClassVar tests here, you could also consider adding this test, on which we should emit a diagnostic:

class Foo:
    def __init__(self):
        self.x: ClassVar[str] = "x"

@sharkdp
Copy link
Contributor Author

sharkdp commented Jan 15, 2025

though I'd still personally add the cases I mentioned in #15474 (review) prior to landing

I was planning to, you are too fast Alex 😄. I think those should be covered now. I actually changed a previous test to cover the "not declared in body, only bound in non-__init__ method" case (by removing the declaration from the body). This should only make the test more general, I think. But again, I am planning to increase the overall test coverage eventually. This is mostly to get the initial design right. Thank you for the detailed review and your help!

@sharkdp
Copy link
Contributor Author

sharkdp commented Jan 15, 2025

Depending on how exhaustive you're looking to be for ClassVar tests here, you could also consider adding this test

I'll add it to the list in the PR description.

@sharkdp sharkdp enabled auto-merge (squash) January 15, 2025 14:41
@sharkdp sharkdp merged commit 8712438 into main Jan 15, 2025
20 checks passed
@sharkdp sharkdp deleted the david/instance-attributes branch January 15, 2025 14:43
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!


c_instance = C(1)

# TODO: should be `Literal["value set in __init__"]` (or `str` which would probably be more generally useful)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting question. Both pyright and mypy have heuristics to widen literal types when they think it's "probably more useful." This seems a bit unprincipled to me, but I'm open to the possibility that we'll also need to do it. I guess my question is, if the only assignment to self.pure_instance_variable1 that occurs anywhere in the class is self.pure_instance_variable1 = "value set in __init__", then why wouldn't you want it typed as Literal["value set in __init__"]? Just to accommodate external code setting it to some other value? That seems like an edge case, and not too much to ask you to add an explicit : str if that's what you wanted.

Copy link
Member

@AlexWaygood AlexWaygood Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the only assignment to self.pure_instance_variable1 that occurs anywhere in the class is self.pure_instance_variable1 = "value set in __init__", then why wouldn't you want it typed as Literal["value set in __init__"]? Just to accommodate external code setting it to some other value?

Subclasses also won't be able to assign a type to the attribute unless the type is assignable to the Literal[...] type

That seems like an edge case

Hmm, I don't agree. I think it's quite common for code to only assign an instance variable in the __init__ method of the class, but with the intention that it should be possible to reassign the attribute from other scopes.

and not too much to ask you to add an explicit : str if that's what you wanted.

Surely that violates the gradual guarantee? And this doesn't seem to tackle at all the problem of typed code interacting with untyped third-party code, which may be unwilling to add type annotations in a timely manner?

I'm leaning towards either Unknown | Literal[...] or Unknown | str for cases like these. Haven't made my mind up about which is preferable there (haven't thought too deeply about it yet!)

Copy link
Contributor

@carljm carljm Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely that violates the gradual guarantee

Yes, but so does widening the literal type to str. That's what I mean by "unprincipled" -- it doesn't fundamentally change anything, just arbitrarily chooses some wider type because we think it's "probably what you meant."

I like the idea of a union with Unknown for inferred attribute types! I think that is the principled gradual-guarantee approach here. It will be more forgiving than what people are used to, but I generally like it if we take the gradual guarantee more seriously than existing type checkers: if you want more strictness, annotate.

(I don't see why we would special-case converting it to Unknown | str instead of Unknown | Literal[...] if we're unioning with Unknown -- I think the union already takes care of the problematic aspects of the literal inference, by allowing you to assign anything.)

# mypy and pyright do not show an error here.
reveal_type(c_instance.pure_instance_variable5) # revealed: @Todo(instance attributes)

c_instance.pure_instance_variable1 = "value set on instance"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on how we answer the above question, arguably this should be an error

Comment on lines +66 to +68
# TODO: should ideally be `Literal["value set on instance"]`
# (due to earlier assignment of the attribute from the global scope)
reveal_type(c_instance.pure_instance_variable1) # revealed: @Todo(instance attributes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is some question about the extent to which we'll want to do this narrowing. It's unsound in general (because we can't really say with any certainty what happens to c_instance between that assignment and this check), and particularly likely to be unsound in global scope. But we'll probably need to do some of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've had this conversation several times now 😆 I still favour the more pragmatic behaviour of mypy and pyright over the stricter behaviour of pyre here. But yeah, we can debate the details at a later stage

Comment on lines +111 to +113

# for a more realistic example, let's actually call the method
c_instance.set_instance_variable()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the even more realistic example would have this method called from __init__, but if we do that in the test it raises more questions about whether we're trying to detect that call...

sharkdp added a commit that referenced this pull request Jan 15, 2025
## Summary

Some clarifications in the instance-attributes tests, mostly regarding
type inference behavior following this discussion:

#15474 (comment)
@sharkdp sharkdp mentioned this pull request Jan 27, 2025
17 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference testing Related to testing Ruff itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants