Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Initial tests for instance attributes #15474

Merged
merged 18 commits into from
Jan 15, 2025
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -6,14 +6,11 @@ Several type qualifiers are unsupported by red-knot currently. However, we also
false-positive errors if you use one in an annotation:

```py
from typing_extensions import Final, ClassVar, Required, NotRequired, ReadOnly, TypedDict
sharkdp marked this conversation as resolved.
Show resolved Hide resolved
from typing_extensions import Final, Required, NotRequired, ReadOnly, TypedDict

X: Final = 42
Y: Final[int] = 42

class Foo:
A: ClassVar[int] = 42

# TODO: `TypedDict` is actually valid as a base
# error: [invalid-base]
class Bar(TypedDict):
Expand Down
285 changes: 278 additions & 7 deletions crates/red_knot_python_semantic/resources/mdtest/attributes.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,264 @@

Tests for attribute access on various kinds of types.

## Class and instance variables

### Pure instance variables

#### Variable only declared/bound in `__init__`

Variables only declared and/or bound in `__init__` are pure instance variables. They cannot be
accessed on the class itself.

```py
class C:
def __init__(self, value2: int, flag: bool = False) -> None:
# bound but not declared
self.pure_instance_variable1 = "value set in __init__"

# bound but not declared - with type inferred from parameter
self.pure_instance_variable2 = value2

# declared but not bound
self.pure_instance_variable3: bytes
sharkdp marked this conversation as resolved.
Show resolved Hide resolved

# declared and bound
self.pure_instance_variable4: bool = True

# possibly undeclared/unbound
if flag:
self.pure_instance_variable5: str = "possibly set in __init__"

c_instance = C(1)

# TODO: should be `Literal["value set in __init__"]` (or `str` which would probably be more generally useful)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting question. Both pyright and mypy have heuristics to widen literal types when they think it's "probably more useful." This seems a bit unprincipled to me, but I'm open to the possibility that we'll also need to do it. I guess my question is, if the only assignment to self.pure_instance_variable1 that occurs anywhere in the class is self.pure_instance_variable1 = "value set in __init__", then why wouldn't you want it typed as Literal["value set in __init__"]? Just to accommodate external code setting it to some other value? That seems like an edge case, and not too much to ask you to add an explicit : str if that's what you wanted.

Copy link
Member

@AlexWaygood AlexWaygood Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the only assignment to self.pure_instance_variable1 that occurs anywhere in the class is self.pure_instance_variable1 = "value set in __init__", then why wouldn't you want it typed as Literal["value set in __init__"]? Just to accommodate external code setting it to some other value?

Subclasses also won't be able to assign a type to the attribute unless the type is assignable to the Literal[...] type

That seems like an edge case

Hmm, I don't agree. I think it's quite common for code to only assign an instance variable in the __init__ method of the class, but with the intention that it should be possible to reassign the attribute from other scopes.

and not too much to ask you to add an explicit : str if that's what you wanted.

Surely that violates the gradual guarantee? And this doesn't seem to tackle at all the problem of typed code interacting with untyped third-party code, which may be unwilling to add type annotations in a timely manner?

I'm leaning towards either Unknown | Literal[...] or Unknown | str for cases like these. Haven't made my mind up about which is preferable there (haven't thought too deeply about it yet!)

Copy link
Contributor

@carljm carljm Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely that violates the gradual guarantee

Yes, but so does widening the literal type to str. That's what I mean by "unprincipled" -- it doesn't fundamentally change anything, just arbitrarily chooses some wider type because we think it's "probably what you meant."

I like the idea of a union with Unknown for inferred attribute types! I think that is the principled gradual-guarantee approach here. It will be more forgiving than what people are used to, but I generally like it if we take the gradual guarantee more seriously than existing type checkers: if you want more strictness, annotate.

(I don't see why we would special-case converting it to Unknown | str instead of Unknown | Literal[...] if we're unioning with Unknown -- I think the union already takes care of the problematic aspects of the literal inference, by allowing you to assign anything.)

reveal_type(c_instance.pure_instance_variable1) # revealed: @Todo(instance attributes)

# TODO: should be `int`
reveal_type(c_instance.pure_instance_variable2) # revealed: @Todo(instance attributes)

# TODO: should be `bytes`
reveal_type(c_instance.pure_instance_variable3) # revealed: @Todo(instance attributes)

# TODO: should be `Literal[True]` (or `bool`)
reveal_type(c_instance.pure_instance_variable4) # revealed: @Todo(instance attributes)

# TODO: should be `Literal["possibly set in __init__"]` (or `str`)
# We probably don't want to emit a diagnostic for this being possibly undeclared/unbound.
# mypy and pyright do not show an error here.
reveal_type(c_instance.pure_instance_variable5) # revealed: @Todo(instance attributes)

c_instance.pure_instance_variable1 = "value set on instance"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on how we answer the above question, arguably this should be an error


# TODO: this should be an error (incompatible types in assignment)
c_instance.pure_instance_variable2 = "incompatible"

# TODO: we already show an error here but the message might be improved?
# mypy shows no error here, but pyright raises "reportAttributeAccessIssue"
# error: [unresolved-attribute] "Type `Literal[C]` has no attribute `pure_instance_variable1`"
reveal_type(C.pure_instance_variable1) # revealed: Unknown

# TODO: this should be an error (pure instance variables cannot be accessed on the class)
# mypy shows no error here, but pyright raises "reportAttributeAccessIssue"
C.pure_instance_variable1 = "overwritten on class"

# TODO: should ideally be `Literal["value set on instance"]`
sharkdp marked this conversation as resolved.
Show resolved Hide resolved
# (due to earlier assignment of the attribute from the global scope)
reveal_type(c_instance.pure_instance_variable1) # revealed: @Todo(instance attributes)
Comment on lines +66 to +68
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is some question about the extent to which we'll want to do this narrowing. It's unsound in general (because we can't really say with any certainty what happens to c_instance between that assignment and this check), and particularly likely to be unsound in global scope. But we'll probably need to do some of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've had this conversation several times now 😆 I still favour the more pragmatic behaviour of mypy and pyright over the stricter behaviour of pyre here. But yeah, we can debate the details at a later stage

```

#### Variable declared in class body and declared/bound in `__init__`

The same rule applies even if the variable is *declared* (not bound!) in the class body: it is still
a pure instance variable.

```py
class C:
pure_instance_variable: str

def __init__(self) -> None:
self.pure_instance_variable = "value set in __init__"

c_instance = C()

# TODO: should be `str`
reveal_type(c_instance.pure_instance_variable) # revealed: @Todo(instance attributes)

# TODO: we currently plan to emit a diagnostic here. Note that both mypy
# and pyright show no error in this case! So we may reconsider this in
# the future, if it turns out to produce too many false positives.
reveal_type(C.pure_instance_variable) # revealed: str

# TODO: same as above. We plan to emit a diagnostic here, even if both mypy
# and pyright allow this.
C.pure_instance_variable = "overwritten on class"

# TODO: this should be an error (incompatible types in assignment)
c_instance.pure_instance_variable = 1
```

#### Variable only defined in unrelated method

We also recognize pure instance variables if they are defined in a method that is not `__init__`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not saying we shouldn't but this will require traversing the entire class body. This could be somewhat expensive (at least it's more expensive than only doing so for __init__).

  • Does that include accesses inside of nested classes, functions or lambdas?
  • Does this include accesses where the variable isn't self
class C:
	pure_instance_variable: str
	
	def test(self, other: Self, value: str):
		other.pure_instance_variable = value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy and pyright both support this, so I think we have to, expensive though it may be. Users will expect their type checker to support this and will complain if it doesn't. It's also unfortunately reasonably common in Python to do things like

class Foo:
    def __init__(self):
        self.initialize_x_variable()

    def initialize_x_variable(self):
        # many lines of complicated logic
        self.x = 42
  • Does that include accesses inside of nested classes, functions or lambdas?

  • Does this include accesses where the variable isn't self

I think we probably don't need to support these, certainly not initially

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also experiment with varying degrees of laziness here. For first-party code that we're actually checking, I think we'll probably need to know all instance attributes of every class. But for third-party/stdlib code that our first-party code is interacting with, it often probably isn't necessary to know what instance attributes the class has and, even if it is, it might not be necessary to materialize the full set of instance attributes the class has. For something like the following, we could plausibly short-circuit if we find the foo attribute defined in __init__ or __new__, and not bother analyzing the other methods the SomeClass defines:

from third_party import SomeClass

x = SomeClass()
print(x.foo + 5)  # all we need to know here is whether `SomeClass` has a `foo` attribute
                  # and what its type is


```py
class C:
def set_instance_variable(self) -> None:
self.pure_instance_variable: str = "value set in method"
sharkdp marked this conversation as resolved.
Show resolved Hide resolved

c_instance = C()

# for a more realistic example, let's actually call the method
c_instance.set_instance_variable()
Comment on lines +111 to +113
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the even more realistic example would have this method called from __init__, but if we do that in the test it raises more questions about whether we're trying to detect that call...


# TODO: should be `str`
reveal_type(c_instance.pure_instance_variable) # revealed: @Todo(instance attributes)

# TODO: We already show an error here, but the message might be improved?
# error: [unresolved-attribute]
reveal_type(C.pure_instance_variable) # revealed: Unknown

# TODO: this should be an error
C.pure_instance_variable = "overwritten on class"
```

#### Variable declared in class body and not bound anywhere

If a variable is declared in the class body but not bound anywhere, we still consider it a pure
instance variable and allow access to it via instances.

```py
class C:
pure_instance_variable: str

c_instance = C()

# TODO: should be 'str'
reveal_type(c_instance.pure_instance_variable) # revealed: @Todo(instance attributes)

# TODO: mypy and pyright do not show an error here, but we plan to emit a diagnostic.
# The type could be changed to 'Unknown' if we decide to emit an error?
reveal_type(C.pure_instance_variable) # revealed: str

# TODO: mypy and pyright do not show an error here, but we plan to emit one.
C.pure_instance_variable = "overwritten on class"
```

### Pure class variables (`ClassVar`)
sharkdp marked this conversation as resolved.
Show resolved Hide resolved
sharkdp marked this conversation as resolved.
Show resolved Hide resolved

#### Annotated with `ClassVar` type qualifier

Class variables annotated with the [`typing.ClassVar`] type qualifier are pure class variables. They
cannot be overwritten on instances, but they can be accessed on instances.

For more details, see the [typing documentation on `ClassVar`].

```py
from typing import ClassVar

class C:
pure_class_variable1: ClassVar[str] = "value in class body"
pure_class_variable2: ClassVar = 1

reveal_type(C.pure_class_variable1) # revealed: str

# TODO: this should be `Literal[1]`
sharkdp marked this conversation as resolved.
Show resolved Hide resolved
reveal_type(C.pure_class_variable2) # revealed: @Todo(Unsupported or invalid type in a type expression)

c_instance = C()

# TODO: This should be `str`. It is okay to access a pure class variable on an instance.
reveal_type(c_instance.pure_class_variable1) # revealed: @Todo(instance attributes)

# TODO: should raise an error. It is not allowed to reassign a pure class variable on an instance.
c_instance.pure_class_variable1 = "value set on instance"

C.pure_class_variable1 = "overwritten on class"

# TODO: should ideally be `Literal["overwritten on class"]`, but not a priority
reveal_type(C.pure_class_variable1) # revealed: str

# TODO: should raise an error (incompatible types in assignment)
C.pure_class_variable1 = 1

class Subclass(C):
pure_class_variable1: ClassVar[str] = "overwritten on subclass"

reveal_type(Subclass.pure_class_variable1) # revealed: str
```

#### Variable only mentioned in a class method

We also consider a class variable to be a pure class variable if it is only mentioned in a class
method.
sharkdp marked this conversation as resolved.
Show resolved Hide resolved

```py
class C:
@classmethod
def class_method(cls):
cls.pure_class_variable = "value set in class method"

# for a more realistic example, let's actually call the method
C.class_method()

# TODO: We currently plan to support this and show no error here.
# mypy shows an error here, pyright does not.
# error: [unresolved-attribute]
reveal_type(C.pure_class_variable) # revealed: Unknown

C.pure_class_variable = "overwritten on class"

# TODO: should be `Literal["overwritten on class"]`
# error: [unresolved-attribute]
reveal_type(C.pure_class_variable) # revealed: Unknown

c_instance = C()
# TODO: should be `Literal["overwritten on class"]` or `str`
reveal_type(c_instance.pure_class_variable) # revealed: @Todo(instance attributes)

# TODO: should raise an error.
c_instance.pure_class_variable = "value set on instance"
```

### Instance variables with class-level default values

These are instance attributes, but the fact that we can see that they have a binding (not a
declaration) in the class body means that reading the value from the class directly is also
permitted. This is the only difference for these attributes as opposed to "pure" instance
attributes.

#### Basic

```py
class C:
variable_with_class_default: str = "value in class body"

def instance_method(self):
self.variable_with_class_default = "value set in instance method"

reveal_type(C.variable_with_class_default) # revealed: str

c_instance = C()

# TODO: should be `str`
reveal_type(c_instance.variable_with_class_default) # revealed: @Todo(instance attributes)

c_instance.variable_with_class_default = "value set on instance"

reveal_type(C.variable_with_class_default) # revealed: str

# TODO: should ideally be Literal["value set on instance"], or still `str`
reveal_type(c_instance.variable_with_class_default) # revealed: @Todo(instance attributes)

C.variable_with_class_default = "overwritten on class"

# TODO: should ideally be `Literal["overwritten on class"]`
reveal_type(C.variable_with_class_default) # revealed: str

# TODO: should still be `Literal["value set on instance"]`
reveal_type(c_instance.variable_with_class_default) # revealed: @Todo(instance attributes)
```

## Union of attributes

```py
Expand All @@ -24,7 +282,9 @@ def _(flag: bool):
reveal_type(C2.x) # revealed: Literal[3, 4]
```

## Inherited attributes
## Inherited class attributes

### Basic

```py
class A:
Expand All @@ -36,7 +296,7 @@ class C(B): ...
reveal_type(C.X) # revealed: Literal["foo"]
```

## Inherited attributes (multiple inheritance)
### Multiple inheritance

```py
class O: ...
Expand Down Expand Up @@ -104,7 +364,7 @@ def _(flag: bool, flag1: bool, flag2: bool):
reveal_type(C.x) # revealed: Literal[1, 2, 3]
```

## Unions with all paths unbound
### Unions with all paths unbound

If the symbol is unbound in all elements of the union, we detect that:

Expand Down Expand Up @@ -158,7 +418,9 @@ class Foo: ...
reveal_type(Foo.__class__) # revealed: Literal[type]
```

## Function-literal attributes
## Literal types

### Function-literal attributes

Most attribute accesses on function-literal types are delegated to `types.FunctionType`, since all
functions are instances of that class:
Expand All @@ -179,7 +441,7 @@ reveal_type(f.__get__) # revealed: @Todo(`__get__` method on functions)
reveal_type(f.__call__) # revealed: @Todo(`__call__` method on functions)
```

## Int-literal attributes
### Int-literal attributes

Most attribute accesses on int-literal types are delegated to `builtins.int`, since all literal
integers are instances of that class:
Expand All @@ -196,7 +458,7 @@ reveal_type((2).numerator) # revealed: Literal[2]
reveal_type((2).real) # revealed: Literal[2]
```

## Literal `bool` attributes
### Bool-literal attributes

Most attribute accesses on bool-literal types are delegated to `builtins.bool`, since all literal
bols are instances of that class:
Expand All @@ -213,11 +475,20 @@ reveal_type(True.numerator) # revealed: Literal[1]
reveal_type(False.real) # revealed: Literal[0]
```

## Bytes-literal attributes
### Bytes-literal attributes

All attribute access on literal `bytes` types is currently delegated to `buitins.bytes`:

```py
reveal_type(b"foo".join) # revealed: @Todo(instance attributes)
reveal_type(b"foo".endswith) # revealed: @Todo(instance attributes)
```

## References
sharkdp marked this conversation as resolved.
Show resolved Hide resolved

Some of the tests in the *Class and instance variables* section draw inspiration from
[pyright's documentation] on this topic.

[pyright's documentation]: https://microsoft.github.io/pyright/#/type-concepts-advanced?id=class-and-instance-variables
[typing documentation on `classvar`]: https://typing.readthedocs.io/en/latest/spec/class-compat.html#classvar
sharkdp marked this conversation as resolved.
Show resolved Hide resolved
[`typing.classvar`]: https://docs.python.org/3/library/typing.html#typing.ClassVar
4 changes: 2 additions & 2 deletions crates/red_knot_python_semantic/src/types/infer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5346,8 +5346,8 @@ impl<'db> TypeInferenceBuilder<'db> {
todo_type!("`NotRequired[]` type qualifier")
}
KnownInstanceType::ClassVar => {
self.infer_type_expression(arguments_slice);
todo_type!("`ClassVar[]` type qualifier")
let ty = self.infer_type_expression(arguments_slice);
ty
sharkdp marked this conversation as resolved.
Show resolved Hide resolved
}
KnownInstanceType::Final => {
self.infer_type_expression(arguments_slice);
Expand Down
Loading