Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve accuracy of six byte index methods #9117

Merged
merged 11 commits into from
Nov 10, 2022

Conversation

Avasam
Copy link
Collaborator

@Avasam Avasam commented Nov 6, 2022

Specifically: byte2int, indexbytes, and iterbytes

While working in a library that uses six, I noticed that I was getting false-positive typing issues. The 3 functions above accept more buffer-like types than just bytes.

I've also updated itemgetter to actually be generic and not use Any

Relates to #9006

Possible improvement: Is there a way to properly unpack *items into a tuple of N-length? I feel like I've seen something, but idk if mypy supports it.

@@ -1,4 +1,5 @@
import sys
from builtins import _GetItemIterable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use _typeshed.SupportsGetItem instead?

Copy link
Collaborator Author

@Avasam Avasam Nov 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot as-is, because SupportsGetItem defines __contains__, AND it is defined as def __contains__(self, __x: object) -> bool: ....
Not all indexable types (including buffer-like ones) define __contains__ *

bytes defines it as def __contains__(self, __o: SupportsIndex | bytes) -> bool: ..., and I am uncertain of the change required to make it work. (other than just typing the parameter __x as Any).

* On that note, mmap missing __contains__ seems to be a mistake:

>>> from mmap import mmap
>>> 0 in mmap(0, 5)     
False
>>> b'\x00' in mmap(0, 5)
True

I've done what I think are the necessary changes.

def indexbytes(buf: bytes, i: int) -> int: ...
def iterbytes(buf: bytes) -> _Iterator[int]: ...

byte2int = operator.itemgetter(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not legal in a stub.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, didn't catch that. I accidentally ran the tests locally with py instead of python (venv fail)

@JelleZijlstra
Copy link
Member

Possible improvement: Is there a way to properly unpack *items into a tuple of N-length? I feel like I've seen something, but idk if mypy supports it.

PEP 646 allows that, but mypy doesn't have full support yet.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@Avasam
Copy link
Collaborator Author

Avasam commented Nov 7, 2022

Interesting note about that last mypy_primer run, is that mypy and pyright give slightly different types, and pyright gives no error:
(min repro from homeassistant/components/icloud/account.py:304)

import operator

reveal_type(operator.itemgetter(1).__call__)

test_dict = min({"first": 1, "second": 2}, key=operator.itemgetter(1))  # Ok
test_items = min({"first": 1, "second": 2}.items(), key=operator.itemgetter(1))  # mypy error

reveal_type(test_dict)
reveal_type(test_items)

pyright: (obj: SupportsGetItem[int, _T@__call__]) -> _T@__call__

PS C:\Users\Avasam\Desktop> pyright .\operator_reveal_type.py --typeshed-path=C:/Users/Avasam/Documents/Git/typeshed
No configuration file found.
No pyproject.toml file found.
stubPath C:\Users\Avasam\Desktop\typings is not a valid directory.
Assuming Python platform Windows
Searching for source files
Found 1 source file
pyright 1.1.278
C:\Users\Avasam\Desktop\operator_reveal_type.py
  C:\Users\Avasam\Desktop\operator_reveal_type.py:3:13 - information: Type of "operator.itemgetter(1).__call__" is "(obj: SupportsGetItem[int, _T@__call__]) -> _T@__call__"
  C:\Users\Avasam\Desktop\operator_reveal_type.py:8:13 - information: Type of "test_dict" is "str"
  C:\Users\Avasam\Desktop\operator_reveal_type.py:9:13 - information: Type of "test_items" is "tuple[str, int]"
0 errors, 0 warnings, 3 informations
Completed in 0.917sec

mypy: Callable[[Arg(SupportsGetItem[int, _T], 'obj')], _T]

PS C:\Users\Avasam\Desktop> mypy .\operator_reveal_type.py --custom-typeshed-dir=C:/Users/Avasam/Documents/Git/typeshed
operator_reveal_type.py:3: note: Revealed type is "def [_T] (obj: _typeshed.SupportsGetItem[builtins.int, _T`2]) -> _T`2"
operator_reveal_type.py:6: error: Argument "key" to "min" has incompatible type "itemgetter[int]"; expected "Callable[[Tuple[str, int]], Union[SupportsDunderLT[Any], SupportsDunderGT[Any]]]"      
operator_reveal_type.py:6: note: "itemgetter[int].__call__" has type "Callable[[Arg(SupportsGetItem[int, _T], 'obj')], _T]"
operator_reveal_type.py:8: note: Revealed type is "builtins.str"
operator_reveal_type.py:9: note: Revealed type is "Tuple[builtins.str, builtins.int]"
Found 1 error in 1 file (checked 1 source file)

Yet mypy knows about the correct return type. Is this a mypy bug?

@Avasam Avasam requested a review from JelleZijlstra November 7, 2022 02:45
stdlib/mmap.pyi Outdated
@@ -68,7 +68,8 @@ class mmap(Iterable[int], Sized):
@overload
def __setitem__(self, __index: slice, __object: ReadableBuffer) -> None: ...
# Doesn't actually exist, but the object is actually iterable because it has __getitem__ and
# __len__, so we claim that there is also an __iter__ to help type checkers.
# __len__, so we claim that there is also an __iter__ and __contains__ to help type checkers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think __contains__ deserves a separate comment, since it's a separate and independent issue. The problem is that in works on objects that define __getitem__ but not __contains__, but type checkers (at least mypy and pyright; I just added support for this pattern in pyanalyze in quora/pyanalyze#564) don't know about this. That's a similar but separate issue to how objects with __getitem__ are iterable.

I think this is also why the SupportsGetItem protocol has __contains__.

@JelleZijlstra
Copy link
Member

That does look like a mypy bug.

I haven't looked too deeply at the mypy-primer changes but we'll have to fix these issues before we can merge the PR.

@Avasam
Copy link
Collaborator Author

Avasam commented Nov 7, 2022

I haven't looked too deeply at the mypy-primer changes [...]

Same, but the ones I have looked at stemmed from the same issue. I'm in no hurry to need this, so if it is confirmed to be a mypy issue and can be fixed relatively soon, then I can wait.
Otherwise, I should be able to type byte2int as a Callable / method with TypeVars, and do the changes in _operator.pyi another time.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@Avasam
Copy link
Collaborator Author

Avasam commented Nov 7, 2022

Welp. At least SupportsGetItem[Any, Any] is better than Any. But that's unfortunate.
byte2int is back to being typed as a function. But now the parameter and return type are accurate (which was my original goal)

@Avasam Avasam requested a review from JelleZijlstra November 7, 2022 19:09
@github-actions

This comment has been minimized.

@hauntsaninja
Copy link
Collaborator

Didn't look closely, but mind opening an issue for the suspected mypy bug?

@@ -77,11 +78,9 @@ def delitem(__a: MutableSequence[Any], __b: slice) -> None: ...
@overload
def delitem(__a: MutableMapping[_K, Any], __b: _K) -> None: ...
@overload
def getitem(__a: Sequence[_T], __b: SupportsIndex) -> _T: ...
def getitem(__a: SupportsGetItem[_K, _V], __b: _K) -> _V: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this largely mask the slice overload in practice? E.g. if we call getitem([1, 2, 3], slice(1)), we should get a Sequence back, but I think this overload will be matched and we'll get a slice instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also worried about this, but I tried it out and it doesn't seem to be the case: https://mypy-play.net/?mypy=latest&python=3.10&gist=184cb82e0026c4bd0e9a6d5fa47eb1a1

Might be worth adding a test case, though

Copy link
Collaborator Author

@Avasam Avasam Nov 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a slice will never match the first overload (at least with current type definitions) because:

Argument of type "slice" cannot be assigned to parameter "__b" of type "_K@getitem" in function "getitem"
  Type "slice" cannot be assigned to type "SupportsIndex"
    "slice" is incompatible with protocol "SupportsIndex"
      "__index__" is not present

Still I think it makes sense to put the more precise/least generic overload first. Out of good practice.

@@ -107,16 +106,25 @@ class attrgetter(Generic[_T_co]):
@final
class itemgetter(Generic[_T_co]):
@overload
def __new__(cls, item: Any) -> itemgetter[Any]: ...
def __new__(cls, item: _T_co) -> itemgetter[_T_co]: ... # type: ignore[misc]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment explaining why we need the type ignore?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did better and actually fixed the overlap

# "tuple[int, int]" is incompatible with protocol "SupportsIndex"
# preventing [_T_co, ...] instead of [Any, ...]
#
# A bug in mypy prevents using [..., _T] instead of [..., Any] here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should link to the bug in the mypy issue tracker.

stdlib/mmap.pyi Outdated
@@ -67,8 +67,11 @@ class mmap(Iterable[int], Sized):
def __setitem__(self, __index: int, __object: int) -> None: ...
@overload
def __setitem__(self, __index: slice, __object: ReadableBuffer) -> None: ...
# Doesn't actually exist, but the object is actually iterable because it has __getitem__ and
# __len__, so we claim that there is also an __iter__ to help type checkers.
# Doesn't actually exist, but the object is actually iterable because it has __getitem__,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Doesn't actually exist, but the object is actually iterable because it has __getitem__,
# Doesn't actually exist, but the object actually supports "in" because it has __getitem__,

@Avasam
Copy link
Collaborator Author

Avasam commented Nov 8, 2022

Didn't look closely, but mind opening an issue for the suspected mypy bug?

I'll try to get a minimal repro. I'm not too sure where it is that's something was going wrong. Likely some interaction between TypeVar and Callable. There was a handful of types interacting together.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 8, 2022

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants