TYP: partial typing of masked array #31728

Merged
merged 3 commits into pandas-dev:master
Feb 12, 2020

Conversation

simonjayhawkins
Member

No description provided.

simonjayhawkins added the Typing (type annotations, mypy/pyright type checking) label on Feb 5, 2020
return self._data.nbytes + self._mask.nbytes

@classmethod
- def _concat_same_type(cls, to_concat):
+ def _concat_same_type(cls: Type[BaseMaskedArrayT], to_concat) -> BaseMaskedArrayT:
Member Author

We need to add the TypeVar to avoid:

pandas\core\arrays\integer.py:117: error: Incompatible return value type (got "BaseMaskedArray", expected "IntegerArray")
pandas\core\arrays\boolean.py:122: error: Incompatible return value type (got "BaseMaskedArray", expected "BooleanArray")

We can't use the unbound TypeVar from pandas._typing here; otherwise we get:

pandas\core\arrays\masked.py:183: error: Too many arguments for "object"

Since the TypeVar is needed here, it is also used for the other methods that return type(self).
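
A minimal sketch of the pattern described above, not the actual pandas code: `MaskedBase`, `IntArr` and `MaskedBaseT` are hypothetical stand-ins, and plain lists take the place of NumPy arrays. Binding the TypeVar to the base class tells the type checker that `cls(data, mask)` accepts two arguments (an unbound TypeVar is only known to be an `object`, which would explain the "Too many arguments" error), and annotating `cls` as `Type[MaskedBaseT]` makes the return type follow whichever subclass the method is called on.

```python
from typing import List, Sequence, Type, TypeVar

# Hypothetical name; bound to the base class so cls(...) is known to take (data, mask).
MaskedBaseT = TypeVar("MaskedBaseT", bound="MaskedBase")


class MaskedBase:
    def __init__(self, data: List[int], mask: List[bool]) -> None:
        self._data = data
        self._mask = mask

    @classmethod
    def _concat_same_type(
        cls: Type[MaskedBaseT], to_concat: Sequence[MaskedBaseT]
    ) -> MaskedBaseT:
        # Flatten the pieces; cls is the class the method was called on.
        data = [v for x in to_concat for v in x._data]
        mask = [m for x in to_concat for m in x._mask]
        return cls(data, mask)


class IntArr(MaskedBase):
    """Stand-in for a concrete subclass such as IntegerArray."""


joined = IntArr._concat_same_type(
    [IntArr([1], [False]), IntArr([2, 3], [True, False])]
)
```

Here a type checker should infer `joined` as `IntArr` rather than `MaskedBase`, which is the property the subclasses rely on.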

Member

As I said in another issue, I have no problem with it if there is no way around it, but then it needs to be documented.

Member

I don't think this is a problem; it's documented in PEP 484

https://www.python.org/dev/peps/pep-0484/#annotating-instance-and-class-methods

# The value used to fill '_data' to avoid upcasting
_internal_fill_value: "Scalar"

def __init__(self, values: np.ndarray, mask: np.ndarray, copy: bool = False):
Member Author

__init__ needs to be declared in the base class; otherwise mypy reports:

pandas\core\arrays\masked.py:56: error: Too many arguments for "BaseMaskedArray"
pandas\core\arrays\masked.py:181: error: Too many arguments for "BaseMaskedArray"
pandas\core\arrays\masked.py:207: error: Too many arguments for "BaseMaskedArray"
pandas\core\arrays\masked.py:207: error: Unexpected keyword argument "copy" for "BaseMaskedArray"
pandas\core\arrays\masked.py:213: error: Too many arguments for "BaseMaskedArray"
pandas\core\arrays\masked.py:213: error: Unexpected keyword argument "copy" for "BaseMaskedArray"

Declaring it here also ensures that the subclasses have the correct constructor signature to work with __invert__, _concat_same_type, take and copy from the base class.

We could just raise an AbstractMethodError, but I think it makes sense to put the shared functionality here.

BooleanArray checks values.ndim and mask.ndim; IntegerArray does not. It may make sense to have that check here as well, if it is applicable to IntegerArray.
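
The arrangement being proposed can be sketched roughly as follows. The class names are hypothetical, plain lists stand in for NumPy arrays, and the length check is an illustrative assumption rather than the validation pandas actually performs. The base class owns the copy-and-assign logic, while each subclass validates its inputs and then defers to `super().__init__`.

```python
from typing import List


class MaskedBase:
    """Hypothetical base class holding the shared constructor logic."""

    def __init__(self, values: List[int], mask: List[bool], copy: bool = False) -> None:
        if copy:
            # Copy both buffers so the new object does not alias the caller's data.
            values = list(values)
            mask = list(mask)
        self._data = values
        self._mask = mask

    def copy(self) -> "MaskedBase":
        # Works for every subclass because they all share the constructor signature.
        return type(self)(self._data, self._mask, copy=True)


class BoolArr(MaskedBase):
    """Stand-in for a subclass that validates, then defers to the base class."""

    def __init__(self, values: List[int], mask: List[bool], copy: bool = False) -> None:
        # Illustrative check only (an assumption, not pandas's real validation).
        if len(values) != len(mask):
            raise ValueError("values and mask must have the same length")
        super().__init__(values, mask, copy=copy)


original = BoolArr([1, 0], [False, True])
duplicate = original.copy()
```

Because the base class declares the three-argument constructor, calls such as `type(self)(data, mask, copy=True)` inside base-class methods type-check without each subclass having to redeclare anything beyond its own validation.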

mask = mask.copy()

self._data = values
self._mask = mask
self._dtype = BooleanDtype()
Member Author

I think this could be a class attribute; is there any need to assign it in the constructor?
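
To illustrate the difference, here is a small sketch with hypothetical classes (`BoolDtype` stands in for a parameterless dtype; none of this is pandas code). Moving the assignment to the class body is only safe when the dtype carries no per-instance state, which this sketch assumes.

```python
class BoolDtype:
    """Hypothetical parameterless dtype, standing in for BooleanDtype."""

    name = "boolean"


class PerInstance:
    def __init__(self) -> None:
        # A fresh dtype object is created for every array.
        self._dtype = BoolDtype()


class ClassLevel:
    # One dtype object shared by all instances; nothing to do in __init__.
    _dtype = BoolDtype()


a, b = ClassLevel(), ClassLevel()
c, d = PerInstance(), PerInstance()
```

With the class-level form, `a._dtype` and `b._dtype` are the same object, whereas `c._dtype` and `d._dtype` are distinct instances.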

@@ -387,7 +382,7 @@ def __setitem__(self, key, value):
self._data[key] = value
self._mask[key] = mask

- def astype(self, dtype, copy=True):
+ def astype(self, dtype, copy: bool = True) -> Union[np.ndarray, BaseMaskedArray]:
Member

Is there a differentiator between this and ArrayLike from pandas._typing (save the TypeVar)?

Member Author

I guess we could, since I assume that astype should eventually support ExtensionArrays other than IntegerArray and BooleanArray.


- self._data = values
- self._mask = mask
+ super().__init__(values, mask, copy=copy)
Member

Which superclass performs this logic?

Member Author

BaseMaskedArray?

@@ -317,18 +313,18 @@ def map_string(s):
scalars = [map_string(x) for x in strings]
return cls._from_sequence(scalars, dtype, copy)

- def _values_for_factorize(self) -> Tuple[np.ndarray, Any]:
+ def _values_for_factorize(self) -> Tuple[np.ndarray, int]:
Member

In general, though, "Any" is correct. I don't know what the typing should do here, but the signature of the base class would use "Any", so any place where _values_for_factorize is called through the base class would need to assume "Any".
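
A small sketch of the point, using hypothetical classes and lists in place of NumPy arrays: an override may declare a narrower return type than the base method, but code written against the base class still only sees the base signature.

```python
from typing import Any, List, Tuple


class Base:
    def _values_for_factorize(self) -> Tuple[List[int], Any]:
        # The generic contract: the NA sentinel can be any object.
        return [], None


class IntArr(Base):
    def __init__(self, data: List[int]) -> None:
        self._data = data

    def _values_for_factorize(self) -> Tuple[List[int], int]:
        # Narrowing the return type in an override is accepted by type checkers.
        return self._data, 0


def sentinel_of(arr: Base) -> Any:
    # Code written against Base only knows that the sentinel is Any.
    _, na_value = arr._values_for_factorize()
    return na_value
```

So the narrower `int` annotation helps callers that hold a concrete `IntArr`, while generic callers such as `sentinel_of` still have to treat the sentinel as `Any`.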

return self._data.nbytes + self._mask.nbytes

@classmethod
- def _concat_same_type(cls, to_concat):
+ def _concat_same_type(cls: Type[BaseMaskedArrayT], to_concat) -> BaseMaskedArrayT:
data = np.concatenate([x._data for x in to_concat])
mask = np.concatenate([x._mask for x in to_concat])
return cls(data, mask)

- def take(self, indexer, allow_fill=False, fill_value=None):
+ def take(
+     self: BaseMaskedArrayT,
Member

We otherwise don't type self, or do we?
(self is always the type of the class, no?)
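
As a hedged illustration of why `self` is annotated here (hypothetical classes, lists instead of NumPy arrays): without an annotation, a type checker treats `self` as the class in which the method is defined, so a method that returns `type(self)(...)` would be typed as returning the base class. Annotating `self` with a bound TypeVar, as described in PEP 484, lets the return type track the subclass.

```python
from typing import List, TypeVar

# Hypothetical name, bound to the base class defined below.
MaskedBaseT = TypeVar("MaskedBaseT", bound="MaskedBase")


class MaskedBase:
    def __init__(self, data: List[int], mask: List[bool]) -> None:
        self._data = data
        self._mask = mask

    def take(self: MaskedBaseT, indexer: List[int]) -> MaskedBaseT:
        # Build a new object of the same runtime class as self.
        data = [self._data[i] for i in indexer]
        mask = [self._mask[i] for i in indexer]
        return type(self)(data, mask)


class IntArr(MaskedBase):
    def total(self) -> int:
        # Subclass-only method: sum of the unmasked values.
        return sum(v for v, m in zip(self._data, self._mask) if not m)


taken = IntArr([10, 20, 30], [False, True, False]).take([2, 0])
```

With the annotation, `taken.total()` type-checks because `taken` is inferred as `IntArr`; with a bare `self` and a `MaskedBase` return annotation, the same call would be flagged as an unknown attribute.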

Member

@WillAyd WillAyd left a comment

lgtm

@WillAyd WillAyd added this to the 1.1 milestone Feb 12, 2020
@WillAyd WillAyd merged commit 35174ae into pandas-dev:master Feb 12, 2020
Member

WillAyd commented Feb 12, 2020

Thanks @simonjayhawkins
