Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typing support for dtypes #16545

Closed
shoyer opened this issue Dec 12, 2017 · 9 comments · Fixed by #17719
Closed

Typing support for dtypes #16545

shoyer opened this issue Dec 12, 2017 · 9 comments · Fixed by #17719

Comments

@shoyer
Copy link
Member

shoyer commented Dec 12, 2017

Per #7370 (comment), we could specify dtype as part of ndarray types (including type inference) by using NumPy's scalar types, e.g., np.ndarray[Any, np.float64] or np.ndarray[np.float64, Any] (where Any indicates the shape).

Is this a good way to indicate dtypes? How would we handle dtypes that don't have a well specified scalar type? For example:

  • datetime64 and timedelta64 do not have precision as part of the type
  • np.void is used for all arbitrary structured dtypes, but doesn't indicate the field names or subtypes (side note: np.struct would be a much better name)
  • string dtypes have their size as part of the type

One simple answer is to not worry about more detailed type information for now, and worry about more precise specification for scalar types later, when support in the typing module becomes available (e.g., for integer types or custom struct-like data structures like TypedDict).

@rmcgibbo
Copy link
Contributor

For complex dtypes like datetime64, np.void, and the string dtypes, I propose that we delay dealing with them until this package is much more fully developed.

Very little functionality is lost in the meantime if numpy functions that return arrays of dtype('S1') are labeled as returning (or are inferred to return) results of type ndarray[Any, str]. More specificity can be added later.

@ryanpeach
Copy link

This is really all I need, what is the issue? Anything I can do?

@person142
Copy link
Member

Circling back around to this: another issue is that you can mutate an array's dtype in-place, e.g.

>>> x = np.ones((2,), dtype=np.int64)
>>> x
array([1, 1])
>>> x.dtype = np.bool_
>>> x
array([ True, False, False, False, False, False, False, False,  True,
       False, False, False, False, False, False, False])

So if you tried to add types:

x: np.ndarray[np.int64] = np.ones((2,), dtype=np.int64)
x.dtype = np.bool_  # Uh oh, the type is incorrect now.

I imagine that most of the time people don't do that (and use views instead), which we could handle quite nicely in some cases with the right overload:

T = TypeVar('T', bound=dtype)

class ndarray(...):
    ...
    @overload
    def view(self, dtype: T = ...) -> ndarray[T]: ...

A practical solution might be to say "if you are going to set dtype, you are on your own", and ask that in such cases a user do

x: np.ndarray[Any] = np.ones((2,), dtype=np.int64)
x.dtype = np.bool_  # Still ok

Thoughts?

@shoyer
Copy link
Member Author

shoyer commented Mar 29, 2020

I don't think we should support modifying array dtype in place. There's really not good reason to do this rather than making a new view.

Likewise, I don't think we should support mutating shape, though if I recall that may be the only way to do reshaping while ensuring that an error would be raised if a copy would be needed.

@person142
Copy link
Member

I don't think we should support modifying array dtype in place

Should we go so far as removing the setter from the types?

https://github.com/numpy/numpy-stubs/blob/master/numpy-stubs/__init__.pyi#L300

person142 referenced this issue in person142/numpy-stubs Mar 29, 2020
See discussion in https://github.com/numpy/numpy-stubs/issues/7.

Though this is valid NumPy, users should be using `ndarray.view`
instead of setting the dtype, and setting the dtype prevents sensibly
making `ndarray` generic over dtype because of situations like this:

```
x = np.ones((2,), dtype=np.int64)  # type is ndarray[np.int64]
x.dtype = np.bool_  # What is the type now?
```

If someone really wants to do this, they will now have add an ignore,
which should make it clear that type safety is going out the window.
@person142
Copy link
Member

PR banning the setter: numpy/numpy-stubs#47

shoyer referenced this issue in numpy/numpy-stubs Mar 30, 2020
See discussion in https://github.com/numpy/numpy-stubs/issues/7.

Though this is valid NumPy, users should be using `ndarray.view`
instead of setting the dtype, and setting the dtype prevents sensibly
making `ndarray` generic over dtype because of situations like this:

```
x = np.ones((2,), dtype=np.int64)  # type is ndarray[np.int64]
x.dtype = np.bool_  # What is the type now?
```

If someone really wants to do this, they will now have add an ignore,
which should make it clear that type safety is going out the window.
@person142 person142 transferred this issue from numpy/numpy-stubs Jun 9, 2020
@person142
Copy link
Member

Updating this issue with my current plan to support dtypes; see also the discussion with @seberg in numpy/numpy-stubs#48.

@BvB93
Copy link
Member

BvB93 commented Nov 5, 2020

The PR for introducing dtype support is up: #17719.

@davetapley
Copy link

Sanity check, should something like datetime64['us', UTC]) type check okay?
I get Expected no type arguments for class "datetime64" 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants