
Dtype / checking standardization #152

Closed
kkraus14 opened this issue Mar 25, 2021 · 12 comments · Fixed by #273

Comments

@kkraus14

In the current specification, there are no standardized data type objects, nor a specification of what a data type object needs to implement: https://data-apis.org/array-api/latest/API_specification/data_types.html

Additionally, there are no APIs in the specification for checking dtypes, i.e., something like np.issubdtype, which is somewhat commonly used in code that looks something like:

import numpy as np

def dispatch_based_on_dtype(array):
    if np.issubdtype(array.dtype, np.integer):
        return ...  # integer path
    elif np.issubdtype(array.dtype, np.floating):
        return ...  # floating-point path
    else:
        return ...  # everything else

cc @jakirkham, as I believe this pattern is used quite a bit in Dask, which would presumably want to be able to target arbitrary array objects under the hood.

@jakirkham
Member

With NumPy, one can do this. However, I don't think the dtype spec includes .type or the ability to make this kind of check, AFAICT.

In [1]: from numbers import Integral

In [2]: import numpy as np

In [3]: a = np.arange(5)

In [4]: issubclass(a.dtype.type, Integral)
Out[4]: True

In [1]: from numbers import Real

In [2]: import numpy as np

In [3]: a = np.arange(5).astype(float)

In [4]: issubclass(a.dtype.type, Real)
Out[4]: True
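Combining the dispatch function from the opening comment with the numbers ABCs above gives a variant that never calls np.issubdtype. This is a minimal sketch that assumes a NumPy-style dtype exposing .type (which, as noted, the spec does not require); the string return values are placeholders.

```python
from numbers import Integral, Real

import numpy as np


def dispatch_based_on_dtype(array):
    # Relies on the NumPy-specific dtype.type attribute (not part of the spec).
    scalar_type = array.dtype.type
    # Check Integral first: every Integral is also a Real.
    if issubclass(scalar_type, Integral):
        return "integer"
    elif issubclass(scalar_type, Real):
        return "floating"
    else:
        return "other"


print(dispatch_based_on_dtype(np.arange(5)))                # integer
print(dispatch_based_on_dtype(np.arange(5).astype(float)))  # floating
```

This works in NumPy because NumPy registers its integer and floating scalar types with numbers.Integral and numbers.Real; bool and complex arrays fall through to "other".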

@asmeurer
Member

asmeurer commented Apr 1, 2021

https://data-apis.org/array-api/latest/API_specification/data_types.html says

Data types (“dtypes”) are objects that can be used as dtype specifiers in functions and methods (e.g., zeros((2, 3), dtype=float32) ). A conforming implementation may add methods or attributes to data type objects; however, these methods and attributes are not included in this specification.

(I had thought that somewhere it was required that dtype objects be comparable by ==, but I don't see that there presently)

I will point out that if you only care about the dtypes in the spec, there are a finite number of them, so you can use

def is_floating(dtype):
    return dtype in [float32, float64]

def is_integral(dtype):
    return dtype in [int8, ...]
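Since the set of spec dtypes is finite, such helpers can be written out in full. A sketch, assuming the integer and floating-point dtypes listed in the spec at the time and using NumPy as a stand-in for a conforming namespace; is_integral and is_floating are hypothetical helper names, not spec APIs.

```python
import numpy as xp  # stand-in for any conforming array namespace

# Integer and floating-point dtypes listed in the spec (2021 revision).
_INTEGRAL_DTYPES = (
    xp.int8, xp.int16, xp.int32, xp.int64,
    xp.uint8, xp.uint16, xp.uint32, xp.uint64,
)
_FLOATING_DTYPES = (xp.float32, xp.float64)


def is_integral(dtype):
    # `in` falls back to == for each candidate, so this relies on
    # dtype objects supporting equality comparison.
    return dtype in _INTEGRAL_DTYPES


def is_floating(dtype):
    return dtype in _FLOATING_DTYPES


a = xp.arange(5)
print(is_integral(a.dtype), is_floating(a.dtype))  # True False
```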

@leofang
Contributor

leofang commented Apr 1, 2021

(I had thought that somewhere it was required that dtype objects be comparable by ==, but I don't see that there presently)

Sounds like something we should add.

By the way, did we specify that dtype objects are singletons, so that they can also be checked via isinstance(x, float32)? It's probably a sane design that has been adopted in several libraries (e.g., NumPy, CuPy), but it's still worth shouting out loud.

@kkraus14
Author

kkraus14 commented Apr 2, 2021

The docs weren't clear to me, but I didn't interpret those as singletons and instead thought that it was just guidance on what data types need to be supported by the standard.

@asmeurer
Member

asmeurer commented Apr 2, 2021

The dtype names need to be part of the namespace. Otherwise passing dtype=float64 and so on won't be possible. If that isn't clear, we should fix it.

Sounds like something we should add. By the way, did we specify that dtype objects are singletons, so that they can also be checked via isinstance(x, float32)? It's probably a sane design that has been adopted in several libraries (e.g., NumPy, CuPy), but it's still worth shouting out loud.

isinstance(x, float32) implies that x is a scalar, which we do not have in the spec (isinstance(np.array([1.], dtype=np.float64), np.float64) gives False).
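The parenthetical can be checked directly in NumPy (assumed installed), where np.float64 is a scalar type: an array with that dtype is not an instance of it, while an element extracted from the array is.

```python
import numpy as np

arr = np.array([1.0], dtype=np.float64)

# An array is not an instance of the scalar type named by its dtype...
print(isinstance(arr, np.float64))     # False
# ...but indexing out a single element yields a NumPy scalar that is.
print(isinstance(arr[0], np.float64))  # True
```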

@jakirkham
Member

FWIW, the numbers classes are ABCs, so if subclassing is not an option we can just call register for each subclass (e.g., Real.register(float32)) to add float32 and float64 as virtual subclasses.
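A minimal sketch of the register approach, assuming a hypothetical library whose dtypes are backed by its own scalar classes (Float32, Float64 and Int32 below are made up for illustration). register makes each class a virtual subclass of the ABC without any inheritance.

```python
from numbers import Integral, Real


# Hypothetical scalar classes standing in for a library's dtypes.
class Float32:
    pass


class Float64:
    pass


class Int32:
    pass


# Register each class as a virtual subclass; no inheritance required.
Real.register(Float32)
Real.register(Float64)
Integral.register(Int32)

print(issubclass(Float32, Real))      # True
print(issubclass(Int32, Integral))    # True
print(issubclass(Float64, Integral))  # False
```

Note that register only records the relationship; it does not check that the class actually implements the ABC's methods.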

@leofang
Contributor

leofang commented Apr 2, 2021

isinstance(x, float32) implies that x is a scalar, which we do not have in the spec (isinstance(np.array([1.], dtype=np.float64), np.float64) gives False).

@asmeurer I think you misread. If I want to call isinstance(x, float32), then x must be a dtype object like float32, float64, etc. It doesn't make sense to check whether an array is an instance of a dtype.

@jakirkham
Member

Right, this is also why the OP mentions that with NumPy we need to use dtype.type today. There's no equivalent of this in the spec, AFAICT.
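For reference, this is how NumPy (assumed installed) separates a dtype object from its scalar type: the dtype instance is not an instance of the scalar class, which is reachable only through the NumPy-specific .type attribute.

```python
import numpy as np

dt = np.dtype("float32")

print(isinstance(dt, np.float32))        # False: dt is a dtype instance
print(dt.type is np.float32)             # True: .type is the scalar class
print(issubclass(dt.type, np.floating))  # True
print(np.issubdtype(dt, np.floating))    # True: same check via issubdtype
```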

@leofang
Contributor

leofang commented Apr 3, 2021

So IIUC the current standard does not specify what property or method a dtype object needs to implement:

Data types ("dtypes") are objects that can be used as `dtype` specifiers in functions and methods (e.g., `zeros((2, 3), dtype=float32)`). A conforming implementation may add methods or attributes to data type objects; however, these methods and attributes are not included in this specification.

so for now I think checking dtype.type is a no-go. I think we need either an equality check (==) or something equivalent added, or the standard revised.
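Why == matters: a membership test such as dtype in [float32, float64] compares by identity first and then by ==, so it only works across separately constructed dtype objects if equality is defined. A sketch with a hypothetical DType class (not taken from any library) that compares by name:

```python
class DType:
    """Hypothetical dtype object that compares equal by name."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, DType):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        # Keep hashing consistent with __eq__ so dtypes work as dict keys.
        return hash(self.name)


float32 = DType("float32")
float64 = DType("float64")

other_float32 = DType("float32")  # constructed separately, not a singleton

print(other_float32 is float32)             # False
print(other_float32 == float32)             # True
print(other_float32 in [float32, float64])  # True
```

If dtypes were instead required to be singletons, identity alone would suffice; requiring == covers both designs.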

@leofang
Contributor

leofang commented Apr 3, 2021

(And also it doesn't seem that subclassing is an option, depending on how one interprets the standard.)

@kgryte
Contributor

kgryte commented May 5, 2022

Reopening this issue, as it is relevant to the proposal in #425.

@kgryte kgryte reopened this May 5, 2022
@rgommers
Copy link
Member

All the discussion is happening in gh-425 and this is basically a duplicate issue, so I'll close this again. And we'll try to finalize gh-425 very soon.

6 participants