Dtype / checking standardization #152

kkraus14 · 2021-03-25T18:25:23Z

In the current specification, there are no standardized data type objects, nor any specification of what a data type object needs to implement: https://data-apis.org/array-api/latest/API_specification/data_types.html

Additionally, the specification has no APIs for checking dtypes, e.g. something like np.issubdtype, which is fairly commonly used in code that looks something like:

import numpy as np

def dispatch_based_on_dtype(array):
    # Branch on the dtype category rather than on one concrete dtype.
    if np.issubdtype(array.dtype, np.integer):
        return ...
    elif np.issubdtype(array.dtype, np.floating):
        return ...
    else:
        return ...

cc @jakirkham, as I believe this pattern is used quite a bit in Dask, which would presumably want to be able to target arbitrary array objects under the hood

jakirkham · 2021-03-25T18:30:28Z

With NumPy, one can do this. However, AFAICT the dtype spec includes neither a type attribute nor any other way to make this kind of check

In [1]: from numbers import Integral

In [2]: import numpy as np

In [3]: a = np.arange(5)

In [4]: issubclass(a.dtype.type, Integral)
Out[4]: True

In [1]: from numbers import Real

In [2]: import numpy as np

In [3]: a = np.arange(5).astype(float)

In [4]: issubclass(a.dtype.type, Real)
Out[4]: True

asmeurer · 2021-04-01T22:04:10Z

https://data-apis.org/array-api/latest/API_specification/data_types.html says:

> Data types (“dtypes”) are objects that can be used as dtype specifiers in functions and methods (e.g., zeros((2, 3), dtype=float32) ). A conforming implementation may add methods or attributes to data type objects; however, these methods and attributes are not included in this specification.

(I had thought that somewhere it was required that dtype objects be comparable by ==, but I don't see that there at present.)

I will point out that if you only care about the dtypes in the spec, there are a finite number of them, so you can use

def is_floating(dtype):
    return dtype in [float32, float64]

def is_integral(dtype):
    return dtype in [int8, ...]
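
To make the idea above concrete, here is a rough sketch that spells out the membership lists. It assumes only that dtype objects support ==, which the thread notes is not yet guaranteed. The xp parameter and the SimpleNamespace stand-in are hypothetical; a real array namespace would be passed in its place. The names are the integer and floating-point dtypes I believe the spec listed at the time.

```python
from types import SimpleNamespace

# Dtype names believed to be in the spec at the time of this thread (assumption).
FLOAT_NAMES = ("float32", "float64")
INT_NAMES = ("int8", "int16", "int32", "int64",
             "uint8", "uint16", "uint32", "uint64")

def is_floating(xp, dtype):
    # Relies only on == between dtype objects.
    return any(dtype == getattr(xp, name) for name in FLOAT_NAMES)

def is_integral(xp, dtype):
    return any(dtype == getattr(xp, name) for name in INT_NAMES)

# Hypothetical stand-in namespace: plain strings play the role of dtype objects.
xp = SimpleNamespace(bool="bool", **{n: n for n in FLOAT_NAMES + INT_NAMES})

print(is_floating(xp, xp.float32))  # True
print(is_integral(xp, xp.bool))     # False
```

Taking the namespace as an argument keeps the helpers independent of any one library, at the cost of an extra parameter.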

leofang · 2021-04-01T23:07:09Z

> (I had thought that somewhere it was required that dtype objects be comparable by ==, but I don't see that there presently)

Sounds like something we should add.

By the way, did we specify that dtype objects are singletons so that they can also be checked via isinstance(x, float32)? It's probably a sane design that has been adopted in several libraries (e.g., NumPy, CuPy), but still worth shouting out loud.

kkraus14 · 2021-04-02T00:53:15Z

The docs weren't clear to me, but I didn't interpret those as singletons and instead thought that it was just guidance on what data types need to be supported by the standard.

asmeurer · 2021-04-02T20:04:09Z

The dtype names need to be part of the namespace. Otherwise passing dtype=float64 and so on won't be possible. If that isn't clear, we should fix it.

> Sounds like something we should add. By the way, did we specify that dtype objects are singletons so that they can also be checked via isinstance(x, float32)? It's probably a sane design that has been adopted in several libraries (eg: NumPy, CuPy), but still worth shouting out loud.

isinstance(x, float32) implies that x is a scalar, which we do not have in the spec (isinstance(np.array([1.], dtype=np.float64), np.float64) gives False).

jakirkham · 2021-04-02T21:14:06Z

FWIW the numbers classes are ABCs, so we can just call register on the relevant ABC to add float32 and float64 if subclassing is not an option
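
A minimal sketch of what that registration could look like, using only the standard library. Float32Type and Int32Type are hypothetical stand-ins for whatever class a library uses to represent these dtypes; nothing in the spec defines them.

```python
from numbers import Integral, Real

class Float32Type:
    """Hypothetical class standing in for a library's float32 dtype type."""

class Int32Type:
    """Hypothetical class standing in for a library's int32 dtype type."""

# register() makes a class a "virtual subclass" of the ABC without inheritance.
Real.register(Float32Type)
Integral.register(Int32Type)

print(issubclass(Float32Type, Real))      # True
print(issubclass(Int32Type, Integral))    # True
print(issubclass(Int32Type, Real))        # True, since Integral derives from Real
print(issubclass(Float32Type, Integral))  # False
```

Note that registering with Integral also satisfies the Real check, so a floating-point test written this way would need to exclude Integral explicitly.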

leofang · 2021-04-02T21:16:41Z

> isinstance(x, float32) implies that x is a scalar, which we do not have in the spec (isinstance(np.array([1.], dtype=np.float64), np.float64) gives False).

@asmeurer I think you misread. If I wanna do isinstance(x, float32) on x, it must be a dtype object like float32, float64, etc. It doesn't make sense to check if an array is an instance of a dtype.

jakirkham · 2021-04-02T21:20:50Z

Right, this is also why in the OP it mentions that with NumPy, we need to use dtype.type today. There's no equivalent of this in the spec AFAICT
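
For reference, a short NumPy-only illustration of the distinction being discussed. This shows NumPy's behaviour, not anything the spec requires: np.float64 is the scalar class, and the dtype object is a separate thing that exposes that class through dtype.type.

```python
import numpy as np

x = np.array([1.0], dtype=np.float64)

array_is_instance = isinstance(x, np.float64)        # arrays are not scalars
scalar_is_instance = isinstance(x[0], np.float64)    # indexing yields a scalar
dtype_is_instance = isinstance(x.dtype, np.float64)  # the dtype object is not a scalar either
dtype_type_matches = x.dtype.type is np.float64      # dtype.type is the scalar class
dtype_equals = x.dtype == np.float64                 # == works on the dtype object

print(array_is_instance, scalar_is_instance, dtype_is_instance)  # False True False
print(dtype_type_matches, dtype_equals)                          # True True
```

So in NumPy, isinstance works on scalars, while the dtype object itself has to be checked with == or through dtype.type.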

leofang · 2021-04-03T03:12:21Z

So IIUC the current standard does not specify what property or method a dtype object needs to implement:

array-api/spec/API_specification/data_types.md, line 74 in 0941067:

Data types ("dtypes") are objects that can be used as `dtype` specifiers in functions and methods (e.g., `zeros((2, 3), dtype=float32)`). A conforming implementation may add methods or attributes to data type objects; however, these methods and attributes are not included in this specification.

so for now I think checking dtype.type is a no-go. I think we need either an equality check (==) or something equivalent added, or the standard revised.
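
As a rough sketch of the "equal check" option, here is one way a library could satisfy an == requirement on dtype objects. The DType class is purely hypothetical and not taken from the spec or any library; a frozen dataclass is used only because it generates __eq__ and __hash__ automatically.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DType:
    """Hypothetical dtype object; equality and hashing come from the dataclass."""
    name: str

float32 = DType("float32")
float64 = DType("float64")
int32 = DType("int32")

def is_floating(dtype):
    # Needs nothing from dtype objects beyond ==.
    return dtype == float32 or dtype == float64

print(is_floating(DType("float32")))  # True, even though it is a distinct instance
print(is_floating(int32))             # False
```

Because equality is by value here, the check does not depend on dtype objects being singletons.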

leofang · 2021-04-03T03:13:59Z

(And also it doesn't seem that subclassing is an option, depending on how one interprets the standard.)

kgryte · 2022-05-05T18:09:36Z

Reopening this issue as relevant to the proposal in #425.

rgommers · 2022-09-20T18:39:57Z

All the discussion is happening in gh-425 and this is basically a duplicate issue, so I'll close this again. And we'll try to finalize gh-425 very soon.

honno mentioned this issue Jul 20, 2021

Support for Array API HypothesisWorks/hypothesis#3037

Closed

kgryte mentioned this issue Sep 20, 2021

Require that data type objects implement __eq__ in order to test for data type equality #273

Merged

kgryte closed this as completed in #273 Sep 20, 2021

jakirkham mentioned this issue May 5, 2022

RFC: add data type inspection utilities to the array API specification #425

Closed

kgryte reopened this May 5, 2022

rgommers closed this as completed Sep 20, 2022

rgommers mentioned this issue Dec 23, 2022

Should there be a integral and floating ABCs? #581

Closed
