Hooks for XArray operations #1938
Thanks for leading the development of sparse. Currently, our logic to support Do we need to be capable of supporting other objects for future extension? @shoyer, |
Then I would suggest something like the following for hooks (omitting imports):
```python
# Registered in order of priority
xarray.interfaces.register('DaskArray', lambda ar: isinstance(ar, da.Array))
xarray.hooks.register('nansum', 'DaskArray', da.nansum)
xarray.interfaces.register('SparseArray', lambda ar: isinstance(ar, sparse.SparseArray))
xarray.hooks.register('nansum', 'SparseArray', sparse.nansum)
```
And then, in code, look up the appropriate implementation:
```python
nansum = xarray.hooks.get(arr, 'nansum')
```
If you need help, I'd be willing to give it. :-) But I'm not a user of XArray, so I don't really understand the use-cases or codebase. |
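A minimal, self-contained sketch of how such a priority-ordered hook registry could behave, including the NumPy fallback mentioned in the original proposal. Everything here is hypothetical: `register_interface`, `register_hook`, `register_default`, `get` and `ToyArray` are illustrative names, not xarray API, and a plain-Python NaN-skipping sum stands in for `numpy.nansum`.

```python
# Hypothetical sketch of the proposed hook registry (not xarray API).
import math
from typing import Any, Callable, Dict, List, Tuple

# Interfaces are checked in registration order, i.e. in order of priority.
_interfaces: List[Tuple[str, Callable[[Any], bool]]] = []
# (operation name, interface name) -> implementation
_hooks: Dict[Tuple[str, str], Callable] = {}
# Fallbacks used when no registered interface handles the operation
# (the proposal falls back to NumPy; plain Python stands in for it here).
_defaults: Dict[str, Callable] = {}


def register_interface(name: str, predicate: Callable[[Any], bool]) -> None:
    _interfaces.append((name, predicate))


def register_hook(op: str, interface: str, func: Callable) -> None:
    _hooks[(op, interface)] = func


def register_default(op: str, func: Callable) -> None:
    _defaults[op] = func


def get(arr: Any, op: str) -> Callable:
    """Return the implementation of `op` for the first interface matching `arr`."""
    for name, predicate in _interfaces:
        if predicate(arr) and (op, name) in _hooks:
            return _hooks[(op, name)]
    return _defaults[op]


def plain_nansum(values):
    # Sum that skips NaNs, standing in for numpy.nansum.
    return math.fsum(v for v in values if not math.isnan(v))


class ToyArray:
    """Stand-in for a third-party array type such as a sparse array."""

    def __init__(self, data):
        self.data = list(data)


register_default("nansum", plain_nansum)
register_interface("ToyArray", lambda ar: isinstance(ar, ToyArray))
register_hook("nansum", "ToyArray", lambda ar: plain_nansum(ar.data))

values = [1.0, float("nan"), 2.0]
toy = ToyArray([1.0, float("nan"), 4.0])
list_total = get(values, "nansum")(values)  # no interface matches: default
toy_total = get(toy, "nansum")(toy)         # dispatched to the ToyArray hook
```

Because registration is just a function call, a third-party package could register its own interface and hooks without any change to the library that owns the registry.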
For two array backends, it didn't make sense to write an abstraction layer for this, in part because it wasn't clear what we needed. But for three examples, it probably does -- that's the point where shared use cases become clear. Undoubtedly, there will be other cases in the future where users will want to extend xarray to handle new array types (arrays with units come to mind). For implementing these overloads/functions, there are various possible solutions. Our current ad-hoc system is similar to what @hameerabbasi suggests -- we check the type of the first argument and use that to dispatch to an appropriate function. This has the advantage of being easy to implement for a known set of types, but a single dispatch order is not very extensible -- it's impossible to anticipate every third-party class. Recently, NumPy has moved away from this (e.g., with One appealing option is to make use of @mrocklin's multipledispatch library, which was originally developed for Blaze and is still in active use. Possible concerns:
|
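The idea behind multipledispatch (choosing an implementation from the types of all positional arguments, with new types registered from outside) can be illustrated with a toy dispatcher that uses only the standard library. This is not multipledispatch's algorithm, which orders signatures by specificity and warns about ambiguities; the sketch simply walks each argument's MRO and takes the first registered match. `Dispatcher`, `register` and `add` are hypothetical names.

```python
import itertools
from typing import Callable, Dict, Tuple


class Dispatcher:
    """Toy multiple dispatch on the types of all positional arguments."""

    def __init__(self, name: str):
        self.name = name
        self.registry: Dict[Tuple[type, ...], Callable] = {}

    def register(self, *types: type):
        def decorator(func: Callable) -> Callable:
            self.registry[types] = func
            return func
        return decorator

    def __call__(self, *args):
        mros = [type(arg).__mro__ for arg in args]
        # itertools.product tries the exact types first, then generalizes
        # the rightmost argument fastest, so earlier arguments keep their
        # most specific type longest (a simple left-to-right tie-break).
        for signature in itertools.product(*mros):
            func = self.registry.get(signature)
            if func is not None:
                return func(*args)
        names = ", ".join(type(a).__name__ for a in args)
        raise NotImplementedError(f"Could not find signature for {self.name}: <{names}>")


add = Dispatcher("add")


@add.register(object, object)
def _generic(a, b):
    return "generic"


@add.register(int, float)
def _int_float(a, b):
    return "int+float"
```

Here `add(True, 2.0)` resolves to the `(int, float)` implementation because `bool` subclasses `int`. A third-party array type could be supported by calling `add.register(TheirArray, object)` from its own package, which is the extensibility a fixed first-argument check lacks.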
Import times on multipledispatch have improved thanks to work by @llllllllll. They could probably be further improved if people wanted to invest modest intellectual effort here. Costs scale with the number of type signatures on each operation. In blaze this was very high, well into the hundreds; in our case it would, I think, be more modest, around 2-10. (Also, a historical note: multipledispatch predates my involvement in Blaze.) When possible it would be useful to upstream these concerns to NumPy, even if we have to move faster than NumPy is able to support. |
Dispatch for stack/concatenate is definitely on the radar for NumPy development, but I don't know when it's actually going to happen. The likely interface is something like We only need this for a couple of operations, so in any case we can probably implement our own ad-hoc dispatch system for On further contemplation, overloading based on union types with a system like multipledispatch does seem tricky. It's not clear to me that there's even a well defined type for inputs to concatenate that should be dispatched to dask vs. numpy, for example. We want to let that dask handle any cases where at least one input is a dask array, but a type like |
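A hypothetical sketch of the rule described above for `concatenate`: let a backend handle the call whenever at least one input is of its type, instead of looking only at the first argument. `register_concatenate`, `concatenate` and `LazyArray` are illustrative names; `LazyArray` is a list subclass standing in for a dask array.

```python
from typing import Callable, List, Sequence, Tuple

# (array type, implementation) pairs, checked in registration order.
_concat_impls: List[Tuple[type, Callable]] = []


def register_concatenate(array_type: type, func: Callable) -> None:
    _concat_impls.append((array_type, func))


def concatenate(arrays: Sequence, default: Callable):
    # The first registered type that appears anywhere among the inputs wins.
    for array_type, func in _concat_impls:
        if any(isinstance(a, array_type) for a in arrays):
            return func(arrays)
    return default(arrays)


class LazyArray(list):
    """Stand-in for a dask array (a list subclass, purely for illustration)."""


def lazy_concat(arrays):
    out = LazyArray()
    for a in arrays:
        out.extend(a)
    return out


def eager_concat(arrays):
    return [x for a in arrays for x in a]


register_concatenate(LazyArray, lazy_concat)

mixed = concatenate([[1, 2], LazyArray([3])], default=eager_concat)  # LazyArray
plain = concatenate([[1], [2]], default=eager_concat)                # list
```

The registration list is itself a global priority order, which is exactly what becomes hard to define once unrelated packages each register their own types.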
In blaze we have variadic sequences for multiple dispatch, and the Here is an example of what that looks like for |
@llllllllll very cool! Is there a special trick I need to use this? I tried:
```python
# first: pip install https://github.com/blaze/blaze/archive/master.tar.gz
import blaze.compute
from blaze.compute.varargs import VarArgs
from multipledispatch import dispatch

@dispatch(VarArgs[float])
def f(args):
    print('floats')

@dispatch(VarArgs[str])
def f(args):
    print('strings')

@dispatch(VarArgs[str, float])
def f(args):
    print('mixed')
```
This gives me an error when I try to use it:
```
>>> f(['foo'])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/multipledispatch/dispatcher.py in __call__(self, *args, **kwargs)
154 try:
--> 155 func = self._cache[types]
156 except KeyError:
KeyError: (<class 'list'>,)
During handling of the above exception, another exception occurred:
NotImplementedError Traceback (most recent call last)
<ipython-input-5-19f52a9a1dd6> in <module>()
----> 1 f(['foo'])
/usr/local/lib/python3.6/dist-packages/multipledispatch/dispatcher.py in __call__(self, *args, **kwargs)
159 raise NotImplementedError(
160 'Could not find signature for %s: <%s>' %
--> 161 (self.name, str_signature(types)))
162 self._cache[types] = func
163 try:
NotImplementedError: Could not find signature for f: <list>
```
|
```
In [1]: from blaze.compute.varargs import VarArgs

In [2]: from multipledispatch import dispatch

In [3]: @dispatch(VarArgs[float])
   ...: def f(args):
   ...:     print('floats')
   ...:

In [4]: @dispatch(VarArgs[str])
   ...: def f(args):
   ...:     print('strings')
   ...:

In [5]: @dispatch(VarArgs[str, float])
   ...: def f(args):
   ...:     print('mixed')
   ...:

In [6]: f(VarArgs(['foo']))
strings

In [7]: f(VarArgs([1.0]))
floats

In [8]: f(VarArgs([1.0, 'foo']))
mixed

In [9]: VarArgs([1.0, 'foo'])
Out[9]: VarArgs[float, str]([1.0, 'foo'])
```
You could hide this behind a top-level function that wraps the input for the user, or register a dispatch for list which boxes and recurses into itself. |
Can't some wild metaprogramming make it so that |
We could make a particular list an instance of a particular |
The wrapping dispatch would just look like:
```python
@dispatch(list)
def f(args):
    return f(VarArgs(args))
```
|
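The "box the list, then dispatch on its contents" idea can be sketched without blaze. This toy version is a simplification and not blaze's `VarArgs`: it keys implementations on the exact set of element types (so subclasses are not considered and `[1.0]` never falls back to the `(str, float)` entry), whereas `VarArgs` builds parametrized types that multipledispatch then resolves with its own rules. `register_seq` and `f` are hypothetical names.

```python
from typing import Callable, Dict, FrozenSet

_seq_registry: Dict[FrozenSet[type], Callable] = {}


def register_seq(*element_types: type):
    def decorator(func: Callable) -> Callable:
        _seq_registry[frozenset(element_types)] = func
        return func
    return decorator


def f(args):
    # The "wrapping dispatch": compute the set of element types in the
    # list and look up the implementation registered for exactly that set.
    key = frozenset(type(a) for a in args)
    try:
        impl = _seq_registry[key]
    except KeyError:
        names = sorted(t.__name__ for t in key)
        raise NotImplementedError(f"Could not find signature for f: {names}") from None
    return impl(args)


@register_seq(float)
def _floats(args):
    return "floats"


@register_seq(str)
def _strings(args):
    return "strings"


@register_seq(str, float)
def _mixed(args):
    return "mixed"
```

With these registrations, `f(['foo'])`, `f([1.0])` and `f([1.0, 'foo'])` return `'strings'`, `'floats'` and `'mixed'`, matching the session above, while a list of ints raises `NotImplementedError`.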
How about something like checking inside a list if something is top priority, then call |
Yes, I just tested out the wrapping dispatch. It works and is quite clean. |
As for my last concern, "Dispatch for the first argument(s) only" it looks like the simple answer is that multipledispatch already only dispatches based on positional arguments. So as long as we're strict about using keyword arguments for extra parameters like It looks like this resolves almost all of my concerns about using multiple dispatch. One thing that would be nice is it |
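Python's standard `functools.singledispatch` behaves the same way with respect to keywords: it picks an implementation from the first positional argument only, and keyword arguments such as `axis=` are forwarded untouched. The sketch below uses that real stdlib API (the annotation-based `register` form needs Python 3.7+); the toy `nansum` for flat lists is illustrative only.

```python
from functools import singledispatch


@singledispatch
def nansum(arr, *, axis=None):
    # Fallback for types nobody has registered.
    raise NotImplementedError(f"nansum not supported for {type(arr).__name__}")


@nansum.register
def _(arr: list, *, axis=None):
    # Toy implementation for flat lists of floats. axis is a keyword-only
    # parameter: it is forwarded to the implementation but plays no part in
    # choosing which implementation runs.
    if axis not in (None, 0):
        raise ValueError("a flat list only has axis 0")
    return sum(x for x in arr if x == x)  # for floats, only NaN != itself


total = nansum([1.0, float("nan"), 2.0], axis=0)  # 3.0
```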
Usually, this is not a good idea. The problem is that it's impossible to know a global priority order across unrelated packages. It's usually better to declare valid type matches explicitly. NumPy tried this with |
I wouldn't mind submitting this upstream, but I will defer to @mrocklin. |
I would want to see how magical it was. @llllllllll 's calibration of "mild metaprogramming" may differ slightly from my own :) Eventually if multipledispatch becomes a dependency of xarray then we should consider changing the decision-making process away from being just me though. Relatedly, SymPy also just adopted it (by vendoring) as a dependency. |
@mrocklin this is roughly what we would want in multipledispatch: This involves metaclasses, which frankly do blow my mind a little bit. Probably the magic could be tuned down a little bit, but metaclasses are necessary at least for implementing |
Another benefit to this would be that if XArray didn't want to support a particular library in its own code, the library itself could add the hooks. |
cc @jcrist , who has historically been interested in how we solve this problem within dask.array |
This might even help us out in Sparse for dispatch with |
Is there a way to handle kwargs (not with types, but ignoring them)? |
Yes, |
@llllllllll How hard would it be to make this work for star-args? I realize you could just add an extra wrapper but it'd be nice if you didn't have to. |
Something like Actually it'd be nice to have something like |
I spent some time thinking about this today. The cleanest answer is probably support for standard typing annotations in multipledispatch, at least for |
Given the issues raised on that PR as well as the profiling results shown here I think that PR will need some serious work before it could be merged. |
@hameerabbasi This really doesn't work with |
Indeed, typing support for multipledispatch looks like it's a ways off. To be honest, the VarArgs solution looks a little ugly to me, so I'm not sure it's worth enshrining in multipledispatch either. I guess that leaves putting our own ad-hoc solution on top of multipledispatch in xarray for now. Which really is totally fine -- this is all a stop gap measure until NumPy itself supports this sort of duck typing.
> On Sat, Feb 24, 2018 at 7:46 PM Joe Jevnik wrote:
> Given the issues raised on that PR as well as the profiling results shown here (mrocklin/multipledispatch#66), I think that PR will need some serious work before it could be merged.
|
You're assuming here that most users of XArray would be using a recent version of Numpy... which is a totally fine assumption IMO. We make the same one for sparse. However, consider that some people may be using something like conda, which (because of complex dependencies and all) may end up delaying updates (both for Numpy and XArray). I guess, however, that if people really wanted the updates they could use pip.
I would say a little clean-up with some extra decorators for exactly this purpose may be in order, that way, individual wrapping functions aren't needed. |
In pydata/sparse#1 (comment) @shoyer mentions that some work could likely progress in XArray before deciding on the VarArgs in multipledispatch. If XArray maintainers have time it might be valuable to lay out how that would look so that other devs can try it out. |
I'm thinking it could make sense to build this minimal library for "duck typed arrays" with multipledispatch outside of xarray. That would make it easier for library builders to use and extend it. Anyone interested in getting started on that? |
By minimal library, I'm assuming you mean something of the sort discussed about abstract arrays? What functionality would such a library have? |
Basically, the library would define functions like |
By "muktipledy" I mean "multipledispatch"(on my phone) |
This library would have hard dependencies only on numpy and multipledispatch, and would expose a multipledispatch namespace so extending it doesn't have to happen in the library itself. |
Doing this externally sounds sensible to me. Thoughts on a good name?
duck_array seems to be free on PyPI
> On Thu, Apr 19, 2018 at 4:23 PM, Stephan Hoyer wrote:
> This library would have hard dependencies only on numpy and multipledispatch, and would expose a multipledispatch namespace so extending it doesn't have to happen in the library itself.
|
I like Should we go ahead and start |
I've created one, as per your e-mail: https://github.com/hameerabbasi/arrayish The name is inspired from a recent discussion about this on the Numpy mailing list. |
What name should we go with? I have a slight preference for duckarray over arrayish but happy with whatever the group decides.
> On Fri, Apr 20, 2018 at 1:51 AM, Hameer Abbasi wrote:
> I've created one, as per your e-mail: https://github.com/hameerabbasi/arrayish
|
Happy with arrayish too
> On Fri, Apr 20, 2018 at 9:59 AM, Matthew Rocklin wrote:
> What name should we go with? I have a slight preference for duckarray over arrayish but happy with whatever the group decides.
|
I've written it up and already released version 0.0.1 on PyPI, except Also, |
Thanks for taking the initiative here @hameerabbasi! It's good to see something up already. Here is a link to the discussion that I think @hameerabbasi is referring to: http://numpy-discussion.10968.n7.nabble.com/new-NEP-np-AbstractArray-and-np-asabstractarray-tt45282.html#none I haven't read through that entirely yet. Was arrayish decided on by the community, or was the term still up for discussion? |
Let's move this discussion over to hameerabbasi/arrayish#1. But, in summary, I got the impression that the community in general is unhappy with the name "duck arrays". |
I am sitting in the SciPy talk about CuPy. Would be great if someone could give us an update on how this issue stands before tomorrow's xarray sprint. Someone may want to try plugging CuPy arrays into xarray. But this issue doesn't really resolve the best way to do that. As far as I can tell @hameerabbasi's "arrayish" project was deprecated in favor of uarray / unumpy. What is the best path forward as of today, July 12, 2019?
|
@hameerabbasi - are you at SciPy by any chance? |
@jacobtomlinson got things sorta-working with NEP-18 and CuPy in an afternoon in Iris (with a strong emphasis on "kinda"). On the CuPy side you're fine. If you're on NumPy 1.16 you'll need to enable the
If you're using Numpy 1.17 then this is on by default. I think that most of the work here is on the Xarray side. We'll need to remove things like explicit type checks. |
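To illustrate what NEP 18 asks of a duck array, here is a minimal sketch following the pattern used in the NEP's own examples. It assumes NumPy >= 1.17, where `__array_function__` is enabled by default; `ToyArray`, `implements`, `HANDLED_FUNCTIONS` and `is_duck_array` are illustrative names, not part of NumPy, CuPy or xarray. The last function shows the kind of duck-typed check that could replace an explicit `isinstance(x, np.ndarray)`.

```python
import numpy as np

HANDLED_FUNCTIONS = {}


def implements(np_function):
    """Register an implementation of a NumPy function for ToyArray."""
    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func
    return decorator


class ToyArray:
    """Hypothetical duck array wrapping a Python list."""

    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        # Only claim the call if every overriding type is one we understand.
        if not all(issubclass(t, ToyArray) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.sum)
def _sum(arr, axis=None):
    return sum(arr.data)


@implements(np.concatenate)
def _concatenate(arrays, axis=0):
    return ToyArray(x for a in arrays for x in a.data)


def is_duck_array(x):
    # Accept anything implementing NEP 18, not just np.ndarray.
    return hasattr(x, "__array_function__")
```

With this in place, `np.sum(ToyArray([1, 2, 3]))` and `np.concatenate([...])` on `ToyArray` inputs are routed to the registered implementations instead of coercing to `np.ndarray`.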
@rabernat I can attend remotely. |
We're at the point where this could be hacked together pretty quickly:
|
In hope of cleaner dask and sparse support (pydata/sparse#1), I wanted to suggest hooks for XArray operations.
Something like the following:
Functions would work something like the following:
(the register would fall back to Numpy if nothing is found)
I would argue that this should be in Numpy, but it's a huge project to put it there.