Creating a pint-xarray package/module #849

Closed
jthielen opened this issue Aug 24, 2019 · 13 comments

@jthielen
Contributor

Based on conversations with @shoyer on pydata/xarray#525, I'm thinking that the best way forward to fully integrate pint with xarray might be the creation of a "pint-xarray" module similar to "pint-pandas". It would likely house DataArray and Dataset accessor interfaces to allow for easy unit access, unit conversion, and IO integration within xarray. These accessor classes would be designed in such a way so as to provide easy mixins for downstream libraries like metpy that have their own accessors. I think tests of xarray functionality with wrapping pint would likely stay in xarray (as in pydata/xarray#3238).

@hgrecco Would this be something you would be willing to have and support here? I'd be willing to help maintain it (it would be the first package I would do so with, but thankfully it should be a fairly simple package).

Also, cc @andrewgsavage, @keewis, and @dopplershift, who I would think would also have thoughts on whether or not this is the best way forward for this integration.

xref #845, #764, #479, pydata/xarray#525, pydata/xarray#3238

@hgrecco
Owner

hgrecco commented Aug 24, 2019

Sounds like a really good plan to me. I like extension packages because they allow faster iteration on a core idea, and a more frequent release cycle, without perturbing the more stable main package. I would really support it. @andrewgsavage can give you a good overview of the lessons we have learned with pint-pandas.

I also think it would be good to get #764 right, as it would affect all derived packages, and we need to make sure that such a big change is for the best of the whole ecosystem.

@mraspaud

This is something we have been needing for some time in pytroll/satpy. We'll be following this with attention.
Also, you may be aware of this, but the metpy package seems to implement some unit accessors: https://unidata.github.io/MetPy/latest/tutorials/xarray_tutorial.html#units

@andrewgsavage
Collaborator

andrewgsavage commented Aug 31, 2019 via email

@shoyer

shoyer commented Aug 31, 2019

One difference with xarray vs pandas is that with xarray we use NumPy's array interface instead of inventing our own. So the basic functionality (propagation of units) should work fine without the xarray-pint module, which is only needed for extra unit-specific methods.

@egparedes

I would like to ask what the current status of the pint-xarray package is. In the NumPy Support notebook of the docs (pint/docs/numpy.ipynb) there is a mention of an alpha release of the package planned for January 2020. Is this still happening? Is it possible to test the current development branch of this feature?

@keewis
Contributor

keewis commented Jan 29, 2020

You can see the current status for xarray in pydata/xarray#3594. Most of the operations already propagate the units, the rest is a work in progress (there is still a lot to do in order to support quantities in indexes / dimension variables, though).

I don't think there has been any effort to create a pint-xarray package yet, which at most would provide additional features like an accessor that helps with converting units (i.e. da.pint.to(ureg.ns), which is a lot easier to read than da.copy(data=da.data.to(ureg.ns))).

@jthielen
Contributor Author

jthielen commented Jan 29, 2020

In the NumPy Support notebook of the docs (pint/docs/numpy.ipynb) there is a mention of an alpha release of the package planned for January 2020. Is this still happening?

Unfortunately, while I had hoped to have put together a pint-xarray package by now, personal and professional circumstances have gotten in the way of me doing so. It is still very much in my near-future plans though (i.e. next few weeks), so keep an eye out here for updates.

As @keewis says, this pint-xarray package would be a very simple one that just consists of accessors to make working with Quantities inside xarray objects easier. A rough API outline of what I would propose is as follows:

DataArray

  • da.pint.to(other_units): return dataarray with converted units
  • da.pint.units: units of quantity (as a Unit)
  • da.pint.magnitude: magnitude of quantity
  • da.pint.quantify(unit_registry=None, unit=None): create DataArray wrapping a Quantity based on string unit attribute of DataArray or specified unit
  • da.pint.dequantify(): replace data with the Quantity's magnitude, and add back string unit attribute from Quantity's unit
  • da.pint.sel(): wrap da.sel to handle indexing with Quantities (by casting to magnitude in the coordinate's units similar to how MetPy does it, since true unit-aware indexing is not available yet in xarray)
  • da.pint.loc: wrap da.loc likewise

Dataset

  • ds.pint.quantify(unit_registry=None): convert all data variables to quantities
  • ds.pint.dequantify(): convert all data variables from quantities to magnitudes with units as an attribute
  • ds.pint.sel(): wrap ds.sel to handle indexing with Quantities
  • ds.pint.loc: wrap ds.loc likewise

(this may be modified as things change on xarray's and pint's end, especially involving Dask arrays (xref #883))

@jthielen
Contributor Author

Also, with imminent development in mind, @hgrecco would you want to have the main repository for pint-xarray be hosted on your GitHub as in pint-pandas, or would you want me to fully take the lead on it and have it on mine? Or, is there enough justification now for a separate "pint" organization (or similar) to host these?

@hgrecco
Owner

hgrecco commented Jan 30, 2020

It would be great to have such a package. You can host it in your own account and then we can transfer it. If you prefer, I can create a repo in my account and give you full access.

Regarding the organization, I would be happy to start discussing something in that direction. I actually already did something like that when I transferred pyvisa to its own organization.

But maybe it would be nice to have some people from different open source projects such as MetPy and others using Pint to give an opinion about this.

@egparedes

Thank you all for the comprehensive status update. I will keep an eye on this issue to follow the progress of the pint-xarray package, but it looks like the current situation is better than I expected, so I will start testing whether it is already good enough for our projects.

bors bot added a commit that referenced this issue Mar 6, 2020
1042: Update expected initial release date for pint-xarray r=hgrecco a=jthielen

Just a quick update to the docs delaying the expected release for pint-xarray, since I haven't gotten around to it yet, and unfortunately likely won't get the chance until April. (Though, if someone wanted to take the ideas in #849 and get something put together earlier, that would be great too!)

xref #849 

- ~~Closes # (insert issue number)~~
- ~~Executed ``black -t py36 . && isort -rc . && flake8`` with no errors~~
- ~~The change is fully covered by automated unit tests~~
- [x] Documented in docs/ as appropriate
- ~~Added an entry to the CHANGES file~~


Co-authored-by: Jon Thielen <[email protected]>
@TomNicholas
Contributor

I'm very keen for pint-xarray integration, so I started working on the pint-xarray package here. It's not got that much in it yet but you should be able to use ds.pint.quantify() to turn a dataset of arrays into a dataset of Quantity arrays at least.

I would welcome any help, and @hgrecco @jthielen @keewis do you want me to add you as maintainers?

With regards to the eventual API I was thinking @shoyer that this might be a good fit for using an entrypoint? If xarray.Dataset.units pointed to an entrypoint it could behave differently depending on which libraries you had installed, allowing you to choose which units library you want to use, while still giving them all access to the coveted .units namespace. There's already an entrypoint for plotting backends in xarray, and plans to add one for storage backends too, so there's precedent.
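As a rough illustration of the entry point idea, the sketch below discovers installed units backends with the standard library's `importlib.metadata`. The group name `xarray.units` and both helper functions are hypothetical; nothing here reflects an entry point that xarray actually defines.

```python
from importlib.metadata import entry_points

# Hypothetical group name, used only for this sketch.
UNITS_GROUP = "xarray.units"


def discover_units_backends(group=UNITS_GROUP):
    """Map entry point names to entry points registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9 return a dict of group name -> entry points.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


def load_units_accessor(preferred=None, group=UNITS_GROUP):
    """Load the preferred backend if installed, else the first one by name."""
    backends = discover_units_backends(group)
    if not backends:
        return None
    name = preferred if preferred in backends else sorted(backends)[0]
    return backends[name].load()
```

A units library would then advertise its accessor class under that group in its packaging metadata, and whichever library is installed (or preferred) would be the one loaded.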

I also think that there should be a pint project to group the different repos now. It helps from a versioning, maintaining, and clarity standpoint, as well as just looking a bit more professional 💪

@jthielen
Contributor Author

jthielen commented Apr 8, 2020

@TomNicholas That all sounds good, and yes, it would be great to be added as a maintainer on pint-xarray!

@jthielen
Contributor Author

jthielen commented Jul 8, 2020

With @TomNicholas having set up pint-xarray, and with it now moved to its hopefully-permanent home at https://github.com/xarray-contrib/pint-xarray, I think this issue can be safely closed. Further discussion on any of these items can happen there! Thank you again to @TomNicholas.
