Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'to_iris' and 'from_iris' to methods Dataset #2449

Closed
wants to merge 5 commits into from

Conversation

jacobtomlinson
Copy link
Contributor

This PR adds to_iris and from_iris methods to DataSet. I've added this because I frequently find myself writing little list and dictionary comprehensions to pack and unpack both DataSets from DataArrays and Iris CubeLists from Cubes.

  • Tests added (for all bug fixes or enhancements)
  • Tests passed (for all non-documentation changes)
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only hesitation here is on the name: is it entirely obvious that Dataset.to_iris() should produce a CubeList? Maybe Dataset.to_iris_cubelist()?

I guess Iris doesn't have other high level data structures for multiple cubes, so this is the obvious choice.

def dataset_from_iris(cubelist):
""" Convert an Iris CubeList into a Dataset.
"""
return Dataset({cube.var_name: DataArray.from_iris(cube) for cube in cubelist})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the name attribute already on DataArray.from_iris(cube)? We have some special logic already for figuring out names in DataArray.from_iris.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So cube.name instead of cube.var_name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say DataArray.from_iris(cube).name, but it probably makes sense save it in an intermediate variable to avoid converting the cube twice.

@jacobtomlinson
Copy link
Contributor Author

Thanks for the feedback.

I think when working in iris that you expect multiple cubes to be returned as a cube list. Also when calling iris.load you expect a cubelist.

I would propose that to_iris is fine, however if you feel strongly I would be happy to change it.

@shoyer
Copy link
Member

shoyer commented Oct 6, 2018

I'm happy to defer to Iris users (like you) on what they would expect for converting to/from an xarray.Dataset.

@dcherian dcherian mentioned this pull request Oct 24, 2018
5 tasks
@jhamman
Copy link
Member

jhamman commented Nov 5, 2018

I wonder if we can get @pelson to weigh in here. Like @shoyer said, whatever the Iris users think makes most sense for the naming of these methods is fine by me.

@DPeterK
Copy link

DPeterK commented Jan 23, 2019

Apologies about adding my thoughts after a bit of a gap on here...

My only hesitation here is on the name

I like the to_iris and from_iris method names suggested here. They're consistent with existing functionality available for DataArray, and I think it's reasonable to expect that as DataArrays map to cubes, so also for Datasets and CubeLists. As it's possible for both Datasets and CubeLists to contain only a single object there could be some extra logic to return a DataArray or cube respectively in such a case, but I think that would add needless complexity as it's not hard from a user perspective to get back to the single item from the Dataset or CubeList.

We have some special logic already for figuring out names in DataArray.from_iris

Does this logic include handling multiple cubes of the same name in a single CubeList? Iris will quite happily handle this, but I guess the name:DataArray mapping in Xarray requires unique names. For example:

>>> names = [c.name() for c in cubes]
>>> print(names)
['air_pressure', 'air_pressure', 'air_pressure_at_sea_level', 'air_temperature', 'air_temperature', 'air_temperature', 'air_temperature', 'air_temperature', 'dew_point_temperature', 'geopotential_height', 'relative_humidity', 'relative_humidity', 'specific_humidity', 'surface_air_pressure', 'upward_air_velocity', 'x_wind', 'x_wind', 'x_wind', 'y_wind', 'y_wind', 'y_wind']

The differences between these cubes is one or more of:

  • one cube describes the phenomenon at the surface, and
  • another cube the phenomenon on height or pressure levels, or
  • another cube describes the phenomenon after statistical processing, and so on.

If such a case isn't currently handled, it could be handled by using this differing metadata to modify the name used for the key; for example air_temperature --> air_temperature__maximum_1_hr – so long as returning to a CubeList will also return the names to their originals.

@jhamman
Copy link
Member

jhamman commented Sep 14, 2023

@DPeterK / @jacobtomlinson - this has grown quite stale. Any interest in finishing this up or should we close this in favor of a new contribution down the line?

@jhamman jhamman added the plan to close May be closeable, needs more eyeballs label Sep 14, 2023
@trexfeathers
Copy link

@pp-mo 👀

@jacobtomlinson
Copy link
Contributor Author

@jhamman this PR is 5 years old and my need for it has long since gone away. If other folks want to pick this up that would be awesome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to close May be closeable, needs more eyeballs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants