Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Systematic Coordinate Identification #871

Merged
merged 3 commits into from
Jul 2, 2018
Merged

Systematic Coordinate Identification #871

merged 3 commits into from
Jul 2, 2018

Conversation

jthielen
Copy link
Collaborator

@jthielen jthielen commented Jun 8, 2018

A first cut at implementing #860, this implements systematic coordinate mapping utilities to the Dataset accessor according to CF standards/recommendations/options and uses those to assign appropriate axis attributes to the coordinates in parse_cf. This also adds some properties to the DataArray accessor to obtain easy access to the coordinates corresponding to a particular axis.

This PR is a work is progress, but I wanted to get it out there to run it by the checks, and to get some initial feedback. Once the initial implementation of everything (roughly) looks good, I plan on working on tests.

(Also, @dopplershift and @jrleeman, for now, I decided to go with vertical instead of level or levels, and time instead of times, as those seemed more natural as I was working through it.)

@jthielen jthielen added Area: Infrastructure Pertains to project infrastructure (e.g. CI, linting) Type: Feature New functionality labels Jun 8, 2018
@jthielen jthielen added this to the 0.9 milestone Jun 8, 2018
@jthielen jthielen requested review from dopplershift and jrleeman June 8, 2018 23:16
metpy/xarray.py Outdated
@staticmethod
def _check_vertical(var):
# Check for standard name (CF option)
if var.attrs.get('standard_name') in {'air_pressure', 'height', 'geopotential_height',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

level is pretty common too I think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that part of CF?

Copy link
Collaborator Author

@jthielen jthielen Jun 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to find level in the standard name table, but I did find model_level_number, which could be good to add.

Also, there's an appendix in the CF conventions with a number of additional vertical coordinate standard names...should I include those as well?

metpy/xarray.py Outdated
def _fixup_coordinate_map(coord_map, var):
"""Ensure sure we have coordinate variables in map, not coordinate names."""
for axis in coord_map:
if type(coord_map[axis]) == str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more standard to use if isinstance(coord_map[axis], str):

@jthielen
Copy link
Collaborator Author

One thing I've been thinking about as I work through applications of this...right now, using this (for example, to iterate over levels) would look something like

temperature = data.parse_cf('Temperature_isobaric')
for level in temperature.metpy.vertical:
    temperature_at_level = temperature[temperature.metpy.vertical == level]

This could get verbose rather quickly with lots of calls to the coordinate variables. Would it be worth replacing and/or supplementing it with something like this?

temperature = data.parse_cf('Temperature_isobaric')
x, y, vertical = rh.coordinate_map('x', 'y', 'vertical')
for level in vertical:
    temperature_at_level = temperature[vertical == level]

(this should be fairly straightforward to implement as a wrapper for what I now have as MetPyAccessor._axis)

@jrleeman
Copy link
Contributor

I'm in favor of that - maybe spell it as just coordinates though? Even less typing 😄

@jthielen
Copy link
Collaborator Author

@jrleeman I've added it in addition to the properties for now, but let me know if it is better as a replacement.

Copy link
Contributor

@jrleeman jrleeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor things - a more in-depth review is next

metpy/xarray.py Outdated
@xr.register_dataarray_accessor('metpy')
class MetPyAccessor(object):
"""Provide custom attributes and methods on XArray DataArray for MetPy functionality."""

nice_to_cf_axis_map = {'time': 'T', 'vertical': 'Z', 'y': 'Y', 'x': 'X'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit, but I'd probably call this something like readable_to_cf_names or similar. I've started trying to avoid map as part of variable names as it sometimes is redundant or even confusing in some contexts.

metpy/xarray.py Outdated
for coord_var in self._data_array.coords.values():
if coord_var.attrs.get('axis') == self.nice_to_cf_axis_map[axis]:
return coord_var
raise AttributeError(axis + ' attribute is not available due to lack of ' +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could probably stop after not available.

metpy/xarray.py Outdated
@staticmethod
def _check_vertical(var):
# Check for standard name (CF option)
if var.attrs.get('standard_name') in {'air_pressure', 'height', 'geopotential_height',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that part of CF?

@jthielen
Copy link
Collaborator Author

@jrleeman Some notes about the changes I've made based on our discussions:

  • The suggested name change from coord_map to coordinate_names didn't quite seem right as I got to implementing it, since you could be passing in a dict of coordinate names or coordinate variables, so for now, I made it coordinates.
  • I first tried just generalizing _check_x and _check_y, but I noticed many overlaps with the others once I made that change, so I just went ahead and combined all of them. Let me know what you think of the approach now.
  • Fixing the issue I was having parsing a full dataset (crs coordinates not matching up between variables) was actually as easy as implementing __eq__ and __ne__ on the CFProjection object, so hopefully that should be working now.

Copy link
Contributor

@jrleeman jrleeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy after this refactor - I think we're in good shape now, but let's get @dopplershift to weigh in today and then test time!

@jrleeman
Copy link
Contributor

jrleeman commented Jul 2, 2018

Unless @dopplershift has issues, I'd say lets get this in ASAP.

Copy link
Member

@dopplershift dopplershift left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I'm ready for this to go in. Just a couple minor things. This xarray tutorial will be an especially nice addition.

def test_bad_coordinate_type(test_var):
"""Test that an AttributeError is raised when a bad axis/coordinate type is given."""
with pytest.raises(AttributeError) as exc:
axis, = test_var.metpy.coordinates('bad_axis_type')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just call the function here, no need to do anything with the return value, so just:

test_var.metpy.coordinates('bad_axis_type')

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had tried that, but because the coordinates method is a generator function, I needed to assign it in order for the underlying function to run and raise the error. Is there a better way around this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, generator. You could wrap the call in next(...).

#
# Additionally, we take a few optional steps to clean up our dataset for later use:
#
# - Drop the extra grid mapping variable since we have already parsed that information
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on why we're going to the effort of dropping a variable. Does it help with printing or something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point...I was being over-zealous in my clean up of the Dataset.

jthielen added 2 commits July 2, 2018 10:57
Adds coordinate mapping utilities to the Dataset accessor, and uses them
in `parse_cf` to relabel the `axis` attributes appropriately. Then, adds
in methods to the DataArray accessor so that the spatiotemporal
coordinates can be easily accessed.
This commit adds a new tutorial to demonstrate the extent of MetPy's
xarray integration, and how to perform basic operations. This tutorial
should be kept up to date as integration with xarray improves.
@jthielen jthielen changed the title WIP: Systematic Coordinate Identification Systematic Coordinate Identification Jul 2, 2018
This commit adds tests for systematic coordinate identification.
Copy link
Member

@dopplershift dopplershift left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending CI.

@dopplershift dopplershift merged commit b848a06 into Unidata:master Jul 2, 2018
@jthielen
Copy link
Collaborator Author

jthielen commented Jul 2, 2018

Just after this was merged, I discovered a cleaner way of checking the equality of DataArray's: http://xarray.pydata.org/en/stable/combining.html#equals-and-identical. Should I submit a follow-up PR cleaning up the tests, or is what is there now good enough?

@jrleeman
Copy link
Contributor

jrleeman commented Jul 3, 2018

If it's not terribly time-consuming I'd say a follow-on PR would be good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Infrastructure Pertains to project infrastructure (e.g. CI, linting) Type: Feature New functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants