Discussion of a "cf-xarray" package #771
While the CF convention undoubtedly is useful for earth-surface/atmosphere coordinates, I wonder to what extent the ideas here can be generalised to more coordinate types. For instance, I was once an astronomer, and although astronomers would (probably) never use CDF/HDF, they might use zarr and xarray (or other tools). I hope and expect there are many other uses.
@martindurant My question on that would be: to what end? What are the use cases for generalizing something like (I assume) a coordinate system object? Both the input here (netCDF CF metadata conventions) and the output (likely cartopy and proj) are pretty earth-system specific. I'm not completely against it, but I'm not seeing what the common middle layer would provide.
If those are the constraints, then you are right, but xarray is broader than that, so I could see this as being just one of a set of possible coordinate mapping conventions.
The focus here (for "cf-xarray") was pretty strictly on CF metadata interpretation and application. I think additional discussion on coordinate mapping conventions for xarray in general would be great, but it may be better suited to the discussion in #356?
Possibly helpful to be aware of these CF/pyproj issues:
Right, but what I see is CF -> Common Middle -> Proj. Unless it makes sense to map astro conventions to proj or CF metadata to the astro tool (which I'm assuming it doesn't), that "Common Middle" needs some kind of shared and useful functionality to justify its existence, no? That's what I'm missing.
You may be right; I commented there too, but the focus was more on what "projections" (i.e., transformations) mean and how we can store them, rather than on what typical names for various physical quantities like "height", "frequency", and "spherical surface coords" might be.
It seems like there are 2 kinds of "useful" attributes.
1 is dependent on xarray making progress on things like duck array wrapping and custom indexes (i.e., it will take time). 2 could be implemented right now as an extension of whatever MetPy has, AFAICT.
From MetPy's perspective, we're happy to contribute what we have to a broader effort for interpreting CF metadata within xarray. I'm not sure how generally useful it is, but it certainly meets our requirements:
The overall goal is to facilitate changing datasets without needing to adjust code, at least where that makes sense and the dataset contains the requisite metadata. Step one of doing CF stuff with xarray may be fixing the parts where xarray blows up (ok, errors out) on CF-compliant netCDF files (see pydata/xarray#2233 and pydata/xarray#2368). It's been on my todo list for quite a while to try to work on this problem, but I can't seem to stop accumulating things on top of this in my own stack.
I think @dopplershift's proposal sounds perfect. One very ironic thing is that a cf-python package already exists: the group at Reading has basically created an entirely new stack for climate data analysis that duplicates the functionality of both xarray and dask! I have tried unsuccessfully to convince them to collaborate more (https://bitbucket.org/cfpython/cf-python/issues/51/collaborate-more-closely-with-xarray-iris).
This is of great interest to the GeoCAT team here at NCAR, too. There's a lot of interest in seeing a solution here, and there are a lot of "small" projects out there that have attempted to solve parts of the problem. CC: @NCAR/xdev @NCAR/geocat
(P.S. Anyone know why the NCAR team @mentions above don't seem to work? Both teams are publicly visible, but they don't link...which means I don't think they are getting notifications.)
I don't think you can link to teams defined outside the current org/repository.
@djhoese: That's disappointing. For some reason, I thought you could. Kinda makes it hard to loop in potential external collaborators, then.
With the help of @kmpaul's and @jthielen's input and code from MetPy, cf-xarray is now alive and welcoming contributions! https://github.com/xarray-contrib/cf-xarray
With https://github.com/xarray-contrib/cf-xarray being well on its way (already to v0.1.5), I think this issue can be safely closed and any |
Below is a brief outline of a potential cf-xarray package, a library for parsing CF metadata in xarray objects to provide convenience methods on accessors for common operations. Hopefully this can start a discussion here about such a package, following some initial comments in JiaweiZhuang/xESMF#74. These are all just my own interpretations of things at the moment based on my experiences with similar issues in MetPy, so please offer suggestions for modifications and improvements!
Breaking it down by top-level items of the current version (1.8) of the CF Conventions document:
NetCDF Files and Components
Relevant details are mostly covered by xarray itself (except for perhaps groups, which have open issues on xarray, e.g., pydata/xarray#2916).
Description of the Data
Units
While the units attribute allows basic tracking of the units within xarray itself, work towards more substantial unit support has been a longstanding effort in the community (pydata/xarray#525). Efforts have been converging recently around Pint integration with xarray, including the upcoming pint-xarray package to include unit-related functionality on accessors. Some (non-negligible) work will need to be done to ensure complete CF/UDUNITS compatibility, but this is something we have been and will continue to be working on in MetPy (Unidata/MetPy#1362).
Options to consider here:
cf-units is another package to keep in mind here, which has sort of an inverse problem to Pint: it is already fully CF/UDUNITS compliant, but doesn't have a corresponding duck array type or set of NumPy functions so that it could integrate with xarray (at least without attribute operation hooks).
Standard Name
A simple API for getting variables by their standard name (wrapping filter_by_attrs) would be useful. Perhaps ds.cf[standard_name] or ds.cf.search_standard_name(standard_name).
Doing detailed parsing of constructed standard names and automatically applying appropriate operations to calculate them may be a cool feature, but I definitely wouldn't consider it a priority until there is a demonstrated need.
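As a rough sketch of the standard-name lookup, here is a minimal accessor that wraps Dataset.filter_by_attrs. The accessor name cf_sketch and its methods are hypothetical (chosen so it doesn't collide with any real cf accessor). Note that filter_by_attrs only searches data variables, so coordinates carrying a standard_name would need separate handling.

```python
import numpy as np
import xarray as xr


@xr.register_dataset_accessor("cf_sketch")  # hypothetical accessor name
class CFSketchAccessor:
    """Illustrative standard_name lookups built on Dataset.filter_by_attrs."""

    def __init__(self, ds: xr.Dataset):
        self._ds = ds

    def search_standard_name(self, standard_name: str) -> list:
        # filter_by_attrs returns a Dataset containing only the data variables
        # whose standard_name attribute equals the requested value
        matches = self._ds.filter_by_attrs(standard_name=standard_name)
        return list(matches.data_vars)

    def __getitem__(self, standard_name: str) -> xr.DataArray:
        # Require an unambiguous match so ds.cf_sketch[...] behaves like ds[...]
        names = self.search_standard_name(standard_name)
        if len(names) != 1:
            raise KeyError(
                f"expected one variable with standard_name={standard_name!r}, found {names}"
            )
        return self._ds[names[0]]


# Usage: look up a variable by its standard name instead of its file-specific name
ds = xr.Dataset(
    {
        "t2m": ("x", np.zeros(3), {"standard_name": "air_temperature"}),
        "rh": ("x", np.ones(3), {"standard_name": "relative_humidity"}),
    }
)
temperature = ds.cf_sketch["air_temperature"]  # the "t2m" DataArray
```

Raising on ambiguous matches is one possible design choice; returning all matches (as search_standard_name does) is the other obvious option.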
Ancillary Data
Another useful place for a simple helper. Something like ds.cf.ancillary_variables(varname) could return an iterable of linked ancillary variables in the dataset.
Other Subsections
Long Name and Flags are other subsections here, but I'm not sure if there is anything useful for cf-xarray to do here.
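For the Ancillary Data helper mentioned above, a minimal sketch could simply split the ancillary_variables attribute, which CF defines as a blank-separated list of variable names. The function name is hypothetical, and silently skipping names missing from the dataset is a design assumption (raising might be preferable).

```python
import numpy as np
import xarray as xr


def ancillary_variables(ds: xr.Dataset, varname: str) -> list:
    """Hypothetical helper: return the variables linked from `varname`.

    CF stores these links as a blank-separated list of variable names in the
    ancillary_variables attribute.
    """
    names = ds[varname].attrs.get("ancillary_variables", "").split()
    # Skip names declared in the attribute but absent from the dataset
    return [ds[name] for name in names if name in ds.variables]


# Usage: a quantity with an uncertainty estimate and a quality flag attached
ds = xr.Dataset(
    {
        "q": ("t", np.zeros(2), {"ancillary_variables": "q_error q_flag"}),
        "q_error": ("t", np.full(2, 0.1)),
        "q_flag": ("t", np.zeros(2, dtype="int8")),
    }
)
linked = [v.name for v in ancillary_variables(ds, "q")]  # ["q_error", "q_flag"]
```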
Coordinate Types
This is one of the major motivating factors for a common cf-xarray package as brought up in JiaweiZhuang/xESMF#74. It has also been one of the core components of MetPy's accessor. There is definitely a broader need for these features, and at least one non-meteorology-specific package (xrviz) has an optional dependency on MetPy in order to take advantage of them.
The crux of MetPy's coordinate type identification comes in its check_axis function. This works by scanning the variable attributes for:
- standard_name, an optional CF criterion for all four types
- _CoordinateAxisType, to shortcut identification if already identified by a THREDDS server
- axis, another optional CF criterion for direct identification of each type
- positive, a CF requirement when a non-pressure vertical coordinate is present
- units, which has particular requirements for longitude, latitude, vertical pressure, and certain time coordinates
If all those fail, MetPy also falls back to some conservative regex matching of variable names (but this is something I would not expect to see carried over to cf-xarray).
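To make the attribute checks concrete, here is a minimal, dependency-free sketch of that ordering applied to a coordinate's attribute dictionary. This is not MetPy's actual check_axis code: the function name and lookup tables are hypothetical and intentionally incomplete, and mapping longitude and latitude onto X and Y is a simplification that sidesteps the label question raised in the next paragraph.

```python
import re

# Hypothetical, incomplete lookup tables for illustration only
STANDARD_NAME_TO_AXIS = {
    "longitude": "X", "projection_x_coordinate": "X",
    "latitude": "Y", "projection_y_coordinate": "Y",
    "air_pressure": "Z", "height": "Z", "altitude": "Z", "depth": "Z",
    "time": "T",
}
# Values of the netCDF-Java/THREDDS _CoordinateAxisType attribute
COORDINATE_AXIS_TYPE_TO_AXIS = {
    "Lon": "X", "GeoX": "X",
    "Lat": "Y", "GeoY": "Y",
    "Height": "Z", "Pressure": "Z", "GeoZ": "Z",
    "Time": "T",
}
# Unit spellings CF accepts for longitude and latitude
LONGITUDE_UNITS = {"degrees_east", "degree_east", "degree_E", "degrees_E", "degreeE", "degreesE"}
LATITUDE_UNITS = {"degrees_north", "degree_north", "degree_N", "degrees_N", "degreeN", "degreesN"}
PRESSURE_UNITS = {"Pa", "hPa", "kPa", "mbar", "millibar", "bar", "atm"}
# Time coordinates use "<unit> since <reference date>"
TIME_UNITS = re.compile(r"^\s*\w+\s+since\s+", re.IGNORECASE)


def identify_axis(attrs):
    """Return "X", "Y", "Z", "T", or None, checking criteria in order."""
    if attrs.get("standard_name") in STANDARD_NAME_TO_AXIS:
        return STANDARD_NAME_TO_AXIS[attrs["standard_name"]]
    if attrs.get("_CoordinateAxisType") in COORDINATE_AXIS_TYPE_TO_AXIS:
        return COORDINATE_AXIS_TYPE_TO_AXIS[attrs["_CoordinateAxisType"]]
    axis = str(attrs.get("axis", "")).upper()
    if axis in {"X", "Y", "Z", "T"}:
        return axis
    # "positive" only makes sense on a vertical coordinate
    if str(attrs.get("positive", "")).lower() in {"up", "down"}:
        return "Z"
    units = str(attrs.get("units", ""))
    if units in LONGITUDE_UNITS:
        return "X"
    if units in LATITUDE_UNITS:
        return "Y"
    if units in PRESSURE_UNITS:
        return "Z"
    if TIME_UNITS.match(units):
        return "T"
    return None  # a name-based regex fallback would go here, if wanted
```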
A discussion of the API here is definitely in order, as it will likely be one of the central components (not just for pulling out coordinates of a particular type like da.cf.X, but also for convenience wrappers like da.cf.sum(axis="X") that handle automatic coordinate type recognition for much of xarray's API). The canonical labels from CF are X, Y, Z, and T. MetPy's implementation has diverged from this in favor of x, longitude, y, latitude, vertical, and time for a few reasons:
Any preferences here on direct CF labels, MetPy-style labels, or some other solution?
xref geoxarray/geoxarray#10
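One way to picture a wrapper like da.cf.sum(axis="X") is as a translation from a CF axis label to a dimension name, followed by a plain xarray reduction. The sketch below assumes the label can be read from a coordinate's axis attribute or a tiny standard_name table; dim_for_axis and cf_sum are hypothetical names, not an existing API.

```python
import numpy as np
import xarray as xr

# Deliberately tiny, hypothetical standard_name -> CF axis table
_STANDARD_NAME_AXES = {"longitude": "X", "latitude": "Y", "time": "T"}


def dim_for_axis(da: xr.DataArray, axis: str) -> str:
    """Return the one dimension whose coordinate carries the given CF axis label."""
    matches = []
    for dim in da.dims:
        if dim not in da.coords:
            continue  # dimensions without coordinate variables carry no attributes
        attrs = da[dim].attrs
        label = attrs.get("axis") or _STANDARD_NAME_AXES.get(attrs.get("standard_name"))
        if label == axis:
            matches.append(dim)
    if len(matches) != 1:
        raise ValueError(f"expected one dimension for axis {axis!r}, found {matches}")
    return matches[0]


def cf_sum(da: xr.DataArray, axis: str) -> xr.DataArray:
    """Sketch of da.cf.sum(axis="X"): translate the label, then defer to xarray."""
    return da.sum(dim=dim_for_axis(da, axis))


# Usage: sum over the X axis without knowing the dimension is called "lon"
da = xr.DataArray(
    np.ones((2, 3)),
    dims=("lat", "lon"),
    coords={
        "lat": ("lat", [10.0, 20.0], {"standard_name": "latitude"}),
        "lon": ("lon", [0.0, 1.0, 2.0], {"axis": "X"}),
    },
)
summed = cf_sum(da, "X")  # dims ("lat",), values [3.0, 3.0]
```

Whichever label set is chosen (CF-style or MetPy-style), only the translation step changes; the reduction itself stays a plain xarray call.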
Other Components
cftime is another package worth mentioning here, as it falls under this section of Coordinate Types.
Would it be safe to leave out any special handling/parsing of parametric vertical coordinates, one of the other topics mentioned in the CF conventions under this section?
Coordinate Systems
Some of the earlier subsections of the CF conventions here (e.g., Independent Latitude, Longitude, Vertical, and Time Axes; Two-Dimensional Latitude, Longitude, Coordinate Variables) are addressed more in the coordinate identification above, but what is particularly worth noting here is the subsection on Horizontal Coordinate Reference Systems, Grid Mappings, and Projections.
This has been discussed at length elsewhere (particularly pydata/xarray#2288 and #356), so for now I'll just defer to those discussions for details. Also, here is @djhoese's relevant comment from the preceding discussion on JiaweiZhuang/xESMF#74:
In short, no matter how the details work out in the background, I'd imagine an API here of something like da.cf.crs, to get some kind of standard CRS object, which can then be converted as needed for data transformations, georeferenced calculations, and plotting.
Labels and Alternative Coordinates
I don't think there is anything for cf-xarray to do here?
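Returning briefly to the da.cf.crs idea from Coordinate Systems: one possible prototype looks up the grid mapping variable named by a data variable's grid_mapping attribute and hands its attributes to pyproj's CRS.from_cf (available in pyproj 2.2 and later). The helper names are hypothetical, variables are represented as plain attribute dictionaries for brevity, and only the simple single-name form of grid_mapping is handled, not the extended form that pairs several grid mapping names with coordinate lists.

```python
def grid_mapping_attrs(variables: dict, varname: str) -> dict:
    """Return the attributes of the grid mapping variable that `varname` points to.

    `variables` maps variable names to their attribute dictionaries.
    """
    gm_name = variables[varname].get("grid_mapping")
    if gm_name is None:
        raise KeyError(f"{varname!r} has no grid_mapping attribute")
    return variables[gm_name]


def crs_for(variables: dict, varname: str):
    """Build a pyproj.CRS from the CF grid mapping attributes."""
    # Imported lazily so the attribute lookup works without pyproj installed
    from pyproj import CRS

    return CRS.from_cf(grid_mapping_attrs(variables, varname))


# Usage: a temperature field on a plain latitude/longitude grid
variables = {
    "temp": {"grid_mapping": "crs", "units": "K"},
    "crs": {"grid_mapping_name": "latitude_longitude"},
}
mapping = grid_mapping_attrs(variables, "temp")
```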
Data Representative of Cells
Another big need for cf-xarray, which has been brought up in a lot of discussion in the past (#356). I'm less well-versed in this area, so I'd want to defer to others on the best APIs for getting appropriate cell bounds from coordinates. One other question I wanted to raise: is there anything that cf-xarray should do with respect to climatological statistics, which also fall under this section?
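As a possible starting point for the bounds question, the sketch below prefers an explicit bounds variable named by a coordinate's bounds attribute (as CF specifies) and otherwise falls back to guessing edges halfway between neighboring points. The fallback is a heuristic assumption, not part of CF, and would be wrong for irregular cells or climatological statistics; the function names are hypothetical.

```python
import numpy as np


def guess_bounds_1d(coord):
    """Estimate (n, 2) cell bounds for a monotonic 1-D coordinate.

    Inner edges sit halfway between neighbors; the two outer edges are
    extrapolated by the same half-spacing.
    """
    c = np.asarray(coord, dtype=float)
    if c.ndim != 1 or c.size < 2:
        raise ValueError("need a 1-D coordinate with at least two points")
    mid = 0.5 * (c[:-1] + c[1:])
    edges = np.concatenate([[2 * c[0] - mid[0]], mid, [2 * c[-1] - mid[-1]]])
    return np.column_stack([edges[:-1], edges[1:]])


def cell_bounds(coord_name, attrs, arrays):
    """Prefer the CF bounds variable named in attrs; otherwise guess from values."""
    bounds_name = attrs.get("bounds")
    if bounds_name is not None and bounds_name in arrays:
        return np.asarray(arrays[bounds_name])
    return guess_bounds_1d(arrays[coord_name])
```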
Reduction of Dataset Size
Would it be within scope to include helpers for uncompressing gathered data using MultiIndexes and sparse arrays?
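For compression by gathering, the core operation is scattering the list values back into the full grid. This sketch assumes zero-based indices into the C-order (last dimension fastest) flattening of the dimensions named in the list variable's compress attribute; the function names are hypothetical. np.unravel_index recovers per-dimension indices, which is roughly what a MultiIndex or sparse-array approach would need instead of a dense grid.

```python
import numpy as np


def uncompress_gathered(values, list_indices, shape, fill_value=np.nan):
    """Scatter gathered values back onto the full (dense) grid.

    list_indices are positions in the flattened, C-order array of the
    dimensions named by the compress attribute (e.g. compress="lat lon").
    """
    dense = np.full(int(np.prod(shape)), fill_value, dtype=float)
    dense[np.asarray(list_indices)] = values
    return dense.reshape(shape)


def gathered_coords(list_indices, shape):
    """Per-dimension integer indices, e.g. for a pandas MultiIndex or the
    coordinates of a sparse COO array, without allocating the dense grid."""
    return np.unravel_index(np.asarray(list_indices), shape)
```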
Discrete Sampling Geometries
Would any special handling of DSG be within scope here (such as utilities for Pandas/GeoPandas conversion like Unidata/MetPy#1074)?
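As an example of what a DSG-to-tabular utility might do, here is a sketch that flattens a contiguous ragged array representation into equal-length columns, repeating per-feature values (such as a station ID) according to the count variable. The function name and the dict-of-columns output are hypothetical choices; the result could be passed straight to pandas.DataFrame. Other DSG representations (indexed ragged, orthogonal, incomplete multidimensional) are not handled.

```python
import numpy as np


def ragged_to_columns(row_size, feature_vars, obs_vars):
    """Flatten a CF contiguous ragged array into equal-length columns.

    row_size: per-feature observation counts (the count variable that
        carries the sample_dimension attribute).
    feature_vars: per-feature arrays, repeated once per observation.
    obs_vars: per-observation arrays, already stored contiguously.
    """
    counts = np.asarray(row_size)
    n_obs = int(counts.sum())
    columns = {}
    for name, values in feature_vars.items():
        columns[name] = np.repeat(np.asarray(values), counts)
    for name, values in obs_vars.items():
        values = np.asarray(values)
        if values.shape[0] != n_obs:
            raise ValueError(f"{name!r} has {values.shape[0]} values, expected {n_obs}")
        columns[name] = values
    return columns
```

For example, ragged_to_columns([2, 1], {"station": ["A", "B"]}, {"temp": [1.0, 2.0, 3.0]}) yields a station column of A, A, B alongside the three temperatures.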
I think that's all, so again, please offer input/feedback/suggestions/improvements! I'm tagging several people that I saw spoke up on prior related issues, but please feel free to loop anyone else into the discussion that I missed or who would be able to offer input.
cc @dcherian, @djhoese, @rabernat, @snowman2, @huard, @JiaweiZhuang, @rsignell-usgs, @martindurant, @hdsingh, @bekozi, @fmaussion, @dopplershift