Add wrappers for opening datasets and data variables #81

Merged
tomvothecoder merged 2 commits into main from feature/80-open-dataset
Aug 10, 2021

Conversation

tomvothecoder
Collaborator

@tomvothecoder tomvothecoder commented Jul 6, 2021

Description

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@tomvothecoder tomvothecoder self-assigned this Jul 6, 2021
@codecov
codecov bot commented Jul 6, 2021

Codecov Report

Merging #81 (46974d9) into main (6b811fa) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main       #81   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         5    +1     
  Lines           85       138   +53     
=========================================
+ Hits            85       138   +53     
Impacted Files       Coverage Δ
xcdat/logger.py      100.00% <ø> (ø)
xcdat/bounds.py      100.00% <100.00%> (ø)
xcdat/dataset.py     100.00% <100.00%> (ø)
xcdat/variable.py    100.00% <100.00%> (ø)

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6b811fa...46974d9.

@tomvothecoder tomvothecoder changed the title from "Add open_dataset wrapper to apply common operations initially" to "Add DataArray accessor to add custom attributes to a Dataset data variable" Jul 8, 2021
@tomvothecoder tomvothecoder requested a review from pochedls July 8, 2021 20:58
@tomvothecoder tomvothecoder requested a review from jasonb5 July 8, 2021 22:30
@tomvothecoder tomvothecoder changed the title from "Add DataArray accessor to add custom attributes to a Dataset data variable" to "Add DataArray accessor to add custom attributes (e.g., bounds) to a data variable" Jul 19, 2021
@tomvothecoder tomvothecoder changed the title from "Add DataArray accessor to add custom attributes (e.g., bounds) to a data variable" to "Add wrapper for opening Datasets and DataArray accessor to store attributes (e.g., bounds)" Jul 19, 2021
@tomvothecoder tomvothecoder changed the title from "Add wrapper for opening Datasets and DataArray accessor to store attributes (e.g., bounds)" to "Add wrapper for opening Datasets and a DataArray accessor to store attributes (e.g., bounds)" Jul 19, 2021
@tomvothecoder tomvothecoder changed the title from "Add wrapper for opening Datasets and a DataArray accessor to store attributes (e.g., bounds)" to "Add Dataset and data variable wrappers to apply common operations upon opening" Jul 22, 2021
@tomvothecoder tomvothecoder force-pushed the feature/80-open-dataset branch 2 times, most recently from dce6966 to 0f5744c Compare July 23, 2021 21:34
@tomvothecoder tomvothecoder force-pushed the feature/80-open-dataset branch from 5697cf7 to 5b303ed Compare August 4, 2021 17:33
@tomvothecoder tomvothecoder changed the title from "Add Dataset and data variable wrappers to apply common operations upon opening" to "Add wrappers for opening datasets and data variables to apply common operations (e.g., generate bounds)" Aug 4, 2021
@tomvothecoder tomvothecoder force-pushed the feature/80-open-dataset branch from 5b303ed to c6ab44d Compare August 4, 2021 17:44
- Add dataset.py and variable.py, which store wrappers
- Update .gitignore
- Update axis.py to bounds.py to explicitly express intent
- Update readthedocs.yml
- Update setup.py with requirements list
- Update package imports in meta.yaml and api.rst
- Add `get_bounds_for_all_coords()` method
@tomvothecoder tomvothecoder force-pushed the feature/80-open-dataset branch from c6ab44d to b076b44 Compare August 4, 2021 17:47
@tomvothecoder tomvothecoder requested a review from pochedls August 4, 2021 17:56
@tomvothecoder tomvothecoder changed the title from "Add wrappers for opening datasets and data variables to apply common operations (e.g., generate bounds)" to "Add wrappers for opening datasets and data variable" Aug 4, 2021
@tomvothecoder tomvothecoder changed the title from "Add wrappers for opening datasets and data variable" to "Add wrappers for opening datasets and data variables" Aug 4, 2021
@tomvothecoder
Collaborator Author

tomvothecoder commented Aug 6, 2021

I just discovered an interesting behavior of the bounds accessor class and its attributes.

When you open a DataArray/Dataset with an accessor class, set attributes on the accessor, and then make a copy of the object, the attributes don't propagate to the copy, so you lose them. xarray creates a new accessor instance for each new DataArray/Dataset object, so any state stored on the original accessor stays behind. This will likely cause data integrity problems as you work with variables: making copies, passing them to functions, trying to access the bounds, etc.

Extending xarray with accessors therefore seems best suited to computed properties rather than persistent object attributes, unless you never need to make copies of the original variable.

Code example:

# Open and set the bounds accessor class attributes
# ------------------------------------------------------
>>> ts = open_variable(ds, "ts")

# Access an attribute
# ------------------------------------------------------
>>> print(ts.bounds.lat)
<xarray.DataArray 'lat_bnds' (lat: 145, bnds: 2)>
array([[-90.   , -89.375],
       [-89.375, -88.125],
       [-88.125, -86.875],
       ...,
       [ 86.875,  88.125],
       [ 88.125,  89.375],
       [ 89.375,  90.   ]])
Coordinates:
  * lat      (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
Dimensions without coordinates: bnds

# Make a copy of the variable
# ------------------------------------------------------
>>> ts_copy = ts.copy()

# Try to access the bounds attribute again. It's lost now.
# ------------------------------------------------------
>>> print(ts_copy.bounds.lat)
None
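
For anyone who wants to reproduce this without the xcdat wrappers, here is a minimal sketch using only xarray and NumPy. The accessor name `demo_bounds`, the class `DemoBoundsAccessor`, and the small synthetic array are made up for illustration and are not the classes in this PR; the sketch only assumes that xarray builds a fresh accessor instance for every new DataArray object.

```python
import numpy as np
import xarray as xr


# Hypothetical accessor for illustration only; not xcdat's actual class.
@xr.register_dataarray_accessor("demo_bounds")
class DemoBoundsAccessor:
    def __init__(self, xarray_obj):
        self._obj = xarray_obj
        # Plain Python attribute stored on this accessor instance,
        # not inside the DataArray itself.
        self.lat = None


# Small synthetic variable standing in for "ts".
ts = xr.DataArray(
    np.zeros(3),
    coords={"lat": [-45.0, 0.0, 45.0]},
    dims=["lat"],
    name="ts",
)

# Store bounds as state on the accessor attached to `ts`.
ts.demo_bounds.lat = xr.DataArray(
    [[-67.5, -22.5], [-22.5, 22.5], [22.5, 67.5]],
    dims=["lat", "bnds"],
    name="lat_bnds",
)

# Copying creates a new DataArray, which gets its own new accessor instance.
ts_copy = ts.copy()

print(ts.demo_bounds.lat is not None)  # True: state is still on the original
print(ts_copy.demo_bounds.lat)         # None: the copy's accessor starts empty
```

The same loss should occur for any operation that returns a new object (arithmetic, selection, and so on), since each result gets its own accessor.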

Alternate Solution

The alternative solution is to store bounds in the DataArray coordinates instead of as attributes of the bounds accessor class.

Pros:

  • Bounds are persistent even when copying since they are stored as coords
  • Use native xarray syntax to access bounds instead of the accessor (ts.lat_bnds vs. ts.bounds.lat)

Cons:

  • xarray might store existing bounds inside a Dataset as data variables
    • Not sure if that's default behavior or NetCDF file output decision
    • Not a big deal though if we work at the data variable/DataArray level
  • Have to check which bounds are set in the dataset then pass them down -- cf_xarray can handle this
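
As a rough idea of what "check which bounds are set in the dataset" could look like, here is a sketch that relies only on the CF convention that a coordinate names its bounds variable in a `bounds` attribute. The helper `find_bounds_names` and the synthetic dataset are hypothetical; this is not how cf_xarray implements it, just a plain-xarray illustration.

```python
import numpy as np
import xarray as xr


def find_bounds_names(ds):
    """Map each coordinate to the bounds variable named in its CF `bounds` attribute.

    Hypothetical helper: coordinates without a `bounds` attribute, or whose
    named bounds variable is missing from the dataset, are skipped.
    """
    found = {}
    for name, coord in ds.coords.items():
        bounds_name = coord.attrs.get("bounds")
        if bounds_name in ds.variables:
            found[name] = bounds_name
    return found


# Synthetic dataset: "lat" declares bounds, "lon" does not.
ds = xr.Dataset(
    data_vars={
        "ts": (("lat", "lon"), np.zeros((3, 2))),
        "lat_bnds": (
            ("lat", "bnds"),
            np.array([[-67.5, -22.5], [-22.5, 22.5], [22.5, 67.5]]),
        ),
    },
    coords={
        "lat": ("lat", [-45.0, 0.0, 45.0], {"bounds": "lat_bnds"}),
        "lon": ("lon", [0.0, 180.0]),
    },
)

print(find_bounds_names(ds))  # {'lat': 'lat_bnds'}
```

The resulting mapping could then drive which bounds get passed down to each data variable.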

Notes:

  • Adds an extra dimension, bnds/bounds, to the DataArray

import numpy as np

# 1. Copy bounds into each data variable's coordinates
# ------------------------------------------------------------
ts = ds.ts.copy()
# Add a "bnds" dimension so the bounds coordinates fit the DataArray's dimensions
ts = ts.expand_dims(bnds=np.array([0, 1]))
ts = ts.assign_coords(
    {"lon_bnds": ds.lon_bnds, "lat_bnds": ds.lat_bnds, "time_bnds": ds.time_bnds}
)

>>> ts.coords
Coordinates:
  * time       (time) datetime64[ns] 1850-01-16T12:00:00 ... 2005-12-16T12:00:00
  * lat        (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
  * lon        (lon) float64 0.0 1.875 3.75 5.625 ... 352.5 354.4 356.2 358.1
    lon_bnds   (lon, bnds) float64 -0.9375 0.9375 0.9375 ... 357.2 357.2 359.1
    lat_bnds   (lat, bnds) float64 -90.0 -89.38 -89.38 ... 89.38 89.38 90.0
    time_bnds  (time, bnds) datetime64[ns] 1850-01-01 1850-02-01 ... 2006-01-01


# 2. Access bounds using default xarray syntax 
# ------------------------------------------------------------
>>> ts.lat_bnds # or ts.coords["lat_bnds"]
<xarray.DataArray 'lat_bnds' (lat: 145, bnds: 2)>
array([[-90.   , -89.375],
       [-89.375, -88.125],
       [-88.125, -86.875],
       ...,
       [ 86.875,  88.125],
       [ 88.125,  89.375],
       [ 89.375,  90.   ]])
Coordinates:
  * lat       (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
    lat_bnds  (lat, bnds) float64 -90.0 -89.38 -89.38 ... 89.38 89.38 90.0
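
To check the claim that bounds stored as coordinates survive copying, here is a small self-contained sketch on synthetic data. The dataset and values are invented for illustration; the steps mirror the `expand_dims` / `assign_coords` approach above rather than any final xcdat implementation.

```python
import numpy as np
import xarray as xr

lat = np.array([-45.0, 0.0, 45.0])
lat_bnds = np.array([[-67.5, -22.5], [-22.5, 22.5], [22.5, 67.5]])

# Synthetic dataset with bounds stored as a data variable.
ds = xr.Dataset(
    data_vars={
        "ts": (("lat",), np.zeros(3)),
        "lat_bnds": (("lat", "bnds"), lat_bnds),
    },
    coords={"lat": lat},
)

# Move the bounds into the data variable's coordinates.
ts = ds.ts.copy()
ts = ts.expand_dims(bnds=np.array([0, 1]))
ts = ts.assign_coords({"lat_bnds": ds.lat_bnds})

# Coordinates are part of the DataArray, so they travel with copies
# and with results of simple arithmetic.
ts_copy = ts.copy()
shifted = ts + 1

print("lat_bnds" in ts_copy.coords)  # True
print("lat_bnds" in shifted.coords)  # True
```

Note the trade-off mentioned in the notes: `expand_dims` broadcasts the data along the new `bnds` dimension, so `ts` now has shape (2, 3) instead of (3,).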

Related GitHub Issues:

@tomvothecoder tomvothecoder merged commit 5918313 into main Aug 10, 2021
@tomvothecoder tomvothecoder deleted the feature/80-open-dataset branch August 10, 2021 17:24
Labels
type: enhancement New enhancement request
Projects
None yet
2 participants