DataArray accessor bounds attributes don't persist for all xarray functions that return new DataArrays #99

Closed
tomvothecoder opened this issue Aug 31, 2021 · 0 comments · Fixed by #100
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

tomvothecoder commented Aug 31, 2021

What are the steps to reproduce this issue?

  1. Open up a dataset using xcdat.dataset.open_dataset()
  2. Open up a variable using xcdat.variable.open_variable() (e.g., "ts")
  3. Check bounds accessor exists (ts.bounds.bounds) -- exists
  4. Perform an operation on the variable that returns a new object (e.g., drop a coordinate, copy the variable, etc.)
  5. Check bounds accessor exists (ts.bounds.bounds) -- does not exist anymore

What happens? Any logs, error output, etc?

Attributes set on the accessor are dropped by any xarray DataArray method that returns a new object. Many xarray methods are not in-place operations and instead return new objects, so this is a major data integrity problem: bounds will be dropped as users manipulate variables.

>>> tas = ds.tas.copy()

# Copy bounds from parent dataset
>>> tas = tas.bounds._copy_from_parent(ds)

# Access bounds through accessor class attr
>>> tas.bounds.lat
Returns lat bounds

# Make a copy of the variable. This returns a new object.
>>> tas2 = tas.copy()

# Bounds are lost
>>> tas2.bounds.lat
None

# Must invoke again on the copy
>>> tas2 = tas2.bounds._copy_from_parent(ds)
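
The behavior above can be reproduced without xcdat. The following is a minimal sketch, assuming only xarray and NumPy are installed; the accessor name `demo_bounds` and its `lat` attribute are hypothetical stand-ins for the real bounds accessor, not xcdat's implementation.

```python
import numpy as np
import xarray as xr


# Hypothetical accessor name and attribute, used only for this illustration.
@xr.register_dataarray_accessor("demo_bounds")
class DemoBoundsAccessor:
    def __init__(self, xarray_obj):
        self._obj = xarray_obj
        # State stored here lives on the accessor instance, not in the DataArray.
        self.lat = None


tas = xr.DataArray(
    np.zeros(3), dims="lat", coords={"lat": [-90.0, 0.0, 90.0]}, name="tas"
)

# Attach bounds to the accessor of this particular object.
tas.demo_bounds.lat = np.array([[-90.0, -45.0], [-45.0, 45.0], [45.0, 90.0]])
same_object_keeps_state = tas.demo_bounds.lat is not None

# copy() returns a new DataArray, so xarray builds a fresh accessor for it.
tas2 = tas.copy()
copy_lost_state = tas2.demo_bounds.lat is None

print(same_object_keeps_state, copy_lost_state)
```

The accessor instance is cached per object, so the state survives repeated access on `tas` but not on any new object derived from it.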

What were you expecting to happen?

I expected attributes set on the accessor to remain cached, but they do not persist across operations that return new objects.

Accessor classes are great for extending Datasets and DataArrays ONLY IF the class methods and attributes do not depend on other objects. For example, they work well for computed properties derived from the object's existing metadata.

Any other comments?

Attempted Fixes

  1. Store bounds in the DataArray as 2D coordinates -- this requires adding a bnds/bounds dimension, which changes the shape of the original data (shared dims)
  2. Store bounds in the DataArray as 1D arrays to avoid needing a bounds dimension -- this is a hacky workaround that changes the shape of the bounds. When bounds exist in a dataset, they are loaded as 2D arrays.

# 2D matrix by default
>>> print(lat_bnds)
<xarray.DataArray 'lat_bnds' (lat: 145, bnds: 2)>
array([[-90.   , -89.375],
       [-89.375, -88.125],
       [-88.125, -86.875],
       ...,
       [ 86.875,  88.125],
       [ 88.125,  89.375],
       [ 89.375,  90.   ]])
Coordinates:
  * lat      (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
Dimensions without coordinates: bnds

# 1D matrix workaround
>>> print(da.x_bounds)
<xarray.DataArray 'x_bounds' (x: 3)>
array([(0., 1.), (1., 2.), (2., 3.)],
      dtype=[('start', '<f8'), ('stop', '<f8')])
Coordinates:
  * x         (x) float64 0.5 1.5 2.5
    x_bounds  (x) [('start', '<f8'), ('stop', '<f8')] (0., 1.) (1., 2.) (2., 3.)
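
The limitation behind the first attempted fix can be sketched as follows. This is an illustration under assumptions (made-up latitude values and a placeholder variable named `tas`), not the code used in the attempts above: a DataArray only accepts coordinates whose dimensions are a subset of its own, so 2D bounds are rejected unless the data itself gains a `bnds` dimension.

```python
import numpy as np
import xarray as xr

# Made-up values for illustration only.
lat = np.array([-45.0, 0.0, 45.0])
lat_bnds = np.array([[-67.5, -22.5], [-22.5, 22.5], [22.5, 67.5]])

# Attaching (lat, bnds) bounds to a variable that only has a lat dimension
# fails, because "bnds" is not one of the DataArray's dimensions.
try:
    xr.DataArray(
        np.zeros(3),
        dims=("lat",),
        coords={"lat": lat, "lat_bnds": (("lat", "bnds"), lat_bnds)},
        name="tas",
    )
    rejected = False
except ValueError:
    rejected = True

# The coordinate is accepted once the data carries the bnds dimension too,
# but the data shape then changes from (3,) to (3, 2).
widened = xr.DataArray(
    np.zeros((3, 2)),
    dims=("lat", "bnds"),
    coords={"lat": lat, "lat_bnds": (("lat", "bnds"), lat_bnds)},
    name="tas",
)

print(rejected, widened.shape)
```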

Proposed Solutions

Work on the Dataset level

Pros:

  • This avoids the implementation headaches of trying to create a TransientVariable-like object (a variable with bounds) in xarray, which stem from limitations of the DataArray data structure.
  • Bounds are attached at the Dataset level

Cons:

  • Minor inconvenience of accessing each variable through the dataset (ds.ts vs. just ts). Mitigating factors:
    • users need to learn the xarray API as a prerequisite anyway
    • users can just extract the variable after performing the operation on the dataset
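
A rough sketch of the Dataset-level approach, assuming a toy dataset with made-up values (the variable names `ts` and `lat_bnds` mirror the examples above, but the data is invented): bounds are stored as an ordinary data variable, so operations on the Dataset carry them along, and the variable is extracted afterwards.

```python
import numpy as np
import xarray as xr

# Toy dataset with made-up values; bounds are a regular data variable.
ds = xr.Dataset(
    data_vars={
        "ts": (("lat",), np.array([280.0, 300.0, 285.0])),
        "lat_bnds": (
            ("lat", "bnds"),
            np.array([[-67.5, -22.5], [-22.5, 22.5], [22.5, 67.5]]),
        ),
    },
    coords={"lat": np.array([-45.0, 0.0, 45.0])},
)

# Operate on the Dataset: both ts and lat_bnds are subset and copied together.
subset = ds.isel(lat=slice(0, 2)).copy()

# Extract the variable and its bounds after the operation.
ts = subset["ts"]
ts_lat_bnds = subset["lat_bnds"]

print(ts.shape, ts_lat_bnds.shape)
```

The trade-off is the one listed under Cons: the variable has to be reached through the dataset rather than held on its own.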

tomvothecoder added the "type: bug" label on Aug 31, 2021
tomvothecoder changed the title from "DataArray accessor bounds attributes get dropped for all xarray functions that return new DataArrays" to "DataArray accessor bounds attributes don't persist for all xarray functions that return new DataArrays" on Aug 31, 2021