Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix spatial bugs: handle datasets with domain bounds out of order and zonal averaging #340

Merged
merged 5 commits into from
Sep 15, 2022

Conversation

pochedls
Copy link
Collaborator

@pochedls pochedls commented Sep 10, 2022

Description

This PR is intended to fix #324 and #338. It allows users to perform spatial averaging on datasets in which domain_bounds[:, 0] > domain_bounds[:, 1] by re-ordering the domain bounds for the purposes of calculating spatial weights. It also fixes an issue that prevented spatial averaging on lat-plev grids. A section of the code tried to load longitude bounds even when that axis was not specified for averaging; this section of code is updated to only load bounds information if the axis is specified for averaging.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@codecov-commenter
Copy link

codecov-commenter commented Sep 10, 2022

Codecov Report

Base: 100.00% // Head: 100.00% // No change to project coverage 👍

Coverage data is based on head (f4a99b3) compared to base (147b80b).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #340   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           14        14           
  Lines         1201      1200    -1     
=========================================
- Hits          1201      1200    -1     
Impacted Files Coverage Δ
xcdat/spatial.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

xcdat/spatial.py Outdated
Comment on lines 247 to 260
"Bounds",
{"domain": xr.DataArray, "region": Optional[RegionAxisBounds]},
)
axis_bounds: Dict[SpatialAxis, Bounds] = {
"X": {
axis_bounds: Dict[SpatialAxis, Bounds] = {}
if "X" in axis:
axis_bounds["X"] = {
"domain": self._dataset.bounds.get_bounds("X").copy(),
"region": lon_bounds,
},
"Y": {
}
if "Y" in axis:
axis_bounds["Y"] = {
"domain": self._dataset.bounds.get_bounds("Y").copy(),
"region": lat_bounds,
},
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, the axis_bounds dict tried to populate X and Y every time, but we do not need to populate the bounds information if we are not performing spatial averaging over the X or Y axis. This fixes #324. I think this was a bug and I didn't see a need to modify the unit tests in this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic looks good. I did some minor refactoring and will push the code for your review.

Comment on lines +257 to +276
def test_bounds_reordered_when_upper_indexed_first(self):
domain_bounds = xr.DataArray(
name="lon_bnds",
data=np.array(
[[-89.375, -90], [0.0, -89.375], [0.0, 89.375], [89.375, 90]]
),
coords={"lat": self.ds.lat},
dims=["lat", "bnds"],
)
result = self.ds.spatial._force_domain_order_low_to_high(domain_bounds)

with pytest.raises(ValueError):
ds.spatial.get_weights(axis=["X"])
expected_domain_bounds = xr.DataArray(
name="lon_bnds",
data=np.array(
[[-90, -89.375], [-89.375, 0.0], [0.0, 89.375], [89.375, 90]]
),
coords={"lat": self.ds.lat},
dims=["lat", "bnds"],
)
assert result.identical(expected_domain_bounds)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we threw an error if the lower domain bound was larger than the upper domain bound. Now we re-order them. This unit test checks to make sure that is done in _force_domain_order_low_to_high.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

xcdat/spatial.py Outdated
Comment on lines 312 to 332
def _validate_domain_bounds(self, domain_bounds: xr.DataArray):
"""Validates the ``domain_bounds`` arg based on a set of criteria.

def _force_domain_order_low_to_high(self, domain_bounds: xr.DataArray):
"""Re-orders the ``domain_bounds`` to be ordered low-to-high such that
domain_bounds[:, 0] < domain_bounds[:, 1].
Parameters
----------
domain_bounds: xr.DataArray
The bounds of an axis

Raises
The bounds of an axis.
Returns
------
TypeError
If the ``domain_bounds`` of a grid cell are not ordered low-to-high.
xr.DataArray
The bounds of an axis (re-ordered if applicable).
"""
index_bad_cells = np.where(domain_bounds[:, 1] - domain_bounds[:, 0] < 0)[0]
if len(index_bad_cells) > 0:
raise ValueError(
"The bounds have unexpected ordering. A lower bound has a "
"greater value than an upper bound."
)
new_domain_bounds = domain_bounds.copy()
new_domain_bounds[index_bad_cells, 0] = domain_bounds[index_bad_cells, 1]
new_domain_bounds[index_bad_cells, 1] = domain_bounds[index_bad_cells, 0]
return new_domain_bounds
else:
return domain_bounds
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we threw an error if the lower domain bound was larger than the upper domain bound (in _validate_domain_bounds) This function is replaced with _force_domain_order_low_to_high which re-orders the bounds such that domain_bound[:, 0] < domain_bounds[:, 1].

Note I had originally replaced these lines with logic to simply determine, which of the bounds is higher versus lower (e.g., domain_lowers = np.min(d_bounds, axis=1)), but the order matters further down (here).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Comment on lines -362 to +364
if len(bounds) <= 0 or len(bounds) > 2:
if len(bounds) != 2:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I randomly caught this... I think <=0 doesn't really make any sense, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this change, and <=0 does not make sense (arrays can't have < 0 elements)

@pochedls
Copy link
Collaborator Author

I think this is in good shape and can be reviewed. I do plan on re-running it over a bunch of cases before merging.

@tomvothecoder tomvothecoder added type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. Priority: Medium labels Sep 12, 2022
@pochedls
Copy link
Collaborator Author

I re-did comparisons with CDAT - we still agree with CDAT in all cases where both can load the dataset (loop over 115 model files and 21 domains).

Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, with some minor refactoring changes in my upcoming commit.

Comment on lines -362 to +364
if len(bounds) <= 0 or len(bounds) > 2:
if len(bounds) != 2:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this change, and <=0 does not make sense (arrays can't have < 0 elements)

Comment on lines +257 to +276
def test_bounds_reordered_when_upper_indexed_first(self):
domain_bounds = xr.DataArray(
name="lon_bnds",
data=np.array(
[[-89.375, -90], [0.0, -89.375], [0.0, 89.375], [89.375, 90]]
),
coords={"lat": self.ds.lat},
dims=["lat", "bnds"],
)
result = self.ds.spatial._force_domain_order_low_to_high(domain_bounds)

with pytest.raises(ValueError):
ds.spatial.get_weights(axis=["X"])
expected_domain_bounds = xr.DataArray(
name="lon_bnds",
data=np.array(
[[-90, -89.375], [-89.375, 0.0], [0.0, 89.375], [89.375, 90]]
),
coords={"lat": self.ds.lat},
dims=["lat", "bnds"],
)
assert result.identical(expected_domain_bounds)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

xcdat/spatial.py Outdated
Comment on lines 247 to 260
"Bounds",
{"domain": xr.DataArray, "region": Optional[RegionAxisBounds]},
)
axis_bounds: Dict[SpatialAxis, Bounds] = {
"X": {
axis_bounds: Dict[SpatialAxis, Bounds] = {}
if "X" in axis:
axis_bounds["X"] = {
"domain": self._dataset.bounds.get_bounds("X").copy(),
"region": lon_bounds,
},
"Y": {
}
if "Y" in axis:
axis_bounds["Y"] = {
"domain": self._dataset.bounds.get_bounds("Y").copy(),
"region": lat_bounds,
},
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic looks good. I did some minor refactoring and will push the code for your review.

xcdat/spatial.py Outdated
Comment on lines 312 to 332
def _validate_domain_bounds(self, domain_bounds: xr.DataArray):
"""Validates the ``domain_bounds`` arg based on a set of criteria.

def _force_domain_order_low_to_high(self, domain_bounds: xr.DataArray):
"""Re-orders the ``domain_bounds`` to be ordered low-to-high such that
domain_bounds[:, 0] < domain_bounds[:, 1].
Parameters
----------
domain_bounds: xr.DataArray
The bounds of an axis

Raises
The bounds of an axis.
Returns
------
TypeError
If the ``domain_bounds`` of a grid cell are not ordered low-to-high.
xr.DataArray
The bounds of an axis (re-ordered if applicable).
"""
index_bad_cells = np.where(domain_bounds[:, 1] - domain_bounds[:, 0] < 0)[0]
if len(index_bad_cells) > 0:
raise ValueError(
"The bounds have unexpected ordering. A lower bound has a "
"greater value than an upper bound."
)
new_domain_bounds = domain_bounds.copy()
new_domain_bounds[index_bad_cells, 0] = domain_bounds[index_bad_cells, 1]
new_domain_bounds[index_bad_cells, 1] = domain_bounds[index_bad_cells, 0]
return new_domain_bounds
else:
return domain_bounds
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pochedls, please review my refactoring changes. The unit tests still pass.

If you agree with them, merge when ready.


weights = axis_bounds[key]["weights_method"](d_bounds, r_bounds)
weights.attrs = d_bounds.attrs
axis_weights[key] = weights
Copy link
Collaborator

@tomvothecoder tomvothecoder Sep 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored a bit here to reduce if else statements and avoid fetching domain_bounds for unspecified keys in axis.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commented was intended to highlight the entire method, not just this line.

xcdat/spatial.py Outdated
Comment on lines 312 to 337
def _force_domain_order_low_to_high(self, domain_bounds: xr.DataArray):
"""Re-orders the ``domain_bounds`` to be ordered low-to-high such that
domain_bounds[:, 0] < domain_bounds[:, 1].
"""Reorders the ``domain_bounds`` low-to-high.

This method ensures all lower bound values are less than the upper bound
values (``domain_bounds[:, 1] < domain_bounds[:, 1]``).

Parameters
----------
domain_bounds: xr.DataArray
The bounds of an axis.

Returns
------
xr.DataArray
The bounds of an axis (re-ordered if applicable).
"""
index_bad_cells = np.where(domain_bounds[:, 1] - domain_bounds[:, 0] < 0)[0]

if len(index_bad_cells) > 0:
new_domain_bounds = domain_bounds.copy()
new_domain_bounds[index_bad_cells, 0] = domain_bounds[index_bad_cells, 1]
new_domain_bounds[index_bad_cells, 1] = domain_bounds[index_bad_cells, 0]

return new_domain_bounds
else:
return domain_bounds
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated docstring and code spacing for consistency with code style

Comment on lines +259 to +262
"weights_method": self._get_latitude_weights,
"region": np.array(lat_bounds, dtype="float")
if lat_bounds is not None
else None,
Copy link
Collaborator Author

@pochedls pochedls Sep 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You learn something new every day (that if can come after the statement).

Copy link
Collaborator

@tomvothecoder tomvothecoder Sep 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, they are called ternary operators (aka inline conditionals).
https://datagy.io/inline-if-in-python/

I don't use them too often for variable assignment, but it can be useful where it makes sense (readable and simple).

xcdat/spatial.py Outdated Show resolved Hide resolved
@pochedls pochedls merged commit 2ea2f46 into main Sep 15, 2022
@pochedls pochedls deleted the bug/fix-spatial-domain-ordering branch September 15, 2022 04:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[Feature]: Handle "out of order" bounds [Bug]: averaging a lat-plev grid fails
3 participants