-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix spatial bugs: handle datasets with domain bounds out of order and zonal averaging #340
Conversation
Codecov ReportBase: 100.00% // Head: 100.00% // No change to project coverage 👍
Additional details and impacted files@@ Coverage Diff @@
## main #340 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 14 14
Lines 1201 1200 -1
=========================================
- Hits 1201 1200 -1
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
xcdat/spatial.py
Outdated
"Bounds", | ||
{"domain": xr.DataArray, "region": Optional[RegionAxisBounds]}, | ||
) | ||
axis_bounds: Dict[SpatialAxis, Bounds] = { | ||
"X": { | ||
axis_bounds: Dict[SpatialAxis, Bounds] = {} | ||
if "X" in axis: | ||
axis_bounds["X"] = { | ||
"domain": self._dataset.bounds.get_bounds("X").copy(), | ||
"region": lon_bounds, | ||
}, | ||
"Y": { | ||
} | ||
if "Y" in axis: | ||
axis_bounds["Y"] = { | ||
"domain": self._dataset.bounds.get_bounds("Y").copy(), | ||
"region": lat_bounds, | ||
}, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, the axis_bounds
dict tried to populate X
and Y
every time, but we do not need to populate the bounds information if we are not performing spatial averaging over the X
or Y
axis. This fixes #324. I think this was a bug and I didn't see a need to modify the unit tests in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic looks good. I did some minor refactoring and will push the code for your review.
def test_bounds_reordered_when_upper_indexed_first(self): | ||
domain_bounds = xr.DataArray( | ||
name="lon_bnds", | ||
data=np.array( | ||
[[-89.375, -90], [0.0, -89.375], [0.0, 89.375], [89.375, 90]] | ||
), | ||
coords={"lat": self.ds.lat}, | ||
dims=["lat", "bnds"], | ||
) | ||
result = self.ds.spatial._force_domain_order_low_to_high(domain_bounds) | ||
|
||
with pytest.raises(ValueError): | ||
ds.spatial.get_weights(axis=["X"]) | ||
expected_domain_bounds = xr.DataArray( | ||
name="lon_bnds", | ||
data=np.array( | ||
[[-90, -89.375], [-89.375, 0.0], [0.0, 89.375], [89.375, 90]] | ||
), | ||
coords={"lat": self.ds.lat}, | ||
dims=["lat", "bnds"], | ||
) | ||
assert result.identical(expected_domain_bounds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, we threw an error if the lower domain bound was larger than the upper domain bound. Now we re-order them. This unit test checks to make sure that is done in _force_domain_order_low_to_high
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
xcdat/spatial.py
Outdated
def _validate_domain_bounds(self, domain_bounds: xr.DataArray): | ||
"""Validates the ``domain_bounds`` arg based on a set of criteria. | ||
|
||
def _force_domain_order_low_to_high(self, domain_bounds: xr.DataArray): | ||
"""Re-orders the ``domain_bounds`` to be ordered low-to-high such that | ||
domain_bounds[:, 0] < domain_bounds[:, 1]. | ||
Parameters | ||
---------- | ||
domain_bounds: xr.DataArray | ||
The bounds of an axis | ||
|
||
Raises | ||
The bounds of an axis. | ||
Returns | ||
------ | ||
TypeError | ||
If the ``domain_bounds`` of a grid cell are not ordered low-to-high. | ||
xr.DataArray | ||
The bounds of an axis (re-ordered if applicable). | ||
""" | ||
index_bad_cells = np.where(domain_bounds[:, 1] - domain_bounds[:, 0] < 0)[0] | ||
if len(index_bad_cells) > 0: | ||
raise ValueError( | ||
"The bounds have unexpected ordering. A lower bound has a " | ||
"greater value than an upper bound." | ||
) | ||
new_domain_bounds = domain_bounds.copy() | ||
new_domain_bounds[index_bad_cells, 0] = domain_bounds[index_bad_cells, 1] | ||
new_domain_bounds[index_bad_cells, 1] = domain_bounds[index_bad_cells, 0] | ||
return new_domain_bounds | ||
else: | ||
return domain_bounds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, we threw an error if the lower domain bound was larger than the upper domain bound (in _validate_domain_bounds
) This function is replaced with _force_domain_order_low_to_high
which re-orders the bounds such that domain_bound[:, 0] < domain_bounds[:, 1]
.
Note I had originally replaced these lines with logic to simply determine, which of the bounds is higher versus lower (e.g., domain_lowers = np.min(d_bounds, axis=1)
), but the order matters further down (here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
if len(bounds) <= 0 or len(bounds) > 2: | ||
if len(bounds) != 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I randomly caught this... I think <=0
doesn't really make any sense, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this change, and <=0
does not make sense (arrays can't have < 0 elements)
I think this is in good shape and can be reviewed. I do plan on re-running it over a bunch of cases before merging. |
I re-did comparisons with CDAT - we still agree with CDAT in all cases where both can load the dataset (loop over 115 model files and 21 domains). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, with some minor refactoring changes in my upcoming commit.
if len(bounds) <= 0 or len(bounds) > 2: | ||
if len(bounds) != 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this change, and <=0
does not make sense (arrays can't have < 0 elements)
def test_bounds_reordered_when_upper_indexed_first(self): | ||
domain_bounds = xr.DataArray( | ||
name="lon_bnds", | ||
data=np.array( | ||
[[-89.375, -90], [0.0, -89.375], [0.0, 89.375], [89.375, 90]] | ||
), | ||
coords={"lat": self.ds.lat}, | ||
dims=["lat", "bnds"], | ||
) | ||
result = self.ds.spatial._force_domain_order_low_to_high(domain_bounds) | ||
|
||
with pytest.raises(ValueError): | ||
ds.spatial.get_weights(axis=["X"]) | ||
expected_domain_bounds = xr.DataArray( | ||
name="lon_bnds", | ||
data=np.array( | ||
[[-90, -89.375], [-89.375, 0.0], [0.0, 89.375], [89.375, 90]] | ||
), | ||
coords={"lat": self.ds.lat}, | ||
dims=["lat", "bnds"], | ||
) | ||
assert result.identical(expected_domain_bounds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
xcdat/spatial.py
Outdated
"Bounds", | ||
{"domain": xr.DataArray, "region": Optional[RegionAxisBounds]}, | ||
) | ||
axis_bounds: Dict[SpatialAxis, Bounds] = { | ||
"X": { | ||
axis_bounds: Dict[SpatialAxis, Bounds] = {} | ||
if "X" in axis: | ||
axis_bounds["X"] = { | ||
"domain": self._dataset.bounds.get_bounds("X").copy(), | ||
"region": lon_bounds, | ||
}, | ||
"Y": { | ||
} | ||
if "Y" in axis: | ||
axis_bounds["Y"] = { | ||
"domain": self._dataset.bounds.get_bounds("Y").copy(), | ||
"region": lat_bounds, | ||
}, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic looks good. I did some minor refactoring and will push the code for your review.
xcdat/spatial.py
Outdated
def _validate_domain_bounds(self, domain_bounds: xr.DataArray): | ||
"""Validates the ``domain_bounds`` arg based on a set of criteria. | ||
|
||
def _force_domain_order_low_to_high(self, domain_bounds: xr.DataArray): | ||
"""Re-orders the ``domain_bounds`` to be ordered low-to-high such that | ||
domain_bounds[:, 0] < domain_bounds[:, 1]. | ||
Parameters | ||
---------- | ||
domain_bounds: xr.DataArray | ||
The bounds of an axis | ||
|
||
Raises | ||
The bounds of an axis. | ||
Returns | ||
------ | ||
TypeError | ||
If the ``domain_bounds`` of a grid cell are not ordered low-to-high. | ||
xr.DataArray | ||
The bounds of an axis (re-ordered if applicable). | ||
""" | ||
index_bad_cells = np.where(domain_bounds[:, 1] - domain_bounds[:, 0] < 0)[0] | ||
if len(index_bad_cells) > 0: | ||
raise ValueError( | ||
"The bounds have unexpected ordering. A lower bound has a " | ||
"greater value than an upper bound." | ||
) | ||
new_domain_bounds = domain_bounds.copy() | ||
new_domain_bounds[index_bad_cells, 0] = domain_bounds[index_bad_cells, 1] | ||
new_domain_bounds[index_bad_cells, 1] = domain_bounds[index_bad_cells, 0] | ||
return new_domain_bounds | ||
else: | ||
return domain_bounds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @pochedls, please review my refactoring changes. The unit tests still pass.
If you agree with them, merge when ready.
|
||
weights = axis_bounds[key]["weights_method"](d_bounds, r_bounds) | ||
weights.attrs = d_bounds.attrs | ||
axis_weights[key] = weights |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored a bit here to reduce if else
statements and avoid fetching domain_bounds
for unspecified keys in axis
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commented was intended to highlight the entire method, not just this line.
xcdat/spatial.py
Outdated
def _force_domain_order_low_to_high(self, domain_bounds: xr.DataArray): | ||
"""Re-orders the ``domain_bounds`` to be ordered low-to-high such that | ||
domain_bounds[:, 0] < domain_bounds[:, 1]. | ||
"""Reorders the ``domain_bounds`` low-to-high. | ||
|
||
This method ensures all lower bound values are less than the upper bound | ||
values (``domain_bounds[:, 1] < domain_bounds[:, 1]``). | ||
|
||
Parameters | ||
---------- | ||
domain_bounds: xr.DataArray | ||
The bounds of an axis. | ||
|
||
Returns | ||
------ | ||
xr.DataArray | ||
The bounds of an axis (re-ordered if applicable). | ||
""" | ||
index_bad_cells = np.where(domain_bounds[:, 1] - domain_bounds[:, 0] < 0)[0] | ||
|
||
if len(index_bad_cells) > 0: | ||
new_domain_bounds = domain_bounds.copy() | ||
new_domain_bounds[index_bad_cells, 0] = domain_bounds[index_bad_cells, 1] | ||
new_domain_bounds[index_bad_cells, 1] = domain_bounds[index_bad_cells, 0] | ||
|
||
return new_domain_bounds | ||
else: | ||
return domain_bounds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated docstring and code spacing for consistency with code style
"weights_method": self._get_latitude_weights, | ||
"region": np.array(lat_bounds, dtype="float") | ||
if lat_bounds is not None | ||
else None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You learn something new every day (that if
can come after the statement
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, they are called ternary operators (aka inline conditionals).
https://datagy.io/inline-if-in-python/
I don't use them too often for variable assignment, but it can be useful where it makes sense (readable and simple).
Description
This PR is intended to fix #324 and #338. It allows users to perform spatial averaging on datasets in which
domain_bounds[:, 0] > domain_bounds[:, 1]
by re-ordering the domain bounds for the purposes of calculating spatial weights. It also fixes an issue that prevented spatial averaging on lat-plev grids. A section of the code tried to load longitude bounds even when that axis was not specified for averaging; this section of code is updated to only load bounds information if the axis is specified for averaging.Checklist
If applicable: