-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/46 spatial averaging #87
Conversation
inds = np.where(lon > 180)[0] | ||
lon[inds] = lon[inds] - 360 | ||
# switch back to original data type | ||
# REVIEW: Is there an easier way to be flexible to the datatype provided? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There must be a better way to do this - right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's going to 180: https://stackoverflow.com/a/53561230
xarray also provides the same formula: https://xarray.pydata.org/en/stable/generated/xarray.DataArray.assign_coords.html
da.coords['lon'] = (da.coords['lon'] + 180) % 360 - 180
da = da.sortby(da.lon)
You can do something similar for going to 360.
The benefit of using this formula is that you stick with xarray and don't need to filter using numpy's .where()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notes:
- Generally, it is better to work directly on the
xarray.DataArray
instead of converting it into anumpy.ndarray
. You maintain the attributes of theDataArray
and don't need to perform type conversion. - If xarray does not support an operation, you can still perform
numpy.ndarray
operations directly on thexarray.DataArray
due to the interoperability. No need to go back in forth in most cases,DataArray.data
is anumpy.ndarray
.
My refactored function:
- Pass the entire xarray DataArray instead of just the
lon
coordinate. - The mid-point is -180, not 180 like the original
_switch_lon_bounds()
.
# if invalid, throw error, else continue (with weighted logic) | ||
# REVIEW: Should we throw a different exception type? | ||
if weight not in ['weighted', 'unweighted']: | ||
raise ValueError('Incorrect weight option. Choose either "weighted"' + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exception type should this be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueError
is correct
Codecov Report
@@ Coverage Diff @@
## main #87 +/- ##
============================================
- Coverage 100.00% 60.14% -39.86%
============================================
Files 5 6 +1
Lines 163 271 +108
============================================
Hits 163 163
- Misses 0 108 +108
Continue to review full report at Codecov.
|
Not sure what happened, but the Another odd thing is that your PR is showing some old commits and there's a large diff with |
Sorry about the trouble here. I think this was because I initially did |
@pochedls Ahh, no worries. Just turned off force push capabilities on EDIT: I changed the source branch to |
|
||
""" | ||
# get masked data | ||
masked = np.isnan(xvariable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This step is a huge hit on performance. It can be skipped if there are no missing values. xarray may have built-in functionality to flag this (I think for some arrays xvariable.mask
will return an array or a boolean, but I'm not sure how bullet-proof this is). I also think it is possible to include a masked array with a dataarray, so it may be necessary to include logic to check for (and work on) masked arrays, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can extract a masked array using xarray.DataArray.isnull, which fills missing values with np.nan
. Not sure how much faster that is than np.isnan
, but at least it maintains the DataArray
object structure instead of converting it into a raw np.ndarray
.
If we need to replace np.nan
with another value, there is xarray.DataArray.fillna.
I also think it is possible to include a masked array with a dataarray, so it may be necessary to include logic to check for (and work on) masked arrays, too.
The masked DataArray
can only be stored in the original DataArray's coordinates, which I don't think makes sense.
Once issue with storing the mask is that it may not be up to date with the actual data. If the data is updated, the masked array needs to be updated too.
A possible solution is creating an accessor that dynamically generates the masked array attribute. It is probably better for the user to generate a separate mask anytime it is needed.
dce6966
to
0f5744c
Compare
0db5ce9
to
f9af9e8
Compare
c6ab44d
to
b076b44
Compare
8dc2ae3
to
2af7c93
Compare
2af7c93
to
18efaa8
Compare
# Should remove these notes and outline... | ||
|
||
|
||
def _validate_region_lon_bounds(grid_lon_bounds, region_lon_bounds): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a refactored version that uses xarray syntax (pretty much the same as numpy)
ntile.append(1) | ||
weights = np.tile(weights, ntile) | ||
# zero out missing data | ||
weights[masked] = 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be replaced with xarray.DataArray.fillna
# consistent with the dataarray axes | ||
if not _validate_region_lon_bounds(xvariable.bounds.lon, | ||
lon_bounds): | ||
# REVIEW: Should we throw a more specific error? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueError is correct. We can create our own custom xcdat errors if it provides value, but that's a low priority if the error messages themselves are clear.
# just check a valid weight option is provided | ||
# if invalid, throw error, else continue (with weighted logic) | ||
# REVIEW: Should we throw a different exception type? | ||
if weight not in ['weighted', 'unweighted']: | ||
raise ValueError('Incorrect weight option. Choose either "weighted"' + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Input validation should be at the top of the function.
042059f
to
9a22965
Compare
- Add dataset.py and variable.py which stores wrappers - Update .gitignore
9a22965
to
7fd43c0
Compare
I am closing this PR in favor of #95, which has been squashed to a single commit and rebased on |
Description
How Has This Been Tested?
Checklist
If applicable: