
Feature/46 spatial averaging #87

Closed
wants to merge 4 commits into from

Conversation

pochedls
Collaborator

Description

How Has This Been Tested?

  • Local Pre-commit Checks
  • CI/CD Build

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@pochedls pochedls requested a review from tomvothecoder July 22, 2021 21:42
inds = np.where(lon > 180)[0]
lon[inds] = lon[inds] - 360
# switch back to original data type
# REVIEW: Is there an easier way to be flexible to the datatype provided?
Collaborator Author

There must be a better way to do this - right?

Collaborator

@tomvothecoder tomvothecoder Jul 23, 2021

Here's going to 180: https://stackoverflow.com/a/53561230
xarray also provides the same formula: https://xarray.pydata.org/en/stable/generated/xarray.DataArray.assign_coords.html

da.coords['lon'] = (da.coords['lon'] + 180) % 360 - 180
da = da.sortby(da.lon)

You can do something similar for going to 360.

The benefit of using this formula is that you stick with xarray and don't need to filter using numpy's .where().
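As a concrete illustration of the modulo formula above, here is a small numpy-only sketch (the array values are made up for the example):

```python
import numpy as np

# Hypothetical longitudes on a [0, 360) grid.
lon_360 = np.array([0.0, 90.0, 180.0, 270.0, 359.0])

# Convert to [-180, 180) using the formula from the comment above.
lon_180 = (lon_360 + 180) % 360 - 180

# "Something similar for going to 360": the inverse conversion.
lon_back = lon_180 % 360

print(lon_180)   # [   0.   90. -180.  -90.   -1.]
print(lon_back)  # [  0.  90. 180. 270. 359.]
```

In xarray the same expression applies to `da.coords['lon']` directly, followed by `sortby` to restore monotonic order.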

Collaborator

@tomvothecoder tomvothecoder Aug 10, 2021

Notes:

  • Generally, it is better to work directly on the xarray.DataArray instead of converting it into a numpy.ndarray. You maintain the attributes of the DataArray and don't need to perform type conversion.
  • If xarray does not support an operation, you can still perform numpy.ndarray operations directly on the xarray.DataArray due to the interoperability. There's no need to go back and forth in most cases; DataArray.data is a numpy.ndarray.

My refactored function:

  • Pass the entire xarray DataArray instead of just the lon coordinate.
  • The mid-point is -180, not 180 like the original _switch_lon_bounds().

https://github.com/tomvothecoder/xcdat/blob/911419a8580688cbdad4a362d72e6e804b387272/xcdat/average_tv.py#L18-L77
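To illustrate the interoperability point above, here is a small hypothetical sketch (assuming xarray is installed; the dims, coords, and attrs are invented for the example):

```python
import numpy as np
import xarray as xr

# A toy DataArray; names and values are made up for this sketch.
da = xr.DataArray(
    np.array([250.0, 300.0]),
    dims="lon",
    coords={"lon": [10.0, 200.0]},
    attrs={"units": "K"},
)

# numpy ufuncs applied directly to a DataArray return a DataArray,
# so there is rarely a need to convert to a raw np.ndarray and back.
shifted = np.add(da, 1.0)
print(type(shifted).__name__)  # DataArray

# The underlying buffer is still a plain numpy array when you need it.
print(type(da.data).__name__)  # ndarray
```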

# if invalid, throw error, else continue (with weighted logic)
# REVIEW: Should we throw a different exception type?
if weight not in ['weighted', 'unweighted']:
raise ValueError('Incorrect weight option. Choose either "weighted"' +
Collaborator Author

What exception type should this be?

Collaborator

ValueError is correct

@codecov

codecov bot commented Jul 22, 2021

Codecov Report

Merging #87 (7fd43c0) into main (533d150) will decrease coverage by 39.85%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##              main      #87       +/-   ##
============================================
- Coverage   100.00%   60.14%   -39.86%     
============================================
  Files            5        6        +1     
  Lines          163      271      +108     
============================================
  Hits           163      163               
- Misses           0      108      +108     
Impacted Files Coverage Δ
xcdat/averager.py 0.00% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 533d150...7fd43c0.

@tomvothecoder
Collaborator

tomvothecoder commented Jul 22, 2021

Not sure what happened, but the main branch reverted back to an old commit. Luckily my local main was up to date so I force pushed back the changes.

Another odd thing is that your PR is showing some old commits and there's a large diff with main, even though main has the same changes. I'll see if I can fix this.

@pochedls
Collaborator Author

> Not sure what happened, but the main branch reverted back to an old commit. Luckily my local main was up to date so I force pushed back the changes.
>
> Another odd thing is that your PR is showing many old commits and there's a large diff with main, even though main has the same changes. I'll see if I can fix this.

Sorry about the trouble here. I think this was because I initially did git push -f instead of git push -f origin feature/46-spatial-averaging. I thought I needed the -f flag from this thread.

@tomvothecoder
Collaborator

tomvothecoder commented Jul 23, 2021

@pochedls Ahh, no worries. Just turned off force push capabilities on main. Let me try switching branches back and forth to see if the commit list and diff update.

EDIT: I changed the source branch to feature/80-open-dataset so those lists are a lot smaller now. We can change it back to main after feature/80-open-dataset is merged.

@tomvothecoder tomvothecoder changed the base branch from main to feature/80-open-dataset July 23, 2021 00:08

"""
# get masked data
masked = np.isnan(xvariable)
Collaborator Author

This step is a huge performance hit. It can be skipped if there are no missing values. xarray may have built-in functionality to flag this (I think xvariable.mask will return an array or a boolean for some arrays, but I'm not sure how bullet-proof this is). I also think it is possible to include a masked array with a DataArray, so it may be necessary to include logic to check for (and work on) masked arrays, too.

Collaborator

@tomvothecoder tomvothecoder Aug 10, 2021

You can get a boolean mask using xarray.DataArray.isnull, which flags missing (np.nan) values. Not sure how much faster that is than np.isnan, but at least it maintains the DataArray object structure instead of converting it into a raw np.ndarray.

If we need to replace np.nan with another value, there is xarray.DataArray.fillna.

> I also think it is possible to include a masked array with a dataarray, so it may be necessary to include logic to check for (and work on) masked arrays, too.

The masked DataArray can only be stored in the original DataArray's coordinates, which I don't think makes sense.

One issue with storing the mask is that it may not stay in sync with the actual data. If the data is updated, the masked array needs to be updated too.

A possible solution is creating an accessor that dynamically generates the masked array attribute. It is probably better for the user to generate a separate mask anytime it is needed.
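To make the performance concern concrete, here is a minimal numpy sketch (the function and variable names are invented for the example): compute the NaN mask once and skip the zeroing work entirely when nothing is missing.

```python
import numpy as np

def zero_missing_weights(data, weights):
    """Return weights with entries zeroed where `data` is NaN.

    Skips the masking work entirely when there are no missing values,
    avoiding the cost for fully valid arrays.
    """
    mask = np.isnan(data)
    if not mask.any():
        # Nothing missing: return the weights untouched.
        return weights
    out = weights.copy()
    out[mask] = 0.0
    return out

data = np.array([1.0, np.nan, 3.0])
weights = np.array([0.2, 0.3, 0.5])
print(zero_missing_weights(data, weights))  # [0.2 0.  0.5]
```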

@tomvothecoder tomvothecoder force-pushed the feature/80-open-dataset branch 2 times, most recently from dce6966 to 0f5744c Compare July 23, 2021 21:34
@tomvothecoder tomvothecoder force-pushed the feature/46-spatial-averaging branch from 0db5ce9 to f9af9e8 Compare July 23, 2021 21:42
@tomvothecoder tomvothecoder added this to the v1.0.0 milestone Aug 3, 2021
@tomvothecoder tomvothecoder force-pushed the feature/80-open-dataset branch 3 times, most recently from c6ab44d to b076b44 Compare August 4, 2021 17:47
@tomvothecoder tomvothecoder force-pushed the feature/46-spatial-averaging branch 2 times, most recently from 8dc2ae3 to 2af7c93 Compare August 4, 2021 20:31
Base automatically changed from feature/80-open-dataset to main August 10, 2021 17:24
@tomvothecoder tomvothecoder force-pushed the feature/46-spatial-averaging branch from 2af7c93 to 18efaa8 Compare August 10, 2021 17:26
# Should remove these notes and outline...


def _validate_region_lon_bounds(grid_lon_bounds, region_lon_bounds):
Collaborator

Here's a refactored version that uses xarray syntax (pretty much the same as numpy):

https://github.com/tomvothecoder/xcdat/blob/1471ac8b4881379de37f9ea964596df708a745b8/xcdat/average_tv.py#L79-L133

ntile.append(1)
weights = np.tile(weights, ntile)
# zero out missing data
weights[masked] = 0.
Collaborator

@tomvothecoder tomvothecoder Aug 11, 2021

Can be replaced with xarray.DataArray.fillna
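A small sketch of that suggestion, assuming xarray is available (the variable names are invented): propagating the data's missing values into the weights with where() and then filling with fillna(0.0) reproduces the effect of `weights[masked] = 0.`.

```python
import numpy as np
import xarray as xr

data = xr.DataArray([1.0, np.nan, 3.0], dims="lon")
weights = xr.DataArray([0.2, 0.3, 0.5], dims="lon")

# where() puts NaN into the weights wherever the data is missing,
# then fillna(0.0) replaces those NaNs with zero.
masked_weights = weights.where(data.notnull()).fillna(0.0)
print(masked_weights.values)  # [0.2 0.  0.5]
```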

# consistent with the dataarray axes
if not _validate_region_lon_bounds(xvariable.bounds.lon,
lon_bounds):
# REVIEW: Should we throw a more specific error?
Collaborator

@tomvothecoder tomvothecoder Aug 11, 2021

ValueError is correct. We can create our own custom xcdat errors if it provides value, but that's a low priority if the error messages themselves are clear.

Comment on lines +200 to +204
# just check a valid weight option is provided
# if invalid, throw error, else continue (with weighted logic)
# REVIEW: Should we throw a different exception type?
if weight not in ['weighted', 'unweighted']:
raise ValueError('Incorrect weight option. Choose either "weighted"' +
Collaborator

Input validation should be at the top of the function.
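A minimal sketch of that suggestion in plain Python (the function name and signature are hypothetical, not the PR's actual API):

```python
def spatial_average(data, weight="weighted"):
    # Validate inputs first, before any expensive work.
    if weight not in ("weighted", "unweighted"):
        raise ValueError(
            'Incorrect weight option. Choose either "weighted" or "unweighted".'
        )
    # ... the averaging logic would follow here ...
    return weight

try:
    spatial_average(None, weight="wieghted")  # typo triggers the check
except ValueError as exc:
    print(exc)
```

Validating up front means the caller gets a clear error immediately, rather than after a potentially slow masking or weighting step has already run.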

@tomvothecoder tomvothecoder force-pushed the feature/46-spatial-averaging branch 2 times, most recently from 042059f to 9a22965 Compare August 18, 2021 18:06
@tomvothecoder
Collaborator

I am closing this PR in favor of #95, which has been squashed to a single commit and rebased on main.

@tomvothecoder tomvothecoder mentioned this pull request Sep 21, 2021
9 tasks
@tomvothecoder tomvothecoder deleted the feature/46-spatial-averaging branch October 7, 2021 20:33
Labels
type: enhancement New enhancement request
Development

Successfully merging this pull request may close these issues.

Add grid-aware geospatial averaging (weighted)
2 participants