
Feature/46 spatial averaging #87

Closed
wants to merge 4 commits into from

Conversation

pochedls
Collaborator

Description

How Has This Been Tested?

  • Local Pre-commit Checks
  • CI/CD Build

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@pochedls pochedls requested a review from tomvothecoder July 22, 2021 21:42
inds = np.where(lon > 180)[0]
lon[inds] = lon[inds] - 360
# switch back to original data type
# REVIEW: Is there an easier way to be flexible to the datatype provided?
Collaborator Author

There must be a better way to do this - right?

Collaborator

@tomvothecoder tomvothecoder Jul 23, 2021

Here's going to 180: https://stackoverflow.com/a/53561230
xarray also provides the same formula: https://xarray.pydata.org/en/stable/generated/xarray.DataArray.assign_coords.html

da.coords['lon'] = (da.coords['lon'] + 180) % 360 - 180
da = da.sortby(da.lon)

You can do something similar for going to 360.

The benefit of using this formula is that you stick with xarray and don't need to filter using numpy's .where().
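As a concrete illustration of the modulo formula above, here is a small numpy-only sketch (the array values are made up for the example):

```python
import numpy as np

# Hypothetical longitudes on a [0, 360) grid.
lon_360 = np.array([0.0, 90.0, 180.0, 270.0, 359.0])

# Convert to [-180, 180) using the formula from the comment above.
lon_180 = (lon_360 + 180) % 360 - 180

# "Something similar for going to 360": the inverse conversion.
lon_back = lon_180 % 360

print(lon_180)   # [   0.   90. -180.  -90.   -1.]
print(lon_back)  # [  0.  90. 180. 270. 359.]
```

In xarray the same expression applies to `da.coords['lon']` directly, followed by `sortby` to restore monotonic order.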

Collaborator

@tomvothecoder tomvothecoder Aug 10, 2021

Notes:

  • Generally, it is better to work directly on the xarray.DataArray instead of converting it into a numpy.ndarray. You maintain the attributes of the DataArray and don't need to perform type conversion.
  • If xarray does not support an operation, you can still perform numpy.ndarray operations directly on the xarray.DataArray due to the interoperability. There's no need to go back and forth in most cases; DataArray.data is a numpy.ndarray.

My refactored function:

  • Pass the entire xarray DataArray instead of just the lon coordinate.
  • The mid-point is -180, not 180 like the original _switch_lon_bounds().

https://github.com/tomvothecoder/xcdat/blob/911419a8580688cbdad4a362d72e6e804b387272/xcdat/average_tv.py#L18-L77
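To illustrate the interoperability point above, here is a small hypothetical sketch (assuming xarray is installed; the dims, coords, and attrs are invented for the example):

```python
import numpy as np
import xarray as xr

# A toy DataArray; names and values are made up for this sketch.
da = xr.DataArray(
    np.array([250.0, 300.0]),
    dims="lon",
    coords={"lon": [10.0, 200.0]},
    attrs={"units": "K"},
)

# numpy ufuncs applied directly to a DataArray return a DataArray,
# so there is rarely a need to convert to a raw np.ndarray and back.
shifted = np.add(da, 1.0)
print(type(shifted).__name__)  # DataArray

# The underlying buffer is still a plain numpy array when you need it.
print(type(da.data).__name__)  # ndarray
```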

# if invalid, throw error, else continue (with weighted logic)
# REVIEW: Should we throw a different exception type?
if weight not in ['weighted', 'unweighted']:
raise ValueError('Incorrect weight option. Choose either "weighted"' +
Collaborator Author

What exception type should this be?

Collaborator

ValueError is correct

@codecov

codecov bot commented Jul 22, 2021

Codecov Report

Merging #87 (7fd43c0) into main (533d150) will decrease coverage by 39.85%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##              main      #87       +/-   ##
============================================
- Coverage   100.00%   60.14%   -39.86%     
============================================
  Files            5        6        +1     
  Lines          163      271      +108     
============================================
  Hits           163      163               
- Misses           0      108      +108     
Impacted Files Coverage Δ
xcdat/averager.py 0.00% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 533d150...7fd43c0.

@tomvothecoder
Collaborator

tomvothecoder commented Jul 22, 2021

Not sure what happened, but the main branch reverted back to an old commit. Luckily my local main was up to date so I force pushed back the changes.

Another odd thing is that your PR is showing some old commits and there's a large diff with main, even though main has the same changes. I'll see if I can fix this.

@pochedls
Collaborator Author

> Not sure what happened, but the main branch reverted back to an old commit. Luckily my local main was up to date so I force pushed back the changes.
>
> Another odd thing is that your PR is showing many old commits and there's a large diff with main, even though main has the same changes. I'll see if I can fix this.

Sorry about the trouble here. I think this was because I initially did git push -f instead of git push -f origin feature/46-spatial-averaging. I thought I needed the -f flag from this thread.

@tomvothecoder
Collaborator

tomvothecoder commented Jul 23, 2021

@pochedls Ahh, no worries. Just turned off force push capabilities on main. Let me try switching branches back and forth to see if the commit list and diff update.

EDIT: I changed the source branch to feature/80-open-dataset so those lists are a lot smaller now. We can change it back to main after feature/80-open-dataset is merged.

@tomvothecoder tomvothecoder changed the base branch from main to feature/80-open-dataset July 23, 2021 00:08

"""
# get masked data
masked = np.isnan(xvariable)
Collaborator Author

This step is a huge performance hit. It can be skipped if there are no missing values. xarray may have built-in functionality to flag this (I think xvariable.mask will return an array or a boolean for some arrays, but I'm not sure how bullet-proof this is). I also think it is possible to include a masked array with a DataArray, so it may be necessary to include logic to check for (and work on) masked arrays, too.

Collaborator

@tomvothecoder tomvothecoder Aug 10, 2021

You can get a boolean mask using xarray.DataArray.isnull, which flags missing (np.nan) values. Not sure how much faster that is than np.isnan, but at least it maintains the DataArray object structure instead of converting it into a raw np.ndarray.

If we need to replace np.nan with another value, there is xarray.DataArray.fillna.

> I also think it is possible to include a masked array with a dataarray, so it may be necessary to include logic to check for (and work on) masked arrays, too.

The masked DataArray can only be stored in the original DataArray's coordinates, which I don't think makes sense.

One issue with storing the mask is that it may not stay in sync with the actual data. If the data is updated, the masked array needs to be updated too.

A possible solution is creating an accessor that dynamically generates the masked array attribute. It is probably better for the user to generate a separate mask anytime it is needed.
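To make the performance concern concrete, here is a minimal numpy sketch (the function and variable names are invented for the example): compute the NaN mask once and skip the zeroing work entirely when nothing is missing.

```python
import numpy as np

def zero_missing_weights(data, weights):
    """Return weights with entries zeroed where `data` is NaN.

    Skips the masking work entirely when there are no missing values,
    avoiding the cost for fully valid arrays.
    """
    mask = np.isnan(data)
    if not mask.any():
        # Nothing missing: return the weights untouched.
        return weights
    out = weights.copy()
    out[mask] = 0.0
    return out

data = np.array([1.0, np.nan, 3.0])
weights = np.array([0.2, 0.3, 0.5])
print(zero_missing_weights(data, weights))  # [0.2 0.  0.5]
```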

@tomvothecoder tomvothecoder force-pushed the feature/80-open-dataset branch 2 times, most recently from dce6966 to 0f5744c Compare July 23, 2021 21:34
@tomvothecoder tomvothecoder force-pushed the feature/46-spatial-averaging branch from 0db5ce9 to f9af9e8 Compare July 23, 2021 21:42
@tomvothecoder tomvothecoder added this to the v1.0.0 milestone Aug 3, 2021
@tomvothecoder tomvothecoder force-pushed the feature/80-open-dataset branch 3 times, most recently from c6ab44d to b076b44 Compare August 4, 2021 17:47
@tomvothecoder tomvothecoder force-pushed the feature/46-spatial-averaging branch 2 times, most recently from 8dc2ae3 to 2af7c93 Compare August 4, 2021 20:31
Base automatically changed from feature/80-open-dataset to main August 10, 2021 17:24
@tomvothecoder tomvothecoder force-pushed the feature/46-spatial-averaging branch from 2af7c93 to 18efaa8 Compare August 10, 2021 17:26
# Should remove these notes and outline...


def _validate_region_lon_bounds(grid_lon_bounds, region_lon_bounds):
Collaborator

Here's a refactored version that uses xarray syntax (pretty much the same as numpy):

https://github.com/tomvothecoder/xcdat/blob/1471ac8b4881379de37f9ea964596df708a745b8/xcdat/average_tv.py#L79-L133

ntile.append(1)
weights = np.tile(weights, ntile)
# zero out missing data
weights[masked] = 0.
Collaborator

@tomvothecoder tomvothecoder Aug 11, 2021

Can be replaced with xarray.DataArray.fillna
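A small sketch of that suggestion, assuming xarray is available (the variable names are invented): propagating the data's missing values into the weights with where() and then filling with fillna(0.0) reproduces the effect of `weights[masked] = 0.`.

```python
import numpy as np
import xarray as xr

data = xr.DataArray([1.0, np.nan, 3.0], dims="lon")
weights = xr.DataArray([0.2, 0.3, 0.5], dims="lon")

# where() puts NaN into the weights wherever the data is missing,
# then fillna(0.0) replaces those NaNs with zero.
masked_weights = weights.where(data.notnull()).fillna(0.0)
print(masked_weights.values)  # [0.2 0.  0.5]
```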

# consistent with the dataarray axes
if not _validate_region_lon_bounds(xvariable.bounds.lon,
lon_bounds):
# REVIEW: Should we throw a more specific error?
Collaborator

@tomvothecoder tomvothecoder Aug 11, 2021

ValueError is correct. We can create our own custom xcdat errors if it provides value, but that's a low priority if the error messages themselves are clear.

Comment on lines +200 to +204
# just check a valid weight option is provided
# if invalid, throw error, else continue (with weighted logic)
# REVIEW: Should we throw a different exception type?
if weight not in ['weighted', 'unweighted']:
raise ValueError('Incorrect weight option. Choose either "weighted"' +
Collaborator

Input validation should be at the top of the function.
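A minimal sketch of that suggestion in plain Python (the function name and signature are hypothetical, not the PR's actual API):

```python
def spatial_average(data, weight="weighted"):
    # Validate inputs first, before any expensive work.
    if weight not in ("weighted", "unweighted"):
        raise ValueError(
            'Incorrect weight option. Choose either "weighted" or "unweighted".'
        )
    # ... the averaging logic would follow here ...
    return weight

try:
    spatial_average(None, weight="wieghted")  # typo triggers the check
except ValueError as exc:
    print(exc)
```

Validating up front means the caller gets a clear error immediately, rather than after a potentially slow masking or weighting step has already run.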

@tomvothecoder tomvothecoder force-pushed the feature/46-spatial-averaging branch 2 times, most recently from 042059f to 9a22965 Compare August 18, 2021 18:06
@tomvothecoder
Collaborator

I am closing this PR in favor of #95, which has been squashed to a single commit and rebased on main.

@tomvothecoder tomvothecoder mentioned this pull request Sep 21, 2021
9 tasks
@tomvothecoder tomvothecoder deleted the feature/46-spatial-averaging branch October 7, 2021 20:33
Labels
type: enhancement New enhancement request
Development

Successfully merging this pull request may close these issues.

Add grid-aware geospatial averaging (weighted)
2 participants