Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smoothing #224

Merged
merged 13 commits into from
Sep 18, 2019
Merged

Smoothing #224

merged 13 commits into from
Sep 18, 2019

Conversation

aaronspring
Copy link
Collaborator

@aaronspring aaronspring commented Aug 6, 2019

Description

Add smoothing functions. Early PR but I need some input of you. I tried to make it a bit generalized, but currently its quite inconsistent.

Contains:

  • temporal smoothing
  • spatial smoothing based on xesmf
  • spatial smoothing based on coarsen
  • shortcut for Goddard smoothing

Still working on:

  • align args of smoothing function. now 1 takes tuple, 1 takes dict, other keyword. I propose to have all as dicts. Now unsure how to make the xesmf tuple boxsize into dict.
  • should we add many dependencies to enable CI for xesmf so should we test this only locally at home. currently I use try: import xesmf to aviod errors

To Do later:

  • ideal output: implement into classes. args decide the kind of smoothing PredictionEnsemble.smooth()

Fixes # https://github.com/bradyrx/climpred/issues/76

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • pytest

Checklist (while developing)

  • I have added docstrings to all new functions.
  • I have commented my code, particularly in hard-to-understand areas
  • Tests added for pytest, if necessary.
  • I have updated the sphinx documentation, if necessary.

Pre-Merge Checklist (final steps)

  • I have rebased onto master or develop (wherever I am merging) and dealt with any conflicts.
  • I have squashed commits to a reasonable amount, and force-pushed the squashed commits.
  • I have run make html on the documents to make sure example notebooks still compile.

References

Goddard et al. 2013

climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
else:
was_dataset = False
ds_out = xe.util.grid_global(*boxsize)
regridder = xe.Regridder(ds, ds_out, 'bilinear')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest a keyword for remap type. This could be 'bilinear', 'patch', etc. It will be hard to natively support conservative remapping since you need grid cell vertices for that.

Copy link
Member

@ahuang11 ahuang11 Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, maybe make grid_global more flexible for future uses with non-global stuff.

Maybe you can adapt my code snippet

def _regrid_it(da, d_lon, d_lat, **kwargs):
    '''
    Global 2D rectilinear grid centers and bounds
     Parameters
    ----------
    da : xarray DataArray
        Contain input and output grid coordinates. Look for variables
        ``lon``, ``lat``, and optionally ``lon_b``, ``lat_b`` for
        conservative method.
         Shape can be 1D (Nlon,) and (Nlat,) for rectilinear grids,
        or 2D (Ny, Nx) for general curvilinear grids.
        Shape of bounds should be (N+1,) or (Ny+1, Nx+1).
     d_lon : float
        Longitude step size, i.e. grid resolution
     d_lat : float
        Latitude step size, i.e. grid resolution
     Returns
    -------
    da : xarray DataArray with coordinate values
     '''
    if not warn_lon_lat_dne(da):  # both lon and lat exist
        grid_out = {LON: np.arange(da[LON].min(), da[LON].max() + d_lon, d_lon),
                    LAT: np.arange(da[LAT].min(), da[LAT].max() + d_lat, d_lat)}
        regridder = xe.Regridder(da, grid_out, **kwargs)
        return regridder(da)
    else:
        return da


 def regrid_it(ds, d_lon=1, d_lat=None, method='bilinear',
              periodic=False, filename=None, reuse_weights=True):
    '''
    Quick regridding
     Parameters
    ----------
    ds : xarray DataSet
        Contain input and output grid coordinates. Look for variables
        ``lon``, ``lat``, and optionally ``lon_b``, ``lat_b`` for
        conservative method.
         Shape can be 1D (Nlon,) and (Nlat,) for rectilinear grids,
        or 2D (Ny, Nx) for general curvilinear grids.
        Shape of bounds should be (N+1,) or (Ny+1, Nx+1).
     d_lon : float, optional
        Longitude step size, i.e. grid resolution; if not provided,
        will equal 1
     d_lat : float, optional
        Latitude step size, i.e. grid resolution; if not provided,
        will equal d_lon
     method : str
        Regridding method. Options are
         - 'bilinear'
        - 'conservative', **need grid corner information**
        - 'patch'
        - 'nearest_s2d'
        - 'nearest_d2s'
     periodic : bool, optional
        Periodic in longitude? Default to False.
        Only useful for global grids with non-conservative regridding.
        Will be forced to False for conservative regridding.
     filename : str, optional
        Name for the weight file. The default naming scheme is::
             {method}_{Ny_in}x{Nx_in}_{Ny_out}x{Nx_out}.nc
         e.g. bilinear_400x600_300x400.nc
     reuse_weights : bool, optional
        Whether to read existing weight file to save computing time.
        False by default (i.e. re-compute, not reuse).
     Returns
    -------
    ds : xarray DataSet with coordinate values or DataArray
     '''
    if d_lat is None:
        d_lat = d_lon

     kwargs = {
        'd_lon': d_lon,
        'd_lat': d_lat,
        'method': method,
        'periodic':periodic,
        'filename': filename,
        'reuse_weights': reuse_weights,
    }

     if isinstance(ds, xr.Dataset):
        ds = xr.merge(_regrid_it(ds[var], **kwargs)
                      for var in ds.data_vars)
    else:
        ds = _regrid_it(ds, **kwargs)

     return ds

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure whether we need to have this long function with documentation in our codebase. couldnt we just import this: from xesmf.utils import regrid_it. 1x1 deg is default here which is also ok for me. and then for the goddard wrapper I would adapt for 5x5. the difference is then that here it doesnt use dicts but d_lon/lat keywords with stepsize. temporal_smoothing can also be adapted for this. its either time or init. we dont allow for anything else and this the function can guess. any objections for this implementation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't import it because it doesn't exist in the codebase (the PR was rejected)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok. so is this the only way to have not automatically remapping to global grid?

climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
@ahuang11
Copy link
Member

ahuang11 commented Aug 7, 2019

By the way, thanks for taking the initiative to make all these PRs!

@aaronspring
Copy link
Collaborator Author

Changes:

  • I made all functions have a dict as input. the unwrapping of the information inside the dict is sometimes to very clean. however, I liked to have a dict for all to have them all work with dicts. and with dicts I can provide the dimension name directly.
  • Andrews snippet used. however unsure whether my implementation of this is useful.
  • rm the print statement in assign_attrs and add a keyword in compute functions to write attrs or not (not for bootstrapping every iteration)
  • his is the best I could think of for now

climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
@bradyrx
Copy link
Collaborator

bradyrx commented Aug 7, 2019

I still have to pull this down and see how it works, but I'm really excited about this implementation. Could you clarify the difference between the xrcoarsen and the xesmf_coarsen @aaronspring ? Is it just whether we use the xarray vs. xesmf implementation? Do you envision that the filler coarsen function will allow a keyword for Goddard vs. other methods?

climpred/prediction.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/tests/test_smoothing.py Outdated Show resolved Hide resolved
@aaronspring
Copy link
Collaborator Author

I still have to pull this down and see how it works, but I'm really excited about this implementation. Could you clarify the difference between the xrcoarsen and the xesmf_coarsen @aaronspring ? Is it just whether we use the xarray vs. xesmf implementation?

xesmf is a regridding, so it regrids towards a target grid. xrcoarsening just takes a few dims and aggregates them, so it leaves the grid structure as is but makes it smaller. It makes a real difference for curvilinear grids.

Do you envision that the filler coarsen function will allow a keyword for Goddard vs. other methods?

I thought the other way around: Having a Goddard function that works as wrapper of temp. and spatial smoothing (should be xesmf 5x5).

@@ -91,6 +91,16 @@ def __init__(self, xobj):
def __repr__(self):
return _display_metadata(self)

def smooth(self):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just tried to implement it. smooth_dict would need to change from init to time. I would like to try to implement this into the main class and not the subclass, because then we will have to do things twice. I actually wanted to loop over all objects in self. didnt work.

@coveralls
Copy link

coveralls commented Aug 21, 2019

Coverage Status

Coverage decreased (-0.4%) to 85.103% when pulling f3c328f on smoothing into 8f6d435 on master.

@aaronspring
Copy link
Collaborator Author

now when a user wants to compare DPLE hindcast to some observational data product, the user cannot add the obs if this has different grid information. however, it maybe be useful to be able to regrid hindcast and obs to 5x5 degree. maybe we have the checks later in the compute function and relax them while adding.

@aaronspring aaronspring requested a review from bradyrx August 22, 2019 14:30
@aaronspring
Copy link
Collaborator Author

I did a sketch and a few tests how the classes implementation could look like.

climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved

Args:
ds (xr.object): input.
smooth_dict (dict): length of smoothing of timesteps.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be an int that gets entered? Like smooth=3? Since we should only expect 'time' and 'lead' dimension names for our package we can infer which one it is. And I don't think we should have multiple entries for a given dataset.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could yes. I wanted to have the inputs types aligned for all smoothing funcs. Especially in the classes. What I still aim for is a clever way that the contents of the dict are checked and then all the smoothing operations are applied.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to have the args for all smoothing funcs consistent as dicts. but sure this could also be infered.

climpred/smoothing.py Show resolved Hide resolved
climpred/classes.py Show resolved Hide resolved
if isinstance(smooth_dict, str):
if smooth_dict == 'goddard2013':
smooth_fct = smooth_goddard_2013
smooth_dict = {'lead': 4} # default
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, need to figure out if this holds for non-annual averages. Maybe just have the user manually put this in.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once temporal PR is ready and we know the requirements what needs to be coded there, this can be implemented

)
elif isinstance(smooth_dict, dict):
print(smooth_dict, type(smooth_dict))
if 'lon' in smooth_dict or 'lat' in smooth_dict:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this series of if statements. It only allows one thing to be done -- either spatial smoothing or temporal smoothing. Shouldn't the dictionary let both be done, e.g., if the user says smooth_dict={'lead': 4, 'lat': 5, 'lon': 5}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I still need to implement that all dict items are used to smooth.

climpred/classes.py Show resolved Hide resolved
raise ValueError(
'Please provide kwargs as str or dict and not', type(smooth_dict)
)
self.initialized = smooth_fct(self.initialized, smooth_dict)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're trying to avoid inplace operations and rather use return (see https://github.com/bradyrx/climpred/issues/130)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’s logical to first create the prediction ensemble and then smooth and have the same. This new inplace PR should then fix this consistently

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are inplace operations always to be avioded? for me it feels itutive if I have a PredictionEnsemble and I smooth it to have a smoothed PredictionEnsemble afterwards (inplace).
on the other side, once the xarray.isel and so an can be applied to PredictionEnsemble this smooth feature should behave the same way. lets save this for then.

@bradyrx
Copy link
Collaborator

bradyrx commented Aug 22, 2019

Left a ton of comments, but this looks really great so far @aaronspring. I'm excited to have this implemented. Some thoughts outside of my comments on the code:

  1. temporal_smoothing doesn't work on a non-prediction ensemble object right now. Perhaps it's not supposed to, since we're smoothing lead times into composite chunks. So we need to have some way to compare the composite chunks to a reference. Perhaps this is handled somehow in compute_hindcast and should be handled in another PR. But a smoothed 1-4 year lead time would be compared to all 1-4 lead averages on a reference.
  2. I'd prefer smooth_kws, coarsen_kws, and so on, which tends to be conventional in python. This would be instead of smooth_dict and so on.

Great work on this so far!

climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/smoothing.py Outdated Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
@aaronspring
Copy link
Collaborator Author

let me know if there are still requested changes. I dont see any for now to be addressed in this PR.

@bradyrx
Copy link
Collaborator

bradyrx commented Sep 4, 2019

@aaronspring , if you go to "Files Changed" up top you'll see there's still some comments from myself and @ahuang11 . A lot of mine are documentation stuff, but there are a few places where the if statements are set up improperly so it doesn't work as currently set up.

@aaronspring
Copy link
Collaborator Author

aaronspring commented Sep 5, 2019

sorry. I didnt see those. resolved them and commented on them now.

somehow now I get errors in test_xxx_prediction.py locally because I installed the new xskillscore version. CI hopefully doesnt have those because xskillscore is still in version 0.0.4

@bradyrx bradyrx merged commit 47db1ab into master Sep 18, 2019
@bradyrx
Copy link
Collaborator

bradyrx commented Sep 18, 2019

Thanks for the patience here @aaronspring! This is a great addition. I'll test/expand functionality with my PR on general apply functions.

@bradyrx bradyrx deleted the smoothing branch September 23, 2019 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants