Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow arbitrary xarray methods to be applied to climpred objects #243

Merged
merged 9 commits into from
Nov 5, 2019

Conversation

bradyrx
Copy link
Collaborator

@bradyrx bradyrx commented Sep 29, 2019

Description

This PR allows arbitrary xarray functions to be applied to HindcastEnsemble and PerfectModelEnsemble objects. This is in the case that someone wants to assemble a climpred object and then apply various post-processing methods, such as .isel(), .sum(), etc. to our objects.

Fixes https://github.com/bradyrx/climpred/issues/102, https://github.com/bradyrx/climpred/issues/130

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

This was tested through example notebooks and updating existing docs notebooks

Checklist (while developing)

  • I have added docstrings to all new functions.
  • I have commented my code, particularly in hard-to-understand areas
  • Tests added for pytest, if necessary.
  • I have updated the sphinx documentation, if necessary.

Pre-Merge Checklist (final steps)

  • I have rebased onto master or develop (wherever I am merging) and dealt with any conflicts.
  • I have squashed commits to a reasonable amount, and force-pushed the squashed commits.
  • I have run make html on the documents to make sure example notebooks still compile.

@bradyrx
Copy link
Collaborator Author

bradyrx commented Sep 29, 2019

So my first goal is to get this to work not inplace. I.e., the application of .isel() returns back a climpred class but does not modify it inplace. I'm having trouble getting this to work.

Notes:

  1. I removed the __repr__ magic for right now while I'm getting this to work, so you shouldn't be able to print out our object on playing with this.
  2. Similarly, I'm just working on getting the master class (PredictionEnsemble) to work first.

Issues:

  1. If applying an arbitrary func (e.g. .isel()) to a climpred object without reassigning it to a variable, you can't apply another func to the object afterwards (see below example)

TypeError: 'PredictionEnsemble' object is not callable

Any thoughts @ahuang11, @andersy005, @aaronspring? This is all pretty new to me so I'm having trouble getting this to work as desired.

working case:

import numpy as np
import xarray as xr
import climpred as cp

fake_dat = np.random.rand(30, 10,)
hind = xr.DataArray(fake_dat, dims=['init', 'lead',])
hind['init'] = np.arange(1950, 1980)
hind['lead'] = np.arange(10)
hind.name = 'SST'

# Only for the demo. Will remove the PredictionEnsemble import
# for the final version
hind = cp.PredictionEnsemble(hind)

# Original dataset
print(hind._datasets)
>>> {'initialized': <xarray.Dataset>
Dimensions:  (init: 30, lead: 10)
Coordinates:
  * init     (init) int64 1950 1951 1952 1953 1954 ... 1975 1976 1977 1978 1979
  * lead     (lead) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    SST      (init, lead) float64 0.7234 0.6834 0.6956 ... 0.2756 0.3057 0.3313}

# Select the first lead dimension
hind = hind.isel(lead=0)
print(hind._datasets)
>>> {'initialized': <xarray.Dataset>
Dimensions:  (init: 30)
Coordinates:
  * init     (init) int64 1950 1951 1952 1953 1954 ... 1975 1976 1977 1978 1979
    lead     int64 0
Data variables:
    SST      (init) float64 0.01507 0.01859 0.3588 ... 0.2327 0.02468 0.3506}

# Slice over init
hind = hind.isel(init=slice(0, 10))
print(hind._datasets)
>>> {'initialized': <xarray.Dataset>
Dimensions:  (init: 10)
Coordinates:
  * init     (init) int64 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959
    lead     int64 0
Data variables:
    SST      (init) float64 0.01507 0.01859 0.3588 ... 0.8931 0.002805 0.3095}

weird issue:

# Open up with same data production as above example.
print(hind._datasets)
>>> {'initialized': <xarray.Dataset>
Dimensions:  (init: 30, lead: 10)
Coordinates:
  * init     (init) int64 1950 1951 1952 1953 1954 ... 1975 1976 1977 1978 1979
  * lead     (lead) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    SST      (init, lead) float64 0.1705 0.8209 0.5546 ... 0.774 0.03419 0.1477}

# Cool, inplace doesn't work.
hind.isel(lead=0)
print(hind._datasets)
>>> {'initialized': <xarray.Dataset>
Dimensions:  (init: 30, lead: 10)
Coordinates:
  * init     (init) int64 1950 1951 1952 1953 1954 ... 1975 1976 1977 1978 1979
  * lead     (lead) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    SST      (init, lead) float64 0.1705 0.8209 0.5546 ... 0.774 0.03419 0.1477}

# Now try to do it the right way, and it breaks. Somehow doing it inplace modifies things?
hind = hind.isel(lead=0)
>>> TypeError: 'PredictionEnsemble' object is not callable

climpred/__init__.py Outdated Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
@aaronspring
Copy link
Collaborator

aaronspring commented Sep 30, 2019

add_control and add_reference are not updated to this it seems. But the syntax is what I would expect from such a PR. Nice!
hind.sel(lead=4). so now next step would be to have these xarray functions being applied to all items of the object.

ds = load_dataset('MPI-PM-DP-1D')['tos'].isel(period=-1, area=1)
control = load_dataset('MPI-control-1D')['tos'].isel(period=-1, area=1)

pm = climpred.PerfectModelEnsemble(ds)
pm.add_control(control)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-94c92430995c> in <module>
----> 1 pm.add_control(control)

~/Coding/climpred/climpred/checks.py in wrapper(*args, **kwargs)
     53         # to the actual function call rather than showing a simple Exception
     54         # (probably IndexError from trying to subselect an empty dec_args list)
---> 55         return func(*args, **kwargs)
     56 
     57     return wrapper

~/Coding/climpred/climpred/classes.py in add_control(self, xobj)
    261         if isinstance(xobj, xr.DataArray):
    262             xobj = xobj.to_dataset()
--> 263         match_initialized_dims(self.initialized, xobj)
    264         match_initialized_vars(self.initialized, xobj)
    265         self.control = xobj

~/Coding/climpred/climpred/checks.py in match_initialized_dims(init, ref, uninitialized)
    103     # temporarily rename to time.
    104     init = init.rename({'init': 'time'})
--> 105     init_dims = list(init.dims)
    106     if 'lead' in init_dims:
    107         init_dims.remove('lead')

TypeError: 'function' object is not iterable

@aaronspring
Copy link
Collaborator

Your example seems to work when not overwriting the object. I use a new name. Maybe you have to delete the object first: del

hind2 = hind.isel(lead=0)
hc2._datasets

Copy link
Collaborator

@aaronspring aaronspring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a general and smart solution. work for the initialized part. missing so far: add_control/reference

@aaronspring
Copy link
Collaborator

I get why xarray doesnt want inplace operations, but to add a reference to the object, this should be ok, isnt? See https://github.com/bradyrx/climpred/issues/130
Seems a bit annoying to me: pm = pm.add_control(control)

@bradyrx
Copy link
Collaborator Author

bradyrx commented Sep 30, 2019

add_control and add_reference are not updated to this it seems. But the syntax is what I would expect from such a PR. Nice!
hind.sel(lead=4). so now next step would be to have these xarray functions being applied to all items of the object.

Yep, these aren't expected to work in its current form. I wanted to solve the inplace issue before I apply it to all objects. But the proposed solution is instead of having attributes with dictionaries (e.g. hindcast.reference = {}, hindcast.uninitialized = {}), we'll just have one large nested dictionry called hindcast._datasets, which makes looping thru and list comprehension much easier.

Your example seems to work when not overwriting the object. I use a new name. Maybe you have to delete the object first: del

I get the same issue when not overwriting. The problem is that if you do it interactively first, it breaks. Which is a huge problem to me. I like to check stuff interactively before assigning them variable names, if that makes sense. I can't work out what in the code would cause this and stack exchange isn't helping (or I'm not searching the right thing)

Screen Shot 2019-09-30 at 9 04 51 AM

Seems a bit annoying to me: pm = pm.add_control(control)

Yeah I get it. It's a design decision, and we should be in line with xarray. I think it's better to be explicit (i.e. not doing it inplace) so that you can check stuff interactively (as I said above) but not permanently change your variable.

@bradyrx
Copy link
Collaborator Author

bradyrx commented Sep 30, 2019

Okay I fixed this issue. It has to do with using setattr. See my comments on code review...

Screen Shot 2019-09-30 at 11 14 59 AM

climpred/classes.py Outdated Show resolved Hide resolved
@bradyrx bradyrx removed the request for review from ahuang11 October 1, 2019 00:11
@coveralls
Copy link

coveralls commented Oct 2, 2019

Coverage Status

Coverage increased (+0.8%) to 85.565% when pulling 8b78874 on extend_xarray_funcs into d2e71b0 on master.

climpred/classes.py Outdated Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
CHANGELOG.rst Outdated Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
climpred/classes.py Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
climpred/classes.py Show resolved Hide resolved
climpred/classes.py Show resolved Hide resolved
@bradyrx bradyrx force-pushed the extend_xarray_funcs branch from b89926c to df27fe7 Compare October 23, 2019 02:54
@bradyrx
Copy link
Collaborator Author

bradyrx commented Oct 23, 2019

Okay I addressed the main comments here. I have two left to address that are just syntax. Need to prioritize documentation and testing for the next push.

climpred/classes.py Outdated Show resolved Hide resolved
climpred/classes.py Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
climpred/classes.py Show resolved Hide resolved
climpred/classes.py Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
climpred/classes.py Show resolved Hide resolved
climpred/classes.py Outdated Show resolved Hide resolved
@bradyrx
Copy link
Collaborator Author

bradyrx commented Oct 25, 2019

Okay, we're getting close @ahuang11 @aaronspring . I just need to spend an evening writing tests to cover all of the classes module and also want to make a brief docs page on the two objects and point out that you can use xarray functions on them.

@bradyrx
Copy link
Collaborator Author

bradyrx commented Nov 5, 2019

I'm gonna go ahead and squash commits and merge this since you guys have reviewed a few times and most changes since then were just to docs. This PR is ballooning so big and needs to get in so I am going to address adding testing in another PR so we can keep that contained and will of course wait to make this a formal release until testing is added.

I will also open a bunch of small issues for things that need to be modified/added/updated for the classes.

@bradyrx bradyrx mentioned this pull request Nov 5, 2019
Riley Brady and others added 9 commits November 5, 2019 14:10
fix compute_metric for hindcast
refactor HindcastEnsemble and get all functions working

refactoring of PM class

fix some hindcast pytest errors

all tests pass on test hindcastEnsemble

all tests for PM object pass
finish refactoring PM with general apply function

non inplace working for all but add reference

fix hindcast ensemble testing

fix PM testing without inplace
add coverage to PM testing

rename is_initialized to has_dataset
refactor list comprehension
fix links in documentation

remove auto-generated api stuff

update API with classes information

add api instructions to how to contribute
finish up prediction ensemble docs
@bradyrx bradyrx force-pushed the extend_xarray_funcs branch from 74d9be7 to 8b78874 Compare November 5, 2019 19:14
@bradyrx bradyrx dismissed aaronspring’s stale review November 5, 2019 19:23

This is a stale review and comments were addressed

@bradyrx bradyrx merged commit e684449 into master Nov 5, 2019
@bradyrx
Copy link
Collaborator Author

bradyrx commented Nov 5, 2019

All tests pass, docs build with new features, etc. Decided to merge since we've done extensive review here and didn't want to drag it along too much further. Thanks @aaronspring and @ahuang11 for the comments here.

We now have generalized xarray funcs for all of our objects, a lot of additions to the docs, no more in-place operations, some additional testing. I'll open a new PR that fully covers the classes module for testing.

@bradyrx bradyrx deleted the extend_xarray_funcs branch November 5, 2019 19:25
@aaronspring
Copy link
Collaborator

Good job. We might need to adapt tiny changes in the future. But I hope users will find these classes and xr calls very conveniently

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow basic xarray commands to be applied to whole PredictionEnsemble object
4 participants