Adding resample functionality to CFTimeIndex #2191
Comments
Yes, I think so. The main thing we need is a function to map from datetime -> datetime at start of frequency. |
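One way to picture the mapping described above is a small function that floors a date to the start of its month or year. This is only an illustrative sketch, not the implementation that was eventually merged: the names `floor_to_freq` and `_FLOORS` are made up here, only 'MS' and 'AS' are handled, and the standard-library `datetime` stands in for cftime dates on the assumption that they offer a compatible `replace` method.

```python
from datetime import datetime


def _month_start(date):
    # Keep year and month, reset everything finer to the start of the month.
    return date.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def _year_start(date):
    # Floor to the month start first, then move to January.
    return _month_start(date).replace(month=1)


# Hypothetical lookup table from frequency string to flooring function.
_FLOORS = {"MS": _month_start, "AS": _year_start}


def floor_to_freq(date, freq):
    """Map a date to the date at the start of its frequency bin."""
    try:
        floor = _FLOORS[freq]
    except KeyError:
        raise ValueError(f"Unsupported frequency: {freq!r}") from None
    return floor(date)


print(floor_to_freq(datetime(2000, 5, 17, 12, 30), "MS"))  # 2000-05-01 00:00:00
```

Because the function only calls `replace`, the result keeps the type of the input date, which is the property a calendar-aware resample would need.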
I am trying to combine the monthly CMIP5 rcp85 ts datasets (go past 2064AD) with the myriad calendars, so I love the new CFTimeIndex! But I need resample(time='MS') in order to force them all to start on the first of each month |
@naomi-henderson thanks! In the meantime here's a possible workaround, in case you haven't figured one out already:
import numpy as np
import xarray as xr
from cftime import num2date, DatetimeNoLeap
times = num2date(np.arange(730), calendar='noleap', units='days since 0001-01-01')
da = xr.DataArray(np.arange(730), coords=[times], dims=['time'])
month_start = [DatetimeNoLeap(date.dt.year, date.dt.month, 1) for date in da.time]
da['MS'] = xr.DataArray(month_start, coords=da.time.coords)
resampled = da.groupby('MS').mean('time').rename({'MS': 'time'}) |
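The workaround above labels every timestamp with the first day of its month and then averages within each label. The same idea can be sketched in plain Python; here `monthly_means` is a hypothetical helper, and stdlib `datetime` objects stand in for cftime dates (assuming they expose a compatible `replace` method).

```python
from collections import defaultdict
from datetime import datetime, timedelta


def monthly_means(times, values):
    """Average values that fall in the same calendar month."""
    groups = defaultdict(list)
    for t, v in zip(times, values):
        # The month-start date serves as the group label.
        label = t.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        groups[label].append(v)
    return {label: sum(vs) / len(vs) for label, vs in sorted(groups.items())}


# Daily data covering January and February 2001.
times = [datetime(2001, 1, 1) + timedelta(days=n) for n in range(59)]
values = list(range(59))
means = monthly_means(times, values)
```

The result is keyed by month-start dates, which mirrors renaming the 'MS' coordinate back to 'time' in the xarray version.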
@spencerkclark thanks! I hadn't figured out that particular workaround, but it works, albeit quite slow. For now it will get me to the next step, but just changing to first-of-the-month takes longer than regridding all models to a common grid! |
Indeed what I had above is quite slow!
In [6]: %%timeit
   ...: month_start = [DatetimeNoLeap(date.dt.year, date.dt.month, 1) for date in da.time]
   ...:
1 loop, best of 3: 588 ms per loop
Iterating over the contents of da.time.values is much faster:
In [7]: %%timeit
   ...: month_start = [DatetimeNoLeap(date.year, date.month, 1) for date in da.time.values]
   ...:
1000 loops, best of 3: 302 µs per loop
|
Yes, when open_mfdataset decides to convert to CFTime this is much faster. When time is in datetime64, I get:
You can see I made a feeble attempt to fix it to work for all the CMIP5 calendars, but it is just as slow. Any suggestions? |
When the time coordinate contains I think the most general workaround for right now would probably look something like the example below. This has the property that it preserves the underlying calendar type of the time index.
import pandas as pd
import xarray as xr


def resample_ms_freq(ds, dim='time'):
    """Resample the dataset to 'MS' frequency regardless of the
    calendar used.

    Parameters
    ----------
    ds : Dataset
        Dataset to be resampled
    dim : str
        Dimension name associated with the time index

    Returns
    -------
    Dataset
    """
    index = ds.indexes[dim]
    if isinstance(index, pd.DatetimeIndex):
        return ds.resample(**{dim: 'MS'}).mean(dim)
    elif isinstance(index, xr.CFTimeIndex):
        date_type = index.date_type
        month_start = [date_type(date.year, date.month, 1) for date in ds[dim].values]
        ms = xr.DataArray(month_start, coords=ds[dim].coords)
        ds = ds.assign_coords(MS=ms)
        return ds.groupby('MS').mean(dim).rename({'MS': dim})
    else:
        raise TypeError(
            'Resampling to month start frequency requires using a time index of either '
            'type pd.DatetimeIndex or xr.CFTimeIndex.')


with xr.set_options(enable_cftimeindex=True):
    ds = xr.open_mfdataset(files)
    resampled = resample_ms_freq(ds)
|
I'm not sure if my issue belongs in here, but I didn't want to create a new Issue (there are already 455 open ones).
I am experimenting with the new CFTimeIndex functionality (thanks heaps BTW! That was a mammoth effort if the PR thread is anything to go by).
I am trying to shift a time index as I need to align datasets to a common start point. So using the example code above,
da.time.get_index('time').shift(1,'D')
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-71-db48b2fbb340> in <module>()
----> 1 da.time.get_index('time').shift(1,'D')

/g/data3/hh5/public/apps/miniconda3/envs/analysis27-18.04/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in shift(self, periods, freq)
   2627         """
   2628         raise NotImplementedError("Not supported for type %s" %
-> 2629                                   type(self).__name__)
   2630
   2631     def argsort(self, *args, **kwargs):

NotImplementedError: Not supported for type CFTimeIndex
Is this not implemented because it might require resampling?
I ask because this works:
times[0] + pd.Timedelta('365 days')
cftime.DatetimeNoLeap(2, 1, 1, 0, 0, 0, 0, -1, 1)
I guess I am asking, if I want to shift a time index is the best (only?) way currently to loop over all the individual elements of the index and add a time offset to each? |
shift() is different from resampling, but indeed it looks like we’ll need
to add it manually to CFTimeIndex.
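Until a native shift exists, the element-wise approach asked about above might look like the following sketch. `shift_times` is a hypothetical helper, not part of xarray or pandas, and stdlib `datetime` objects stand in for cftime dates on the assumption (consistent with the `times[0] + pd.Timedelta('365 days')` example earlier in the thread) that cftime dates support adding a timedelta.

```python
from datetime import datetime, timedelta


def shift_times(times, offset):
    """Return a new list with a fixed-length offset added to every date."""
    # Element-wise addition; the input sequence is left untouched.
    return [t + offset for t in times]


times = [datetime(2001, 1, 1), datetime(2001, 1, 2)]
shifted = shift_times(times, timedelta(days=1))
```

This only covers fixed-length offsets such as days or hours. Calendar-dependent shifts (by months, say) would need calendar-aware logic that this sketch does not attempt.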
|
Does this need its own issue then, so it doesn't get lost? |
Yes, that would probably be a good idea.
|
I'm trying to wrap my head around what is needed to get the resample method to work, but I must say I'm confused. Would it be possible/practical to create a branch with stubs in the code for the methods that need to be written (with a #2191 comment) so newbies can help fill in the gaps? |
Take a look at #2458 for a very basic version of this. |
Thanks @shoyer for getting things started! @huard your help would be very much appreciated in implementing this. As mentioned in #2437 (comment), this is one of the biggest remaining gaps in functionality between xarray objects indexed by a CFTimeIndex and xarray objects indexed by a DatetimeIndex. |
This has been implemented in #2593 🎉. |
Hi folks, |
@zzheng93 this will be possible in the next release of xarray, so not quite yet, but soon. If you're in a hurry you could install the development version. |
@spencerkclark Thank you very much :) |
@zzheng93 welcome! One way to install the development version is to clone this repo, and do an editable install:
Then using resample with a daily frequency would look something like:
|
@spencerkclark Thank you very much for your help! I will install the development version on my local machine. |
@zzheng93 sure thing!
I know you didn't ask for help with this, but I can't resist :) -- I recommend you set up your own Python environment on Cheyenne. This is nice because it gives you full control over the packages you install (so you don't need to wait until someone else installs them for you). A good place to start on how to do this is the "Getting started with Pangeo on HPC" page on the Pangeo website.
I think with some more specific details regarding what you are looking to do, this could potentially be a good question to ask in the (relatively new) pangeo-data/ml-workflow-examples repo, where they are discussing machine learning workflows connected to xarray. |
@spencerkclark |
Now that CFTimeIndex has been implemented (#1252), one thing that remains to implement is resampling. @shoyer provided a sketch of how to implement it: #1252 (comment). In the interim, @spencerkclark provided a sketch of a workaround for some use-cases using groupby: #1270 (comment).
I thought it would be useful to have a new Issue specifically on this topic from which future conversation can continue. @shoyer, does that sketch you provided still seem like a good starting point?