Resampling daily input data to half-yearly data generates an excessive time coordinate #2787

Closed
grassland-curing-cfa opened this issue Feb 25, 2019 · 2 comments

@grassland-curing-cfa

grassland-curing-cfa commented Feb 25, 2019

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range('1972-01-01', freq='D', periods=366)    # a leap year
ds = xr.Dataset({'foo': ('time', np.arange(366)), 'time': time})

# Resample ds to every 6 months
res = ds['foo'].resample(time='6MS', closed='left').sum('time')
print(res)

Problem description

<xarray.DataArray 'foo' (time: 3)>
array([16471., 50324., nan])
Coordinates:
  * time     (time) datetime64[ns] 1972-01-01 1972-07-01 1973-01-01

An extra time coordinate, 1973-01-01, is generated even though the input contains no data for that period, so its value is nan.

Expected Output

<xarray.DataArray 'foo' (time: 2)>
array([16471., 50324.])
Coordinates:
  * time     (time) datetime64[ns] 1972-01-01 1972-07-01
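
As a sanity check on the two expected values, the half-yearly sums can be reproduced without pandas or xarray. This is only a sketch using the standard library; the helper name `half_year_start` is hypothetical and not part of any library. It groups the same 366 daily values into bins starting on 1 January and 1 July:

```python
from datetime import date, timedelta

# Same input as the report: 366 daily values (0..365) starting 1972-01-01.
start = date(1972, 1, 1)
days = [start + timedelta(days=i) for i in range(366)]
values = range(366)

def half_year_start(d):
    """Hypothetical helper: left edge of the half-year bin containing d."""
    return date(d.year, 1 if d.month <= 6 else 7, 1)

# Accumulate one sum per half-year bin.
sums = {}
for d, v in zip(days, values):
    key = half_year_start(d)
    sums[key] = sums.get(key, 0) + v

for key in sorted(sums):
    print(key, sums[key])
# 1972-01-01 16471
# 1972-07-01 50324
```

The last input day is 1972-12-31, so no value falls into a bin starting 1973-01-01, which is why only two bins are expected.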

Output of xr.show_versions()

python: 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 2012ServerR2
machine: AMD64
processor: Intel64 Family 6 Model 47 Stepping 2, GenuineIntel

xarray: 0.11.2
pandas: 0.23.4
numpy: 1.16.0
scipy: 1.2.0
netCDF4: 1.4.2

@spencerkclark
Member

We've noticed this behavior before too: #2593 (comment). It turns out it was fixed in the most recent version of pandas, so I recommend upgrading:

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.24.0'

In [3]: import xarray as xr

In [4]: time = pd.date_range('1972-01-01', freq='D', periods=366)

In [5]: ds = xr.Dataset({'foo': ('time', range(366)), 'time': time})

In [6]: res = ds['foo'].resample(time='6MS', closed='left').sum('time')

In [7]: res
Out[7]:
<xarray.DataArray 'foo' (time: 2)>
array([16471, 50324])
Coordinates:
  * time     (time) datetime64[ns] 1972-01-01 1972-07-01
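
For scripts that need to detect whether the installed pandas is affected, a version check along these lines may help. This is a sketch, not an official API: the helper names `parse_version` and `has_resample_fix` are hypothetical, and the 0.24.0 threshold is an assumption based only on this thread (0.23.4 shows the extra bin, 0.24.0 does not).

```python
import re

# Assumption: 0.24.0 is used as the threshold only because this thread shows
# 0.23.4 producing the extra bin and 0.24.0 not producing it.
FIXED_IN = (0, 24, 0)

def parse_version(version):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())

def has_resample_fix(version):
    """True if the version is at or above the assumed fixed release."""
    return parse_version(version) >= FIXED_IN

print(has_resample_fix("0.23.4"))  # False
print(has_resample_fix("0.24.0"))  # True
```

In practice the argument would be `pd.__version__`, as printed in the session above.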

@grassland-curing-cfa
Author

Thanks @spencerkclark ! This fix worked for me.
