Time limitation (between years 1678 and 2262) restrictive to climate community #789

Closed
pwolfram opened this issue Mar 10, 2016 · 13 comments

@pwolfram
Contributor

The restriction of

One unfortunate limitation of using datetime64[ns] is that it limits the native representation of dates to
those that fall between the years 1678 and 2262. When a netCDF file contains dates outside of these
bounds, dates will be returned as arrays of netcdftime.datetime objects.

is a potential roadblock inhibiting easy adoption of this library in the climate community.
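
As a rough illustration of where those bounds come from: datetime64[ns] stores a signed 64-bit count of nanoseconds since 1970-01-01, so the representable span is about 292 years on either side of the epoch. The sketch below reproduces the range with only the standard library; the variable names are illustrative, and sub-microsecond digits are truncated because Python's datetime does not resolve nanoseconds.

```python
from datetime import datetime, timedelta

# datetime64[ns] counts nanoseconds since the Unix epoch in a signed 64-bit integer.
EPOCH = datetime(1970, 1, 1)
INT64_MAX_NS = 2**63 - 1

# Python's datetime resolves microseconds only, so truncate ns to us.
span = timedelta(microseconds=INT64_MAX_NS // 1000)

earliest = EPOCH - span  # falls partway through 1677, so 1678 is the first full year
latest = EPOCH + span    # falls partway through 2262

print(earliest.year, latest.year)
```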

@pwolfram
Contributor Author

@jhamman and @shoyer, do you know if more general support for times will be provided?

I would say this is a potential roadblock inhibiting easy adoption of this library within the climate community. For example, it is creating problems when we set up and analyze climate models (cc @akturner, @douglasjacobsen, @milenaveneziani). There are obvious workarounds, but they are hacks: for example, I just add 1700 (or some other arbitrary year) to the dates within my wrapper to MPAS at https://github.com/pwolfram/mpas_xarray_wrapper. However, this does not appear to be a viable long-term solution.

Does anyone have some advice on how to better deal with this problem?

@pwolfram
Contributor Author

See also #531 (cc @jsbj, @darothen, @jhamman); #521 (cc @rabernat, @jhamman)

It appears this is an issue with the numpy datetime64 API (numpy/numpy#6207).

@shoyer
Member

shoyer commented Mar 10, 2016

There are two issues here:

  1. Support for years outside 1678-2262 -- blocked by pandas standardizing on nanosecond precision.
  2. Support for custom calendars -- blocked by limitations of numpy's datetime64.

Unfortunately, I don't see easy fixes to either of these, though if I had to guess, adding support for other datetime precisions (perhaps only sub-second resolution) to pandas would be easier than fixing up NumPy's datetime64 itself (which is already pretty hacky).

@milenaveneziani

Thank you, @pwolfram, for bringing this up. I just wanted to add that this limitation is important for analyzing so-called fixed-conditions climate model experiments, in which the model is run with fixed greenhouse gas conditions for a particular year (for example, CO2 concentrations representative of the year 1850). In these cases, time is simply measured relative to the start of the simulation (year 0 or year 1), and each year is a no-leap year of 365 days.
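
To make the no-leap convention concrete, here is a minimal sketch of converting a day count since the start of a simulation into a (year, month, day) triple in a 365-day calendar. The function name `noleap_date` and the `start_year` parameter are hypothetical, chosen for illustration; this is not how netcdftime or xarray decode times.

```python
from itertools import accumulate

# Month lengths in a 365-day ("noleap") calendar: February always has 28 days.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
# 0-based day of year on which each month starts: [0, 31, 59, ...].
MONTH_STARTS = [0] + list(accumulate(DAYS_IN_MONTH))[:-1]


def noleap_date(days_since_start, start_year=1):
    """Convert whole days since Jan 1 of start_year to (year, month, day)."""
    year_offset, day_of_year = divmod(days_since_start, 365)
    # Pick the last month that starts on or before this day of the year.
    month = max(m for m, start in enumerate(MONTH_STARTS, 1) if start <= day_of_year)
    day = day_of_year - MONTH_STARTS[month - 1] + 1
    return start_year + year_offset, month, day


print(noleap_date(0))    # first day of the simulation
print(noleap_date(365))  # exactly one model year later
```

Because every year has the same length, the year falls out of a single `divmod`, which is part of why such calendars are simple to handle in isolation.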

@pwolfram
Contributor Author

Thank you very much, @shoyer, for your very fast reply. If we fixed the issue in pandas (your item 1), would that be sufficient to resolve the problem, or would we also need to enhance datetime64 (your item 2)? What do you recommend as the best way to proceed, and is it even reasonable to do so?

@rabernat
Contributor

👍 I hit this problem months back when analyzing CESM runs.

It seems silly that the adoption of xarray by the climate modeling community should rest on these highly technical issues. But that seems to be the reality. The challenge is to raise the profile of these issues within the numpy and pandas communities such that they become a high priority. Even better would be dedicated developer time (e.g. from someone at UNIDATA) to implement fixes.

@shoyer
Member

shoyer commented Mar 10, 2016

Well, the good news is that non-standard calendars like the 365-day calendar are actually a bit easier than the Gregorian calendar, at least if you are starting from scratch. As much as I love pushing fixes upstream, the sanest approach is probably to write a CustomDatetimeIndex class from scratch and start checking off boxes on datetime functionality:

  • support for datetime indexing functionality (the pandas get_indexer, get_loc and slice_indexer methods)
  • support for pulling out datetime components (e.g., year or hour)
  • support for resample (I'm not exactly sure what the right API is here)
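
The first two items in the checklist above can be sketched in plain Python. The class below is hypothetical (the name `NoLeapIndex` and its tuple-based storage are assumptions made for illustration, not an existing pandas or xarray API); it only mirrors the names of the pandas methods mentioned and leaves resample out entirely.

```python
from bisect import bisect_left, bisect_right


class NoLeapIndex:
    """Hypothetical sorted index of (year, month, day) tuples; not a pandas API."""

    def __init__(self, dates):
        self._dates = sorted(dates)

    @property
    def year(self):
        # Component access, loosely analogous to DatetimeIndex.year.
        return [d[0] for d in self._dates]

    def get_loc(self, date):
        # Exact-match lookup via binary search; raises KeyError if absent.
        i = bisect_left(self._dates, date)
        if i == len(self._dates) or self._dates[i] != date:
            raise KeyError(date)
        return i

    def slice_indexer(self, start, end):
        # Label-based slice that includes both endpoints.
        return slice(bisect_left(self._dates, start), bisect_right(self._dates, end))


idx = NoLeapIndex([(1, 1, 1), (1, 2, 28), (1, 3, 1), (2, 1, 1)])
print(idx.get_loc((1, 3, 1)))
print(idx.slice_indexer((1, 2, 1), (1, 12, 31)))
```

Tuples compare lexicographically, so no calendar arithmetic is needed for ordering; a real implementation would also need the remaining indexing, component, and resampling behavior listed above.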

This might make slightly more sense in a related but distinct project to xarray.

NumPy and pandas developers will listen sympathetically, but ultimately nobody is going to work on this unless there is funding or they need it for their own work -- that's just how open source works. Fixing the underlying technology so these problems can be solved the "right" way is on the roadmap, but only in a vague, "we'll get to it eventually" kind of way.

@max-sixty
Collaborator

Periods can go back much further, depending on the precision you need:

In [26]: pd.Period('1000', freq='D')
Out[26]: Period('1000-01-01', 'D')

@jhamman
Member

jhamman commented Mar 10, 2016

This might make slightly more sense in a related but distinct project to xarray.

@shoyer - are you thinking xarray would fall back to a CustomDatetimeIndex for non-standard calendars?

I actually don't think it would be all that hard to do this. @jswhit's netcdftime module is at least a good starting point. There's a lot to build on in netcdftime and pandas.tseries.index. I'd actually say this should be targeted at Pandas (probably as a side project) rather than xarray. Ultimately, it would be nice to be able to move timeseries back and forth without any hassle.

I'd be happy to help pull this together, although, I won't be able to make significant contributions until the summer. Is anyone chomping at the bit to work on something like this?

@brews
Contributor

brews commented May 18, 2016

Just curious, has anyone tried to open an issue in pandas for this?

@jhamman
Member

jhamman commented May 18, 2016

@brews - I think this issue (pandas-dev/pandas#7307) covers the main gist of what we're talking about here.

@spencerahill
Contributor

Pandas has created a poll on their mailing list about this issue. I encourage everybody to speak up there: https://groups.google.com/forum/#!topic/pydata/kk04maBGw1U

(Will blast this to xarray mailing list also)

@spencerahill
Contributor

Closed by #1252?
