DataArray constructor still coerces to np.datetime64[ns], not cftime in 0.11.0 #2587

Closed
Huite opened this issue Dec 2, 2018 · 4 comments · Fixed by #9618

@Huite
Contributor

Huite commented Dec 2, 2018

Code Sample

import xarray as xr
import numpy as np
from datetime import datetime

time = [np.datetime64(datetime.strptime("10000101", "%Y%m%d"))]
print(time[0])
print(np.dtype(time[0]))

da = xr.DataArray(time, ("time",), {"time":time})
print(da)

Results in:

1000-01-01T00:00:00.000000
datetime64[us]

<xarray.DataArray (time: 1)>
array(['2169-02-08T23:09:07.419103232'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2169-02-08T23:09:07.419103232

Problem description

I was happy to see cftime as default in the release notes for 0.11.0:

Xarray will now always use cftime.datetime objects, rather than by default trying to coerce them into np.datetime64[ns] objects. A CFTimeIndex will be used for indexing along time coordinates in these cases.

However, it seems that the DataArray constructor does not use cftime (yet?), and coerces to np.datetime64[ns] here:

if isinstance(data, np.ndarray):
    if data.dtype.kind == 'O':
        data = _possibly_convert_objects(data)
    elif data.dtype.kind == 'M':
        data = np.asarray(data, 'datetime64[ns]')
    elif data.dtype.kind == 'm':
        data = np.asarray(data, 'timedelta64[ns]')
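
To make the wrap-around in the output above concrete, here is a minimal sketch in plain Python integer arithmetic. It is not what numpy literally executes; it only assumes that an unchecked cast to datetime64[ns] stores nanoseconds since 1970-01-01 in a signed 64-bit integer. The helper names (nanoseconds_since_epoch, wrap_to_int64) are made up for this illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def nanoseconds_since_epoch(dt):
    """Exact nanoseconds between the Unix epoch and a naive datetime."""
    delta = dt - EPOCH
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * 1_000


def wrap_to_int64(value):
    """Reduce a Python int to two's-complement int64 (assumed cast behavior)."""
    return (value - INT64_MIN) % 2**64 + INT64_MIN


nanos = nanoseconds_since_epoch(datetime(1000, 1, 1))
# INT64_MIN itself is reserved for NaT, hence the strict lower bound.
fits = INT64_MIN < nanos <= INT64_MAX
wrapped = wrap_to_int64(nanos)
# datetime only has microsecond resolution, so the last three digits are dropped.
wrapped_date = EPOCH + timedelta(microseconds=wrapped // 1_000)

print(fits)
print(wrapped)
print(wrapped_date)
```

Year 1000 needs roughly -3.06e19 nanoseconds, which is outside the int64 range, so the value wraps around and lands in 2169, matching the garbled timestamp printed by the DataArray repr.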

Expected Output

I think I'd expect cftime.datetime in this case as well. Some coercion happens anyway as pandas timestamps are turned into np.datetime64[ns].

(But perhaps this was already on your radar, and I am just a little too eager!)

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

xarray: 0.11.0
pandas: 0.23.3
numpy: 1.15.3
scipy: 1.1.0
netCDF4: 1.3.1
h5netcdf: 0.6.1
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.0
PseudonetCDF: None
rasterio: 1.0.0
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.19.2
distributed: 1.23.2
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.9.0
setuptools: 40.5.0
pip: 18.1
conda: None
pytest: 3.6.3
IPython: 6.4.0
sphinx: 1.7.5
@spencerkclark
Member

Thanks for the clear report @Huite. Indeed we did not consider this particular use case when updating the behavior in version 0.11 (i.e. non-cftime dates passed to the DataArray constructor that are outside the np.datetime64[ns] range).

As you noted, in this situation it would probably make sense to coerce these dates to a cftime date type; specifically, I think the correct type to use would be cftime.DatetimeProlepticGregorian, because the Python documentation states that datetime.date (and by extension datetime.datetime) objects assume "the current Gregorian calendar always was, and always will be, in effect."

Note that if you use cftime.datetime objects directly, things work as you might expect:

In [1]: import cftime; import xarray

In [2]: times = [cftime.DatetimeProlepticGregorian(1000, 1, 1)]

In [3]: da = xarray.DataArray(times, dims=['time'], coords=[times])

In [4]: da
Out[4]:
<xarray.DataArray (time: 1)>
array([cftime.DatetimeProlepticGregorian(1000, 1, 1, 0, 0, 0, 0, 0, 1)], dtype=object)
Coordinates:
  * time     (time) object 1000-01-01 00:00:00

@Huite
Contributor Author

Huite commented Dec 4, 2018

Thanks, I'll indeed use cftime.datetime objects directly.

cftime.DatetimeProlepticGregorian seems the obvious default to me as well.

@stale

stale bot commented Nov 16, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Nov 16, 2020
@dcherian dcherian added bug and removed stale labels Apr 18, 2022
@kmuehlbauer
Contributor

With xarray 2024.10.0, the example emits a warning and then raises:

UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
  da = xr.DataArray(time, ("time",), {"time":time})

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1000-01-01 00:00:00

This should work once #9618 is finalized.
