Decoding time according to CF conventions raises error if a NaN is found #1662
Comments
Hi Guillaume! Nice to see so many old friends showing up on the xarray repo... The issue you raise is totally reasonable from a user perspective: missing values in datetime data should be permitted. But there are some upstream issues that make it challenging to solve (like most of our headaches related to datetime data). In numpy (and computer arithmetic in general), NaN only exists in floating-point datatypes. It is impossible to have a numpy datetime array with NaN in it:

>>> a = np.array(['2010-01-01', '2010-01-02'], dtype='datetime64[ns]')
>>> a[0] = np.nan
ValueError: Could not convert object to NumPy datetime

The same error would be raised if […]. Further downstream, xarray relies on netcdf4-python's num2date function to decode the dates. The error is raised by that package. This is my understanding of the problem. Some other folks here like @jhamman and @spencerkclark might have ideas about how to solve it. They are working on a new package called netcdftime, which will isolate and hopefully enhance such time encoding / decoding functions.
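To make the point above concrete: while NaN cannot live in a datetime64 array, numpy does provide a dedicated missing-value sentinel, NaT ("not a time"). A minimal sketch (not from the original comment):

```python
import numpy as np

a = np.array(['2010-01-01', '2010-01-02'], dtype='datetime64[ns]')
# datetime64 has no NaN; its missing-value sentinel is NaT ("not a time")
a[0] = np.datetime64('NaT')
print(a)            # first element is now NaT
print(np.isnat(a))  # [ True False]
```

`np.isnat` (available since numpy 1.13) is the datetime analogue of `np.isnan`.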
I'm pretty sure this used to work in some form. I definitely worked with a dataset in the infancy of xarray that had coordinates with missing times. The current issue appears to be that pandas represents the […]

This appears to be specific to our use of a […]
Hi Ryan, I've never been very far; I've been following/promoting xarray around here. And congrats on Pangeo! Ok, I get that the datatype is wrong, but about the issue from the pandas TimedeltaIndex: […]
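One pandas behaviour worth noting here, since the workaround below relies on it: unlike a raw cast to datetime64, pd.to_timedelta maps NaN to NaT instead of raising. A minimal sketch (not from the thread):

```python
import numpy as np
import pandas as pd

# pd.to_timedelta propagates NaN as NaT rather than raising an error
td = pd.to_timedelta([1.5, np.nan], unit='D')
print(td)  # TimedeltaIndex with a NaT in the second slot
```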
Note that if xarray's decode_cf is given a NaT in a datetime64, it works:

attrs = {'units': 'days since 1950-01-01 00:00:00 UTC'}  # Classic Argo data Julian Day reference
jd = [24658.46875, 24658.46366898, 24658.47256944, np.nan]  # Sample

def dirtyfixNaNjd(ref, day):
    td = pd.NaT
    if not np.isnan(day):
        td = pd.Timedelta(days=day)
    return pd.Timestamp(ref) + td

jd = [dirtyfixNaNjd('1950-01-01', day) for day in jd]
print(jd)

[Timestamp('2017-07-06 11:15:00'), Timestamp('2017-07-06 11:07:40.999872'), Timestamp('2017-07-06 11:20:29.999616'), NaT]

then:

ds = xr.Dataset({'time': ('time', jd, {'units': 'ns'})})  # Update the units attribute appropriately
ds = xr.decode_cf(ds)
print(ds['time'].values)

['2017-07-06T11:15:00.000000000' '2017-07-06T11:07:40.999872000'
 '2017-07-06T11:20:29.999616000' 'NaT']
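The per-element fix above can also be vectorized, since pd.to_timedelta already propagates NaN as NaT. A sketch (not from the thread), using the same sample values:

```python
import numpy as np
import pandas as pd

# Vectorized variant of the per-element dirtyfixNaNjd workaround:
# pd.to_timedelta maps NaN to NaT, so no explicit isnan check is needed.
jd = [24658.46875, 24658.46366898, 24658.47256944, np.nan]
times = pd.Timestamp('1950-01-01') + pd.to_timedelta(jd, unit='D')
print(times)  # DatetimeIndex ending in NaT
```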
Working with Argo data, I have difficulties decoding time-related variables:
More specifically, it may happen that a date variable contains _FillValue entries that are set to NaN when the netcdf file is opened. That makes the decoding raise an error.
Sure, I can open the netcdf file with the decode_times=False option, but the issue is not whether the data can be decoded; it seems to me to be about how to handle _FillValue in a time axis.
I understand that with most gridded datasets the time axis/dimension/coordinate is full and does not contain missing values, which may explain why nobody has reported this before.
Here is a simple way to reproduce the error:
But then:
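The reproduction code was lost from this transcript. A hypothetical reconstruction, assuming the Argo Julian-day sample values and the units attribute quoted in the comments elsewhere in this thread:

```python
import numpy as np
import xarray as xr

# Hypothetical reconstruction of the lost reproduction snippet: a float time
# variable with CF units and a NaN standing in for _FillValue.
raw = np.array([24658.46875, 24658.46366898, 24658.47256944, np.nan])
ds = xr.Dataset({'time': ('time', raw,
                          {'units': 'days since 1950-01-01 00:00:00'})})
# At the time of this report, this call raised an error on the NaN;
# later xarray releases decode it to NaT instead.
decoded = xr.decode_cf(ds)
print(decoded['time'].values)
```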
I would expect the decoding to work like in the first case and to simply preserve NaNs where they are.
Any ideas or suggestions?
Thanks