time decoding error with "days since" #521
Comments
In fact I just found a netCDF issue on this topic! Apparently they don't think it should be supported. Unidata/netcdf4-python#442 |
Yes - this is all coming from the The workaround with xray is to use As an aside, I also work with CESM output and this is a common problem with its netCDF output. |
The PR above fixes this issue. However, since my model years are in the range 100-200, I am still getting the warning
and eventually when I try to access the time data, an error with a very long stack trace ending with
I see there is a check in conventions.py that the year has to lie between 1678 and 2262. What is the reason for this? |
We try to cast all the time variables to a pandas time index. This gives xray the ability to use many of the fast and fancy timeseries tools that pandas has. One consequence of that is that non-standard calendars, such as the "noleap" calendar, must have dates inside the valid range of the standard calendars (1678 to 2262). Does that make sense? Ideally, numpy and pandas would support custom calendars, but they don't, so at this point we're bound to their limits. |
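The year limits follow from how pandas stores timestamps: a signed 64-bit count of nanoseconds relative to 1970-01-01. The back-of-the-envelope check below is only a sketch using the standard library, and it assumes a mean Gregorian year of 365.2425 days, so the fractional years are approximate.

```python
# Span representable by a signed 64-bit count of nanoseconds around 1970.
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.2425 * 86400  # mean Gregorian year (an approximation)

max_ns = 2**63 - 1  # largest signed 64-bit integer
span_years = max_ns / NS_PER_SECOND / SECONDS_PER_YEAR

earliest = 1970 - span_years
latest = 1970 + span_years

print(f"about {span_years:.1f} years either side of 1970")
print(f"roughly {earliest:.1f} to {latest:.1f}")
```

The range starts partway through 1677 and ends partway through 2262, so 1678 is the first full year that fits. Model years 100-200 are far outside it.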
@jhamman Thanks for the clear explanation! One of the main uses for non-standard calendars would be climate model "control runs", which don't occur at any specific point in historical time but still have seasonal cycles, well-defined months, etc. It would be nice to have "group by" functionality for these datasets. But I do see how this is impossible with the current numpy datetime64 datatype. Perhaps the long-term fix is to implement non-standard calendars within numpy itself. |
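To make the control-run case concrete, here is a minimal standard-library sketch of grouping by month in a 365-day ("noleap") calendar without datetime64. The helper names `noleap_date` and `monthly_means` are hypothetical and are not part of xray, numpy, or pandas. It assumes whole-day offsets from a reference date of 1 January.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate

# Month lengths in a 365-day ("noleap") calendar: February always has 28 days.
DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
# Zero-based day-of-year on which each month starts: [0, 31, 59, ...].
MONTH_STARTS = [0] + list(accumulate(DAYS_PER_MONTH))[:-1]


def noleap_date(days_since_ref, ref_year=0):
    """Convert whole days since <ref_year>-01-01 to (year, month, day)."""
    years, day_of_year = divmod(int(days_since_ref), 365)
    month_index = bisect_right(MONTH_STARTS, day_of_year) - 1
    day = day_of_year - MONTH_STARTS[month_index] + 1
    return ref_year + years, month_index + 1, day


def monthly_means(days, values):
    """Average values by calendar month, ignoring the year."""
    buckets = defaultdict(list)
    for day, value in zip(days, values):
        _, month, _ = noleap_date(day)
        buckets[month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


# Mid-January and mid-July samples from model years 100 and 101.
days = [100 * 365 + 15, 100 * 365 + 196, 101 * 365 + 15, 101 * 365 + 196]
values = [1.0, 5.0, 3.0, 7.0]

print(noleap_date(days[0]))
print(monthly_means(days, values))
```

Because every year has the same length, the conversion is plain integer arithmetic, and year 0 or year 100 needs no special handling.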
I agree, although that sounds like quite an undertaking. Maybe raise an issue over at numpy and ask if they would be interested in a multi-calendar api? If numpy could make it work, then I'm sure pandas could as well. |
In case anyone is still struggling with the CESM POP time units convention, with the new CF support of version 0.12 the problem is (almost) solved. I have slightly different CESM POP netcdf output with time attributes
import xarray as xr  # version >= 0.12
ds = xr.open_dataset('some_CESM_output_file.nc', decode_times=False)
ds = ds.drop_dims(['d2'])
ds = xr.decode_cf(ds, use_cftime=True)
Now the xarray Dataset has a |
Could you provide the output of |
Of course, here is the `ds.info()` output:
xarray.Dataset {
dimensions:
    bnds = 2 ;
    d2 = 2 ;
    nlat = 2400 ;
    nlon = 3600 ;
    time = 1 ;
    z_t = 42 ;
    z_t_150m = 12 ;
    z_w = 42 ;
    z_w_bot = 42 ;
    z_w_top = 42 ;
variables:
// global attributes: |
Thanks -- in looking at the metadata it seems there is nothing unusual about the My feeling is that the issue here remains the fact that cftime dates do not support year zero (see the upstream issue @rabernat mentioned earlier: Unidata/netcdf4-python#442). That said, it's surprising that dropping the If you don't mind, could you provide me with two more things?
|
Opening the file as
and after decoding
The traceback of opening the file without
traceback
---------------------------------------------------------------------------
OutOfBoundsDatetime Traceback (most recent call last)
~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/coding/times.py in _decode_datetime_with_pandas(flat_num_dates, units, calendar)
    128 try:
--> 129 ref_date = pd.Timestamp(ref_date)
    130 except ValueError:

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_str_to_tsobject()
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_str_to_tsobject()
pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 0-01-01 00:00:00

During handling of the above exception, another exception occurred:

OutOfBoundsDatetime Traceback (most recent call last)
~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/coding/times.py in _decode_datetime_with_pandas(flat_num_dates, units, calendar)

OutOfBoundsDatetime:

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/coding/times.py in decode_cf_datetime(num_dates, units, calendar, use_cftime)
~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/coding/times.py in _decode_datetime_with_cftime(num_dates, units, calendar)
cftime/_cftime.pyx in cftime._cftime.num2date()

ValueError: zero not allowed as a reference year, does not exist in Julian or Gregorian calendars

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime)
~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/backends/api.py in maybe_decode_store(store, lock)
~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime)
~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/conventions.py in decode_cf_variables(variables, attributes, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime)
~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/conventions.py in decode_cf_variable(name, var, concat_characters, mask_and_scale, decode_times, decode_endianness, stack_char_dim, use_cftime)
~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/coding/times.py in decode(self, variable, name)
~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/coding/times.py in _decode_cf_datetime_dtype(data, units, calendar, use_cftime)

ValueError: unable to decode time units 'days since 0000-01-01 00:00:00' with the default calendar. Try opening your dataset with decode_times=False. |
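The first failure in this chain comes from the reference date itself: year 0 cannot be represented by a nanosecond timestamp, and Python's own `datetime` rejects it too. The sketch below uses only the standard library to split a CF-style units string and flag such a reference year before any decoding is attempted. The helper names `split_cf_units` and `reference_year_fits_datetime` are hypothetical and are not part of xarray or cftime. The regular expression is a simplification that assumes the form `UNIT since YEAR-...`.

```python
import re
from datetime import MINYEAR, datetime

# Simplified pattern for "UNIT since YEAR-MM-DD ..." (an assumption, not the full CF grammar).
UNITS_PATTERN = re.compile(
    r"^\s*(?P<unit>\w+)\s+since\s+(?P<ref>(?P<year>[+-]?\d+)-.*?)\s*$"
)


def split_cf_units(units):
    """Split 'UNIT since REFERENCE' into (unit, reference_string, reference_year)."""
    match = UNITS_PATTERN.match(units)
    if match is None:
        raise ValueError(f"not a 'UNIT since DATE' string: {units!r}")
    return match.group("unit"), match.group("ref"), int(match.group("year"))


def reference_year_fits_datetime(units):
    """Return True if the reference year is one that datetime.datetime accepts."""
    _, _, year = split_cf_units(units)
    return year >= MINYEAR


units = "days since 0000-01-01 00:00:00"
print(split_cf_units(units))
print(reference_year_fits_datetime(units))

# The standard library agrees that year 0 is not a valid year.
try:
    datetime(0, 1, 1)
except ValueError as err:
    print("datetime rejects year 0:", err)
```

A check like this only diagnoses the problem. Decoding such a file still requires `decode_times=False` or a calendar-aware library.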
Great, that's helpful, thanks. I see what's happening now. There are a lot of tricky things going on, so bear with me. Let's examine the output from
There are a few important things to note:
For non-real-world calendars (e.g. 365_day), reference dates in cftime should allow year zero. This was fixed upstream in Unidata/netcdf4-python#470. That being said, because of (2), the calendar for
Ultimately though, with #2571, we try to propagate the time-related attributes from the time coordinate to the associated bounds coordinate (so in normal circumstances we would use a 365_day calendar in this case as well). But, because of (3), this is not possible due to the fact that the In theory, another possible way to work around this would be to open the dataset with
Now, this may still not work depending on the values in the In conclusion, I'm afraid there is nothing we can do in xarray to automatically fix this situation. Issue (3) in the netCDF file is particularly unfortunate. If it weren't for that, I think all of these issues would be possible to work around, e.g. with #2571 here, or with fixes upstream. |
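The idea of propagating time-related attributes from a time coordinate to its bounds variable can be illustrated with plain dictionaries. This is a hedged sketch, not the implementation from #2571: the function name `propagate_time_attrs` and the variable name `time_bnds` are hypothetical, and it assumes the link between the two variables is a `bounds` attribute on the time coordinate, as in the CF conventions.

```python
TIME_ATTRS = ("units", "calendar")


def propagate_time_attrs(variables):
    """Copy missing time attributes from each variable to the variable its 'bounds' names.

    `variables` maps variable name -> attribute dict. Attributes already present
    on the bounds variable are kept as they are.
    """
    result = {name: dict(attrs) for name, attrs in variables.items()}
    for name, attrs in variables.items():
        bounds_name = attrs.get("bounds")
        if bounds_name not in result:
            continue  # no usable link, so there is nothing to propagate
        for key in TIME_ATTRS:
            if key in attrs:
                result[bounds_name].setdefault(key, attrs[key])
    return result


linked = {
    "time": {
        "units": "days since 0000-01-01 00:00:00",
        "calendar": "noleap",
        "bounds": "time_bnds",
    },
    "time_bnds": {"units": "days since 0000-01-01 00:00:00"},
}
print(propagate_time_attrs(linked)["time_bnds"])

unlinked = {
    "time": {"units": "days since 0000-01-01 00:00:00", "calendar": "noleap"},
    "time_bnds": {"units": "days since 0000-01-01 00:00:00"},
}
print(propagate_time_attrs(unlinked)["time_bnds"])
```

In the first case the bounds variable picks up the `noleap` calendar. In the second there is no link to follow, so it keeps only its own units and would be decoded with the default calendar.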
I opened an issue in cftime regarding this: Unidata/cftime#114. |
It's important to be clear that the issues 2 and 3 that @spencerkclark pointed out are objectively errors in the metadata. We have worked very hard over many years to enable xarray to correctly parse CF-compliant dates with non-standard calendars. But xarray cannot and should not be expected to magically fix metadata that is inconsistent or incomplete. You really need to bring these issues to the attention of whoever generated |
@rabernat , it is not clear to me that issue 2 is an objective error in the metadata. The CF conventions section on the
I conclude from this that software parsing CF metadata should have the variable identified by the However, this is confounded by issue 3, that If CESM-POP were to adhere more closely to the CF recommendation in this section, I think it would drop |
@klindsay28 -- thanks for the clarification. You're clearly right about 2, and I was misinformed. The problem is that 3 makes it impossible to follow the CF convention rules to overcome 2 (which xarray would try to do). |
@AJueling , do you know the provenance of the file with |
Thank you all for the clarification! I will get in touch with the person who ran the model and get back to you as soon as possible. |
I'm also getting the same error: |
I am trying to use xray with some CESM POP model netCDF output, which supposedly follows CF-1.0 conventions. It is failing because the model's time units are "days since 0000-01-01 00:00:00". When calling open_dataset, I get the following error:
Full metadata for the time variable:
I guess this is a problem with the underlying netCDF4 num2date package?