-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use cftime
to avoid out of bounds datetime
when decoding non-CF time coordinates
#283
Conversation
@tomvothecoder – I added this PR to get things rolling; I think you'll have some insights to improve the code in places. I got stuck resolving a mypi issue. If you are okay with the direction I am going, I can fix the unit tests. |
Hi @pochedls, I'm taking a look right now. I'll also rebase this branch on the latest |
26ced7f
to
3050a8d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @pochedls, I reviewed your changes and rebased the branch on the latest main
.
There are some outdated review comments because I ended up doing some refactoring since I was debugging directly on your branch anyways.
The next thing to for you to do to review my changes and fix the tests.
- Add `types-python-dateutil` to `mypy` dependencies - Update `ref_date` var to `ref_dt_obj` to avoid mypy error `error: Unsupported operand types for + ("str" and "relativedelta")` - Use dictionary unpacking for units variable - Use `datetime.strptime` instead of `pd.datetime()` which runs into the `pd.Timestamp` limitation - Add logger.warning when non-CF compliant time coords cannot be decoded
5a2dcc9
to
a681787
Compare
Codecov Report
@@ Coverage Diff @@
## main #283 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 14 14
Lines 1146 1169 +23
=========================================
+ Hits 1146 1169 +23
Continue to review full report at Codecov.
|
1320b50
to
a2868e8
Compare
a2868e8
to
4c33bf9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My second pass, which includes test fixes and additional refactoring.
Co-authored-by: pochedls <[email protected]>
Co-authored-by: pochedls <[email protected]>
I modified
This also fixes some issues I had in subsetting. I had been using the file
I could fix this by using cftime object in my |
- Move try and except statements into `decode_non_cf_time()` - Extract function `_get_cftime_coords()` to encapsulate related logic from `decode_non_cf_time()`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @pochedls, here's another pass of refactoring. I think we're in good shape now.
Along with xarray's get_date_type()
that you cited, xarray also has convert_times()
that we can use. After refactoring with these methods, the existing tests still passed.
Here's a diff of your last commit and this one:
3e9d09c
(#283)
Side-note: I am going to try silence the logger warnings in the test suite.
date_type = get_date_type(calendar) | ||
coords = convert_times(offsets, date_type=date_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of implementing our own logic, we can reuse xarray's get_date_type()
(you cited this one) and convert_times()
methods.
xarray.coding.cftime_offsets.get_date_type()
xarray.coding.times.convert_times()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind this. I initially used the xr
function, but then I wasn't sure if we wanted/needed their _is_standard_calendar
check. I guess this will never run, because we set use_cftime = True
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think convert_times
is hitting problems for years far into the future (convert_times
is based on pandas):
import xcdat
fn = '/p/user_pub/climate_work/pochedley1/cmip6_msu/spliced/ttt_Amon_MRI-ESM2-0_historical-ssp585_r1i1p1f1_gn_185001-230012.nc'
ds = xcdat.open_dataset(fn)
File ~/code/xc
dat/xcdat/dataset.py:719, in _get_cftime_coords(ref_date, offsets, calendar, units)
716 # Convert the array ofdatetime
objects intocftime
objects based on
717 # the calendar type.
718 date_type = get_date_type(calendar)
--> 719 coords = convert_times(offsets, date_type=date_type)
721 return coordsFile ~/bin/anaconda3/envs/xcdat_dev/lib/python3.9/site-packages/xarray/coding/times.py:451, in convert_times(times, date_type, raise_on_invalid)
448 return cftime_to_nptime(times, raise_on_invalid=raise_on_invalid)
449 if is_np_datetime_like(times.dtype):
450 # Convert datetime64 objects to Timestamps since those have year, month, day, etc. attributes
--> 451 times = pd.DatetimeIndex(times)
452 new = np.empty(times.shape, dtype="O")
453 for i, t in enumerate(times):.
.
.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2262-05-01 00:00:00
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refer to my suggestion to update the dtype
of the offsets
np.ndarray from datetime64
to object
.
I validated this fix using your example.
xcdat/dataset.py
Outdated
ref_datetime: datetime = parser.parse(ref_date, default=datetime(2000, 1, 1)) | ||
offsets = np.array( | ||
[ref_datetime + rd.relativedelta(**{units: offset}) for offset in offsets], | ||
dtype="datetime64", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtype="datetime64", | |
dtype="object", |
In the convert_times()
method, there is a conditional that checks if the offsets
np.ndarray dtype is_np_datetime_like
(which then converts to pd.DatetimeIndex, and we don't want that due to the Timestamp limitations).
Source code: https://github.com/pydata/xarray/blob/3c98ec7d96cc4b46664850cc7a40af2bc184fea0/xarray/coding/times.py#L454-L460
Instead, by setting the dtype=object
, it returns an np.ndarray of cftime
objects.
Source code: https://github.com/pydata/xarray/blob/3c98ec7d96cc4b46664850cc7a40af2bc184fea0/xarray/coding/times.py#L461-L476
- Update logger warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you find any more issues, we should be good to merge.
datetime
due to pandas Timestamp limitation with cftime
datetime
due to pandas Timestamp limitation with cftime
datetime
due to pandas Timestamp limitation by using cftime
datetime
due to pandas Timestamp limitation by using cftime
datetime
when opening dataset due to pandas Timestamp limitation by using cftime
datetime
when opening dataset due to pandas Timestamp limitation by using cftime
cftime
to avoid out of bounds datetime
when decoding non-CF time coordinates
Description
This PR is intended to address an issue in which datasets with non-cf-compliant times will not open if they go too far into the future.
Checklist
If applicable: