Differences on datetime values appears after writing reindexed variable on netCDF file #1064
Comments
This is the warning I got when I wrote to my file with "to_netcdf()" @jhamman It seems that the error appears only with a variable "rain" that comes from a previously created netCDF file, but I will try to provide you with an example. Thanks |
@jhamman Here is my example file |
I faced this issue when switching from a The first merged dataset had a time dimension which If I try to store values like '2017-08-20 00:00:30', I get the warning Maybe it is similar in your case: netcdf stored the data as 'hours since XXXX', so you lose the minutes. |
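To illustrate the point about coarse units, here is a minimal sketch that uses only the standard library. It assumes timestamps are stored as whole-number offsets in "minutes since" a reference date; the helpers `encode_minutes` and `decode_minutes` are hypothetical names for this illustration and do not mirror xarray's internals.

```python
from datetime import datetime, timedelta

# Plays the role of "minutes since 2018-01-01 00:00:00" (assumed for this sketch).
REFERENCE = datetime(2018, 1, 1)


def encode_minutes(timestamps):
    """Store each timestamp as a whole number of minutes since REFERENCE."""
    # Floor division of two timedeltas yields an int, so seconds are dropped here.
    return [(t - REFERENCE) // timedelta(minutes=1) for t in timestamps]


def decode_minutes(offsets):
    """Turn integer minute offsets back into timestamps."""
    return [REFERENCE + timedelta(minutes=m) for m in offsets]


times = [
    datetime(2018, 1, 1, 0, 4),
    datetime(2018, 1, 1, 0, 5, 10),
    datetime(2018, 1, 1, 0, 5, 40),
]
roundtrip = decode_minutes(encode_minutes(times))

# Timestamps whose seconds did not survive the round trip.
lost = [t for t, r in zip(times, roundtrip) if t != r]
print(lost)
```

Only the timestamps that fall on a whole minute come back unchanged; the others are moved to the start of their minute, which is the kind of change reported in this issue.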
@NotSqrt can you make a minimum working example for this? e.g., a netCDF file with problematic data, and associated code that writes a netCDF file with lost time resolution. That would really help us diagnose and solve this problem. |
There you go!
import numpy
import pandas
import tempfile
import warnings
import xarray

array1 = xarray.DataArray(
    numpy.random.rand(5),
    dims=['time'],
    coords={'time': pandas.to_datetime(['2018-01-01', '2018-01-01 00:01', '2018-01-01 00:02', '2018-01-01 00:03', '2018-01-01 00:04'])},
    name='foo'
)
array2 = xarray.DataArray(
    numpy.random.rand(5),
    dims=['time'],
    coords={'time': pandas.to_datetime(['2018-01-01 00:05', '2018-01-01 00:05:10', '2018-01-01 00:05:20', '2018-01-01 00:05:30', '2018-01-01 00:05:40'])},
    name='foo'
)

with tempfile.NamedTemporaryFile() as tmp:
    # save first array
    array1.to_netcdf(tmp.name)
    # reload it
    array1_reloaded = xarray.open_dataarray(tmp.name)
    # the time encoding stores minutes as int, so seconds won't be allowed at next call of to_netcdf
    assert array1_reloaded.time.encoding['dtype'] == numpy.int64
    assert array1_reloaded.time.encoding['units'] == 'minutes since 2018-01-01 00:00:00'
    merged = xarray.merge([array1_reloaded, array2])
    array1_reloaded.close()
    # turn the RuntimeWarning into an error so the problem is visible
    with warnings.catch_warnings():
        warnings.filterwarnings('error', category=RuntimeWarning)
        merged.to_netcdf(tmp.name) |
FYI, |
@NotSqrt If you are still working on this, I'd appreciate it if you could test this against #7827. It adds another warning with some more detail about what's going on. The issue remains that the wanted encoding in |
I've run the example I gave above.
import numpy
import pandas
import tempfile
import warnings
import xarray

array1 = xarray.DataArray(
    numpy.random.rand(5),
    dims=['time'],
    coords={'time': pandas.to_datetime(['2018-01-01', '2018-01-01 00:01', '2018-01-01 00:02', '2018-01-01 00:03', '2018-01-01 00:04'], format='ISO8601')},
    name='foo'
)
array2 = xarray.DataArray(
    numpy.random.rand(5),
    dims=['time'],
    coords={'time': pandas.to_datetime(['2018-01-01 00:05', '2018-01-01 00:05:10', '2018-01-01 00:05:20', '2018-01-01 00:05:30', '2018-01-01 00:05:40'], format='ISO8601')},
    name='foo'
)

with tempfile.NamedTemporaryFile() as tmp:
    # save first array
    array1.to_netcdf(tmp.name)
    # reload it
    array1_reloaded = xarray.open_dataarray(tmp.name)
    # the time encoding stores minutes as int, so seconds won't be allowed at next call of to_netcdf
    assert array1_reloaded.time.encoding['dtype'] == numpy.int64
    assert array1_reloaded.time.encoding['units'] == 'minutes since 2018-01-01 00:00:00'
    merged = xarray.merge([array1_reloaded, array2])
    array1_reloaded.close()
    # this line avoids losing precision and removes both warnings
    # merged.time.encoding = {}
    # this line removes the conversion to ints, which solves the resolution loss and removes the second warning
    # merged.time.encoding.pop('dtype')
    merged.to_netcdf(tmp.name)
    merged_reloaded = xarray.open_dataarray(tmp.name)
    # compare the original timestamps with the ones read back from disk
    numpy.testing.assert_array_equal(
        numpy.concatenate([array1.time, array2.time]),
        merged_reloaded.time.values
    )
I see that now the warnings are:
And as the last code statement still shows that the seconds are lost, we still have to use If the resolution loss can't be fixed automatically, what would be nice in the warning is a link or a summary of what the user has to do to solve the resolution loss ! Thanks ! |
Thanks @NotSqrt for the detailed test and reasoning. The issue is as you already wrote with As we do not update I very much agree that the user should get as much information as possible out of any warnings/errors to follow up easily. There might be at least the following 3 actions:
From my perspective the least intrusive action would be 3b. For your example this would just print the first warning (which provides the needed information) and the seconds would be preserved. |
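To make concrete what preserving the seconds requires: the stored unit has to divide every offset from the reference date exactly. Below is a rough standard-library sketch of that check. The function `pick_unit` and the `UNITS` table are hypothetical names for illustration only, and this is not a description of how xarray selects units internally.

```python
from datetime import datetime, timedelta

# Candidate units, coarsest first (hypothetical table for this sketch).
UNITS = [
    ("days", timedelta(days=1)),
    ("hours", timedelta(hours=1)),
    ("minutes", timedelta(minutes=1)),
    ("seconds", timedelta(seconds=1)),
]


def pick_unit(timestamps, reference):
    """Return the coarsest unit in which every offset is a whole number."""
    for name, step in UNITS:
        # A unit fits if every offset from the reference is an exact multiple of it.
        if all((t - reference) % step == timedelta(0) for t in timestamps):
            return name
    raise ValueError("timestamps need a unit finer than seconds")


reference = datetime(2018, 1, 1)
first = [reference + timedelta(minutes=m) for m in range(5)]
second = [datetime(2018, 1, 1, 0, 5, s) for s in (0, 10, 20, 30, 40)]

print(pick_unit(first, reference))           # minutes
print(pick_unit(first + second, reference))  # seconds
```

With the timestamps from the example above, the first array alone fits in minutes, while the merged data needs seconds. That is why reusing the first file's encoding for the merged data loses resolution.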
#8201 will take care of this issue as follows: It issues that warning:
And it automatically drops |
In my Dataset I've got a time series coordinate that begins like this
And all is OK when I write and re-open the netCDF file
Then I try to add to this dataset a reindexed variable like this
da["MeanRainfallHeigh"] = rain.reindex(time =da.time).fillna(0)
Writing still works, but when I reopen the netCDF file, the minutes part of the time values is modified.
Thanks!