ds.to_netcdf() changes values of variable #6272
There's a chance xarray is discarding the attributes/encoding during the operation. What's the output of `print(ds.z.encoding)` and `print(ds.z.attrs)`?
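To illustrate what that check would show: when a file is opened with the default `mask_and_scale=True`, the on-disk packing parameters land in `.encoding` rather than `.attrs`, and `to_netcdf()` re-applies them on write. A minimal in-memory sketch (the dataset and packing values here are made up for illustration, not taken from the reporter's files):

```python
import numpy as np
import xarray as xr

# Hypothetical variable standing in for the reporter's `z`
ds = xr.Dataset({"z": ("time", np.linspace(-2000.0, 2000.0, 4, dtype="float32"))})

# Simulate packing parameters carried over from disk: xarray stores them
# in .encoding, not in .attrs, after decoding the data.
ds.z.encoding = {"dtype": "int16", "scale_factor": 0.68, "add_offset": 212.4}

print(ds.z.encoding)  # packing info that to_netcdf() will re-apply on write
print(ds.z.attrs)     # {} -- decoded variables usually have no packing attrs
```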
Thank you for your reply @andersy005. So this is the encoding before writing to netcdf, when loaded with `open_mfdataset`.
After saving to netcdf, the chunks were concatenated into this single file. Now, if I look at, for instance, the same timeseries before and after saving, the offset is not applied consistently along the time dimension. *Sorry for the large data gap between 1980 and the 2000s.
and in case I multiply the variable
I came across the same issue! @ArcticSnow, have you found a solution here? I struggled a bit but haven't found one yet...
Changing the line from
If someone has an MCVE, please feel free to add it here. Otherwise, we can close this until we have one.
@max-sixty, what is meant by MCVE?
Without checking or knowing what is going on, I assume that
Late to the party here, thanks for reviving @yuting-chen-mck. Another issue with packed data (with `scale_factor`/`add_offset`) is corruption when repacking with mismatched parameters:

```python
import numpy as np

packed_value1 = np.int16(-30000)

# original packing parameters of the nth file
scale_factor1 = 0.6796473581594864
add_offset1 = 21239.89345268811

# packing parameters of the first file
scale_factor2 = 0.4796473581594864
add_offset2 = 21239.89345268811

# unpacking with the original parameters
unpacked_value = packed_value1 * scale_factor1 + add_offset1
print(f"1st unpacking: {packed_value1} -> {unpacked_value}")

# repacking with the first file's parameters
packed_value2 = np.int16((unpacked_value - add_offset2) / scale_factor2)
print(f"repacking: {unpacked_value} -> {packed_value2}")

# unpacking again
unpacked_value2 = packed_value2 * scale_factor2 + add_offset2
print(f"2nd unpacking: {packed_value2} -> {unpacked_value2}")
```
If that's the case here, issue #5739 (and the issues linked therein) has more reading. Bottom line: either drop the `scale_factor`/`add_offset` encoding before writing, or repack with parameters valid for the whole dataset.
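The "repack with valid parameters" option can be sketched with the standard CF-style linear packing formula: choose `scale_factor` from the value range of the *combined* data so every value fits a signed 16-bit integer, and round-trip error stays within half a quantization step. The data values and the `packing_params` helper below are illustrative, not xarray API:

```python
import numpy as np

def packing_params(vmin, vmax, nbits=16):
    """CF-style linear packing parameters for a signed nbits integer."""
    scale = (vmax - vmin) / (2**nbits - 2)
    offset = (vmax + vmin) / 2.0
    return scale, offset

# Hypothetical combined value range across all files
data = np.array([-1950.0, -120.5, 0.0, 1876.25])
scale, offset = packing_params(data.min(), data.max())

packed = np.round((data - offset) / scale).astype(np.int16)
unpacked = packed * scale + offset

# Round-trip error is bounded by half the quantization step
print(np.max(np.abs(unpacked - data)), scale / 2)
```

Per-file packing parameters differ precisely because each file's min/max differ, which is why reusing the first file's `encoding` for the whole concatenated dataset can push values outside the int16 range and wrap them, as in the demo above.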
So the root cause here is probably how `open_mfdataset` handles the per-file `encoding` when combining files.
What happened?
I am very puzzled by an odd behavior of `ds.to_netcdf()` that modifies the values of a variable, offsetting part of them by 44500. I have a list of files containing meteorological variables organized in three dimensions: `'time', 'longitude', 'latitude'`. The files are loaded using `xr.open_mfdataset('*.nc')`. No problem here. The dataset is loaded in chunks using Dask. Now, I would like to save a subset of this dataset to a netcdf file as follows: `ds.isel(latitude=[1,2,3], longitude=[3,4,5]).to_netcdf('sub.nc')`. So far nothing particular.

The variable `z` is a `float32` which varies from 2000 to -2000 along the time dimension. After being saved in the subsample, `z` is still a `float32`, but the values that are less than -1000 are offset by 44500. However, if I do `(ds.z.isel(latitude=[1,2,3], longitude=[3,4,5])*1).to_netcdf('sub.nc')` instead, then all values in the subsampled netcdf are fine. I am very puzzled by this behavior. Could this be an odd behavior of dask chunks and `to_netcdf()`?

What did you expect to happen?
I expected no modification of the data after saving to netcdf, no matter what.
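For context on the `*1` workaround mentioned in the report: arithmetic operations in xarray return a new object that does not carry the original's `.encoding`, so `to_netcdf` falls back to writing the in-memory `float32` instead of re-packing with the stale int16 parameters. A minimal in-memory sketch (the array and packing values are hypothetical):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.array([-2000.0, 0.0, 2000.0], dtype="float32"), dims="time", name="z"
)
# Simulate the packing parameters open_mfdataset would carry over from disk
da.encoding = {"dtype": "int16", "scale_factor": 0.68, "add_offset": 212.4}

print(da.encoding)        # stale packing parameters from the first file
print((da * 1).encoding)  # {} -- arithmetic drops encoding, so float32 is written
```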
Minimal Complete Verifiable Example
I will share files upon request.
Relevant log output
No response
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-28-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.1
xarray: 0.20.1
pandas: 1.3.5
numpy: 1.20.3
scipy: 1.7.3
netCDF4: 1.5.7
pydap: None
h5netcdf: 0.11.0
h5py: 3.6.0
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.8
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.10.0
distributed: 2021.10.0
matplotlib: 3.5.0
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 58.0.4
pip: 21.2.4
conda: None
pytest: None
IPython: 7.31.1
sphinx: None