chunks management with datetime64 and timedelta64 datatype #8230
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
I think the problem is here: xarray/xarray/backends/zarr.py, line 309 (commit c3b5ead)

Not sure if there's an easy solution though. Here is a similar MRE I've been using:

```python
import xarray as xr
import tempfile

da_expected = xr.DataArray(range(2), name="foo").astype("datetime64[ns]").chunk(1)
with tempfile.TemporaryDirectory() as tmpdir:
    da_expected.to_zarr(tmpdir)
    da_actual = xr.open_dataarray(tmpdir, engine="zarr", chunks={})
    assert da_expected.chunks == da_actual.chunks == ((1, 1),), da_actual.chunks
```
Is it different if you specify `encoding`?
This is the syntax, right?

```python
da_expected.to_zarr(tmpdir, encoding={"foo": {"chunks": (1, 1)}})
```

Nope, same error.
My bad! I specified the wrong encoding. Explicitly passing the chunks through encoding works:

```python
import xarray as xr
import tempfile

da_expected = xr.DataArray(range(2), name="foo").astype("datetime64[ns]").chunk(1)
with tempfile.TemporaryDirectory() as tmpdir:
    da_expected.to_zarr(tmpdir, encoding={"foo": {"chunks": (1,)}})
    da_actual = xr.open_dataarray(tmpdir, engine="zarr", chunks={})
    assert da_expected.chunks == da_actual.chunks == ((1, 1),), da_actual.chunks
```

I think I know how to fix it in the zarr backend, I'll take a look tomorrow.
This is the same longstanding issue: #7132 (comment)
What happened?
I need to perform operations on coordinates or data variables with datetime64[ns] or timedelta64[ns] data types.
Once I save the dataset or data array in Zarr format, the chunk size is silently modified by the to_zarr() function, even if I explicitly specify it in the encoding. Using the same chunk size on disk and in memory is mandatory for me, because I write each portion of the file using the region option of xarray.Dataset.to_zarr().
In addition, when I try to exploit parallelism, I encounter the error message "inconsistent chunk size".
What did you expect to happen?
I expect the input chunk size to be preserved when writing to and reading from Zarr.
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-1045-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.1
libnetcdf: 4.9.2
xarray: 2023.6.0
pandas: 2.0.3
numpy: 1.24.4
scipy: 1.11.1
netCDF4: 1.6.4
pydap: None
h5netcdf: 1.2.0
h5py: 3.9.0
Nio: None
zarr: 2.15.0
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.6.1
distributed: 2023.6.1
matplotlib: 3.7.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.1.0
cupy: None
pint: None
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.9.22
setuptools: 68.0.0
pip: 23.1.2
conda: None
pytest: 7.4.0
mypy: 1.4.1
IPython: 8.14.0
sphinx: 7.0.1