What happened:
When using map_blocks with a function that takes non-xarray arguments before DataArray arguments (e.g. arg1 is an xarray object, arg2 is not, and arg3 is a DataArray), the code fails to convert the DataArray argument to a Dataset, which triggers a downstream failure. The downstream failure occurs because ds.chunks returns a dict, whereas da.chunks returns a tuple.
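For reference, a minimal sketch (separate from the reproducer below; names and printed values are illustrative) of the .chunks mismatch that surfaces downstream:

import numpy as np
import xarray as xr

# A chunked DataArray and its Dataset counterpart report chunks differently,
# which is what breaks input_chunks.update(arg.chunks) in parallel.py.
da = xr.DataArray(np.zeros((4, 3)), dims=('time', 'point')).chunk({'point': 1})

print(da.chunks)                       # tuple of tuples, e.g. ((4,), (1, 1, 1))
print(da.to_dataset(name='x').chunks)  # mapping, e.g. {'time': (4,), 'point': (1, 1, 1)}

# dict.update() over the tuple form expects length-2 elements, hence the
# "dictionary update sequence element #0 has length 1; 2 is required" error below.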
What you expected to happen:
The code intends to convert DataArrays to Datasets before calling .chunks, and I expect it to do so.
Minimal Complete Verifiable Example:
import xarray as xr
import pandas as pd
import numpy as np

def random_point_data(n_points=1, n_times=100):
    size = (n_times, n_points)
    dims = ('time', 'point')
    times = pd.date_range('1979-01-01', freq='1D', periods=n_times)
    da = xr.DataArray(np.random.random(size=size), dims=dims, coords={'time': times})
    return da

def mock_function(da1, non_xarray_input, da2):
    return da1

X = random_point_data(n_points=3).chunk({'point': 1})
out = xr.map_blocks(mock_function, X, args=['random_string', X])
This gives the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-65-dea560baad18> in <module>
14
15 X = random_point_data(n_points=3).chunk({'point': 1})
---> 16 out = xr.map_blocks(mock_function, X, args=['random_string', X])
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/parallel.py in map_blocks(func, obj, args, kwargs, template)
363 for arg in xarray_objs[1:]:
364 assert_chunks_compatible(npargs[0], arg)
--> 365 input_chunks.update(arg.chunks)
366 input_indexes.update(arg.indexes)
367
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Anything else we need to know?:
This should be fixed with a one-line change here,
from
xarray_objs = tuple(
    dataarray_to_dataset(arg) if is_da else arg
    for is_da, arg in zip(is_array, aligned)
)
to
xarray_objs = tuple(
    dataarray_to_dataset(arg) if isinstance(arg, xr.DataArray) else arg
    for arg in aligned
)
This is because is_array is computed over all args, regardless of whether each arg is an xarray object, whereas aligned has already been filtered down to xarray objects only.
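To make the mismatch concrete, here is a rough sketch of what happens for the reproducer above (using .to_dataset() as a stand-in for the private dataarray_to_dataset helper; the actual parallel.py internals may differ slightly):

# With xr.map_blocks(mock_function, X, args=['random_string', X]):
npargs = [X, 'random_string', X]   # all positional args (obj first, then args)
is_array = [True, False, True]     # computed over *all* args
aligned = [X, X]                   # alignment keeps only the xarray objects

# zip() pairs True with the first X and False with the second X, so the second
# DataArray is never converted to a Dataset:
xarray_objs = tuple(
    arg.to_dataset(name='tmp') if is_da else arg   # stand-in for dataarray_to_dataset
    for is_da, arg in zip(is_array, aligned)
)
# -> (Dataset, DataArray); the leftover DataArray later breaks
#    input_chunks.update(arg.chunks) with the ValueError shown above.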
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:21:18)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 4.14.177-139.253.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.2
pandas: 1.2.1
numpy: 1.20.0
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: installed
h5netcdf: 0.8.1
h5py: 3.1.0
Nio: None
zarr: 2.10.3
cftime: 1.4.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.2.0
cfgrib: 0.9.8.5
iris: None
bottleneck: 1.3.2
dask: 2021.01.1
distributed: 2021.01.1
matplotlib: 3.3.4
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 49.6.0.post20210108
pip: 20.3.4
conda: None
pytest: 6.2.5
IPython: 7.20.0
sphinx: 3.4.3