What happened:
When using map_blocks with a function that takes non-xarray arguments before DataArray arguments (e.g. arg1 is an xarray object, arg2 is not, and arg3 is a DataArray), the code fails to convert the DataArray argument to a Dataset, which triggers a downstream failure. The downstream failure occurs because ds.chunks returns a dict, whereas da.chunks returns a tuple.
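For reference, a minimal sketch (separate from the reproducer below; names and printed values are illustrative) of the .chunks mismatch that surfaces downstream:

import numpy as np
import xarray as xr

# A chunked DataArray and its Dataset counterpart report chunks differently,
# which is what breaks input_chunks.update(arg.chunks) in parallel.py.
da = xr.DataArray(np.zeros((4, 3)), dims=('time', 'point')).chunk({'point': 1})

print(da.chunks)                       # tuple of tuples, e.g. ((4,), (1, 1, 1))
print(da.to_dataset(name='x').chunks)  # mapping, e.g. {'time': (4,), 'point': (1, 1, 1)}

# dict.update() over the tuple form expects length-2 elements, hence the
# "dictionary update sequence element #0 has length 1; 2 is required" error below.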
What you expected to happen:
The code intends to convert DataArrays to Datasets before calling .chunks, and I expect it to do so.
Minimal Complete Verifiable Example:
import xarray as xr
import pandas as pd
import numpy as np

def random_point_data(n_points=1, n_times=100):
    size = (n_times, n_points)
    dims = ('time', 'point')
    times = pd.date_range('1979-01-01', freq='1D', periods=n_times)
    da = xr.DataArray(np.random.random(size=size), dims=dims, coords={'time': times})
    return da

def mock_function(da1, non_xarray_input, da2):
    return da1

X = random_point_data(n_points=3).chunk({'point': 1})
out = xr.map_blocks(mock_function, X, args=['random_string', X])
This gives the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-65-dea560baad18> in <module>
14
15 X = random_point_data(n_points=3).chunk({'point': 1})
---> 16 out = xr.map_blocks(mock_function, X, args=['random_string', X])
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/parallel.py in map_blocks(func, obj, args, kwargs, template)
363 for arg in xarray_objs[1:]:
364 assert_chunks_compatible(npargs[0], arg)
--> 365 input_chunks.update(arg.chunks)
366 input_indexes.update(arg.indexes)
367
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Anything else we need to know?:
This should be fixed with a one-line change here,
from
xarray_objs = tuple(
    dataarray_to_dataset(arg) if is_da else arg
    for is_da, arg in zip(is_array, aligned)
)
to
xarray_objs = tuple(
    dataarray_to_dataset(arg) if isinstance(arg, xr.DataArray) else arg
    for arg in aligned
)
This is because is_array is computed over all args, regardless of whether each arg is an xarray object, whereas aligned has already been filtered down to xarray objects only.
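To make the mismatch concrete, here is a rough sketch of what happens for the reproducer above (using .to_dataset() as a stand-in for the private dataarray_to_dataset helper; the actual parallel.py internals may differ slightly):

# With xr.map_blocks(mock_function, X, args=['random_string', X]):
npargs = [X, 'random_string', X]   # all positional args (obj first, then args)
is_array = [True, False, True]     # computed over *all* args
aligned = [X, X]                   # alignment keeps only the xarray objects

# zip() pairs True with the first X and False with the second X, so the second
# DataArray is never converted to a Dataset:
xarray_objs = tuple(
    arg.to_dataset(name='tmp') if is_da else arg   # stand-in for dataarray_to_dataset
    for is_da, arg in zip(is_array, aligned)
)
# -> (Dataset, DataArray); the leftover DataArray later breaks
#    input_chunks.update(arg.chunks) with the ValueError shown above.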
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:21:18)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 4.14.177-139.253.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.2
pandas: 1.2.1
numpy: 1.20.0
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: installed
h5netcdf: 0.8.1
h5py: 3.1.0
Nio: None
zarr: 2.10.3
cftime: 1.4.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.2.0
cfgrib: 0.9.8.5
iris: None
bottleneck: 1.3.2
dask: 2021.01.1
distributed: 2021.01.1
matplotlib: 3.3.4
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 49.6.0.post20210108
pip: 20.3.4
conda: None
pytest: 6.2.5
IPython: 7.20.0
sphinx: 3.4.3