CI Failure in Xarray test suite post-Dask tokenization update #8788

Closed
andersy005 opened this issue Feb 27, 2024 · 1 comment · Fixed by #8797
Labels: CI (Continuous Integration tools), topic-dask

Comments

@andersy005
Member

What is your issue?

Recent changes to Dask's tokenization (dask/dask#10876) appear to have changed the tokens computed for Xarray objects. As a result, four `test_token_identical` cases in xarray/tests/test_dask.py now fail in CI:

---------- coverage: platform linux, python 3.12.2-final-0 -----------
Coverage XML written to file coverage.xml

=========================== short test summary info ============================
FAILED xarray/tests/test_dask.py::test_token_identical[obj0-<lambda>1] - AssertionError: assert 'bbd9679bdaf2...d3db65e29a72d' == '6352792990cf...e8004a9055314'
  
  - 6352792990cfe23adb7e8004a9055314
  + bbd9679bdaf284c371cd3db65e29a72d
FAILED xarray/tests/test_dask.py::test_token_identical[obj0-<lambda>2] - AssertionError: assert 'bbd9679bdaf2...d3db65e29a72d' == '6352792990cf...e8004a9055314'
  
  - 6352792990cfe23adb7e8004a9055314
  + bbd9679bdaf284c371cd3db65e29a72d
FAILED xarray/tests/test_dask.py::test_token_identical[obj1-<lambda>1] - AssertionError: assert 'c520b8516da8...0e9e0d02b79d0' == '9e2ab1c44990...6ac737226fa02'
  
  - 9e2ab1c44990adb4fb76ac737226fa02
  + c520b8516da8b6a98c10e9e0d02b79d0
FAILED xarray/tests/test_dask.py::test_token_identical[obj1-<lambda>2] - AssertionError: assert 'c520b8516da8...0e9e0d02b79d0' == '9e2ab1c44990...6ac737226fa02'
  
  - 9e2ab1c44990adb4fb76ac737226fa02
  + c520b8516da8b6a98c10e9e0d02b79d0
= 4 failed, 16293 passed, 628 skipped, 90 xfailed, 71 xpassed, 213 warnings in 472.07s (0:07:52) =
Error: Process completed with exit code 1.

Previously, the following snippet passed, confirming that an Xarray object and its shallow and deep copies produce the same token:

In [1]: import xarray as xr, numpy as np

In [2]: def make_da():
   ...:     da = xr.DataArray(
   ...:         np.ones((10, 20)),
   ...:         dims=["x", "y"],
   ...:         coords={"x": np.arange(10), "y": np.arange(100, 120)},
   ...:         name="a",
   ...:     ).chunk({"x": 4, "y": 5})
   ...:     da.x.attrs["long_name"] = "x"
   ...:     da.attrs["test"] = "test"
   ...:     da.coords["c2"] = 0.5
   ...:     da.coords["ndcoord"] = da.x * 2
   ...:     da.coords["cxy"] = (da.x * da.y).chunk({"x": 4, "y": 5})
   ...: 
   ...:     return da
   ...: 

In [3]: da = make_da()

In [4]: import dask.base

In [5]: assert dask.base.tokenize(da) == dask.base.tokenize(da.copy(deep=False))

In [6]: assert dask.base.tokenize(da) == dask.base.tokenize(da.copy(deep=True))

In [9]: dask.__version__
Out[9]: '2023.3.0'

With Dask '2024.2.1', however, the same code fails:

In [55]: 
    ...: def make_da():
    ...:     da = xr.DataArray(
    ...:         np.ones((10, 20)),
    ...:         dims=["x", "y"],
    ...:         coords={"x": np.arange(10), "y": np.arange(100, 120)},
    ...:         name="a",
    ...:     ).chunk({"x": 4, "y": 5})
    ...:     da.x.attrs["long_name"] = "x"
    ...:     da.attrs["test"] = "test"
    ...:     da.coords["c2"] = 0.5
    ...:     da.coords["ndcoord"] = da.x * 2
    ...:     da.coords["cxy"] = (da.x * da.y).chunk({"x": 4, "y": 5})
    ...: 
    ...:     return da
    ...: 

In [56]: da = make_da()
In [57]: assert dask.base.tokenize(da) == dask.base.tokenize(da.copy(deep=False))
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[57], line 1
----> 1 assert dask.base.tokenize(da) == dask.base.tokenize(da.copy(deep=False))

AssertionError: 

In [58]: dask.base.tokenize(da)
Out[58]: 'bbd9679bdaf284c371cd3db65e29a72d'

In [59]: dask.base.tokenize(da.copy(deep=False))
Out[59]: '6352792990cfe23adb7e8004a9055314'

In [61]: dask.__version__
Out[61]: '2024.2.1'

Additionally, comparing the output of dask.base.normalize_token() across the two Dask versions shows that the latest version includes state or metadata in the normalized token (for example, the `('__seen', N)` entries) that was not present in earlier versions.

  • old version
In [29]: dask.base.normalize_token((type(da), da._variable, da._coords, da._name))
Out[29]: 
('tuple',
 [xarray.core.dataarray.DataArray,
  ('tuple',
   [xarray.core.variable.Variable,
    ('tuple', ['x', 'y']),
    'xarray-<this-array>-14cc91345e4b75c769b9032d473f6f6e',
    ('list', [('tuple', ['test', 'test'])])]),
  ('list',
   [('tuple',
     ['c2',
      ('tuple',
       [xarray.core.variable.Variable,
        ('tuple', []),
        (0.5, dtype('float64')),
        ('list', [])])]),
    ('tuple',
     ['cxy',
      ('tuple',
       [xarray.core.variable.Variable,
        ('tuple', ['x', 'y']),
        'xarray-<this-array>-8e98950eca22c69d304f0a48bc6c2df9',
        ('list', [])])]),
    ('tuple',
     ['ndcoord',
      ('tuple',
       [xarray.core.variable.Variable,
        ('tuple', ['x']),
        'xarray-ndcoord-82411ea5e080aa9b9f554554befc2f39',
        ('list', [])])]),
    ('tuple',
     ['x',
      ('tuple',
       [xarray.core.variable.IndexVariable,
        ('tuple', ['x']),
        ['x',
         ('603944b9792513fa0c686bb494a66d96c667f879',
          dtype('int64'),
          (10,),
          (8,))],
        ('list', [('tuple', ['long_name', 'x'])])])]),
    ('tuple',
     ['y',
      ('tuple',
       [xarray.core.variable.IndexVariable,
        ('tuple', ['y']),
        ['y',
         ('fc411db876ae0f4734dac8b64152d5c6526a537a',
          dtype('int64'),
          (20,),
          (8,))],
        ('list', [])])])]),
  'a'])
  • most recent version
In [44]: dask.base.normalize_token((type(da), da._variable, da._coords, da._name))
Out[44]: 
('tuple',
 [('7b61e7593a274e48', []),
  ('tuple',
   [('215b115b265c420c', []),
    ('tuple', ['x', 'y']),
    'xarray-<this-array>-980383b18aab94069bdb02e9e0956184',
    ('dict', [('tuple', ['test', 'test'])])]),
  ('dict',
   [('tuple',
     ['c2',
      ('tuple',
       [('__seen', 2),
        ('tuple', []),
        ('6825817183edbca7', ['48cb5e118059da42']),
        ('dict', [])])]),
    ('tuple',
     ['cxy',
      ('tuple',
       [('__seen', 2),
        ('tuple', ['x', 'y']),
        'xarray-<this-array>-6babb4e95665a53f34a3e337129d54b5',
        ('dict', [])])]),
    ('tuple',
     ['ndcoord',
      ('tuple',
       [('__seen', 2),
        ('tuple', ['x']),
        'xarray-ndcoord-8636fac37e5e6f4401eab2aef399f402',
        ('dict', [])])]),
    ('tuple',
     ['x',
      ('tuple',
       [('abc1995cae8530ae', []),
        ('tuple', ['x']),
        ['x', ('99b2df4006e7d28a', ['04673d65c892b5ba'])],
        ('dict', [('tuple', ['long_name', 'x'])])])]),
    ('tuple',
     ['y',
      ('tuple',
       [('__seen', 25),
        ('tuple', ['y']),
        ['y', ('88974ea603e15c49', ['a6c0f2053e85c87e'])],
        ('dict', [])])])]),
  'a'])
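
Nested outputs like the two above are hard to compare by eye. The following is a minimal sketch of a helper that walks two such structures in parallel and reports where they differ. `diff_normalized` is a hypothetical name, not part of Dask or Xarray, and the `left`/`right` values are hand-written stand-ins shaped like the output above, not real tokens.

```python
from typing import Any, Iterator


def diff_normalized(a: Any, b: Any, path: str = "root") -> Iterator[tuple[str, Any, Any]]:
    """Yield (path, left, right) for each position where two nested structures differ.

    Hypothetical helper for illustration; it only understands lists and tuples.
    """
    both_seq = isinstance(a, (list, tuple)) and isinstance(b, (list, tuple))
    if both_seq and type(a) is type(b) and len(a) == len(b):
        # Same container type and length: descend element by element.
        for i, (x, y) in enumerate(zip(a, b)):
            yield from diff_normalized(x, y, f"{path}[{i}]")
    elif a != b:
        # Leaves, or containers whose type or length differ, are reported whole.
        yield path, a, b


# Illustrative inputs only (assumed values, not taken from a real run).
left = ("tuple", ["c2", ("tuple", [("__seen", 2), ("tuple", []), ("dict", [])])])
right = ("tuple", ["c2", ("tuple", [("__seen", 3), ("tuple", []), ("dict", [])])])

for where, lhs, rhs in diff_normalized(left, right):
    print(f"{where}: {lhs!r} != {rhs!r}")
```

In practice the two inputs would presumably be the result of calling dask.base.normalize_token() on the tuple `(type(da), da._variable, da._coords, da._name)` for `da` and for `da.copy(deep=False)` under the same Dask version, which should narrow down which component makes the tokens diverge.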

Cc @dcherian / @crusaderky for visibility

@andersy005 added the needs triage, CI, and topic-dask labels, then removed the needs triage label on Feb 27, 2024
@crusaderky crusaderky self-assigned this Feb 29, 2024
@crusaderky
Contributor

This is a resurgence of #6970. It is unclear why CI was green with the previous Dask version.
