Fix multidim rechunking _invert_meshgrid error #658
Conversation
Thanks so much for opening this @tom-the-hill, this is a great start towards closing #520.
This seems to suggest that (at least in certain cases) the checks made by `_invert_meshgrid` are stricter than they need to be. Do you feel like you have a sense of what a fix might look like?
I can't say I fully grasp what `_invert_meshgrid` is doing either. My best guess at what this portion of the code is doing follows:

```python
import numpy as np


def _invert_meshgrid(*arrays):
    """Inverts the numpy.meshgrid function."""
    ndim = len(arrays)
    shape = arrays[0].shape
    assert all(a.shape == shape for a in arrays)
    selectors = [ndim * [0] for n in range(ndim)]
    for n in range(ndim):
        selectors[n][ndim - n - 1] = slice(None)
        selectors[n] = tuple(selectors[n])
    xi = [a[s] for a, s in zip(arrays, selectors)]
    assert all(
        np.equal(actual, expected).all()
        for actual, expected in zip(arrays, np.meshgrid(*xi))
    )
    return xi
```

With just brute-force testing, here are a few examples of the parameters where it passes and fails.

Columns:
(full size of time, [time size of each split input]), {chunking dimensions} , {size of other non-time dimensions of input dataset}
```python
[((4, [1]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [1]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [1]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [1]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [1, 2, 1]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [1, 2, 1]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [1, 2, 1]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [1, 2, 1]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2, 1, 1]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2, 1, 1]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2, 1, 1]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2, 1, 1]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 1, 2]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 1, 2]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 1, 2]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 1, 2]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3})]
```

Pass for <=3D chunking:

Columns:
(full size of time, [time size of each split input]), {chunking dimensions} , {size of other non-time dimensions of input dataset}
```python
[((4, [2]), {'time': 2, 'lat': 2, 'lon': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2, 'lon': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2, 'lon': 2, 'height': 1}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2, 'lon': 2, 'height': 1}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2, 'lon': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2, 'lon': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2, 'lon': 2, 'height': 1}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2, 'lon': 2, 'height': 1}, {'ny': 5, 'nx': 5, 'nz': 3})]
```
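As a standalone illustration of the assumption `_invert_meshgrid` makes (a sketch of NumPy behavior, not a claim about the fix itself): with the default `indexing="xy"`, a 2-D `numpy.meshgrid` output varies along axes `(1, 0)`, which matches the reversed-axis slice at position `ndim - n - 1`, but a 3-D output varies along axes `(1, 0, 2)`, so the same slicing pattern no longer recovers the inputs. All variable names below are hypothetical.

```python
import numpy as np

x = np.array([0, 1, 2])      # length 3
y = np.array([10, 20])       # length 2
z = np.array([5, 6, 7, 8])   # length 4

# 2-D: output shape is (len(y), len(x)); slicing along the
# "reversed" axis (ndim - n - 1) recovers each input exactly.
X2, Y2 = np.meshgrid(x, y)
assert X2.shape == (2, 3)
assert np.array_equal(X2[0, :], x)   # slice along axis 1 -> x
assert np.array_equal(Y2[:, 0], y)   # slice along axis 0 -> y

# 3-D: output shape is (len(y), len(x), len(z)), i.e. axis order
# (1, 0, 2), not the fully reversed order (2, 1, 0).
X3, Y3, Z3 = np.meshgrid(x, y, z)
assert X3.shape == (2, 3, 4)

# The reversed-axis slice used for the first array, [0, 0, :],
# runs along the z axis and yields x[0] repeated, not x itself.
assert np.array_equal(X3[0, 0, :], np.full(len(z), x[0]))
```

This round-trips cleanly in 2-D but breaks down in 3-D, which is consistent with the observation above that the error only shows up once three or more chunking dimensions are involved.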
Thanks so much @tom-the-hill. I talked through this bug with @rabernat today. I've got a bit more context now, so I'll have a go at pushing some contributions to your PR here, if that's alright.
Sounds good to me, thanks for the help on this.
@cisaacstern and I just came up with a fix for pangeo-forge/pangeo-forge-recipes#658 and I want to see if this alleviates some of the failures I have seen.
@jbusecke and I took a very deep dive on this today! (With thanks to @norlandrhagen for participating in the preamble earlier this week.) We have tentatively concluded that the change here resolves the error. Julius is going to run this branch against some of his production cases that were triggering this error as a further gut check. The proliferation of comments committed here so far reflects our notes as we went along, trying to understand how these functions actually work. We've left these in for now because, as part of this fix, we would also like to do some renaming/refactoring to make this area of the code easier to understand and maintain. Remaining TODO:
* Test fix for multidim rechunking _invert_meshgrid error

  @cisaacstern and I just came up with a fix for pangeo-forge/pangeo-forge-recipes#658 and I want to see if this alleviates some of the failures I have seen.
* Update iids_pr.yaml
* Try out MPI models that failed before.
* Update iids_pr.yaml
* Update iids.yaml
* Update recipe.py
Superseded by #708. Thanks all for your help on this!
Hello, I'd like to suggest adding the following test to test_rechunking.py. The purpose of the additional test cases is to check what happens when different numbers of dimensions are passed as "target_chunks" to split_fragment and the results are subsequently combined. In particular, when 3 or more chunking dimensions are used as target_chunks, several cases fail with the same error: "ValueError: Cannot combine fragments because they do not form a regular hypercube.". @cisaacstern confirmed that these cases aren't covered elsewhere. I've attached the report from running the tests. I tried testing these cases with the failing portion of combine_fragments commented out, and all of the tests passed. I also tested a full end-to-end case, and the output was identical to the input datasets.
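For intuition, the "regular hypercube" condition the error message refers to can be sketched independently of pangeo-forge-recipes. The helper below is hypothetical (it is not the library's actual check): the idea is that fragments can only be combined when their index positions form the full cross product of the positions seen along each dimension.

```python
import itertools


def positions_form_hypercube(positions):
    """Return True if a set of N-dim index tuples is exactly the cross
    product of the unique values seen along each dimension.

    Hypothetical stand-in for the regularity condition behind
    'Cannot combine fragments because they do not form a regular
    hypercube.'
    """
    dims = list(zip(*positions))
    expected = set(itertools.product(*(sorted(set(d)) for d in dims)))
    return set(positions) == expected


# A complete 2 x 2 grid of fragment positions combines cleanly.
assert positions_form_hypercube([(0, 0), (0, 1), (1, 0), (1, 1)])

# With a corner missing, the fragments cannot form a regular
# hypercube, mirroring the ValueError described above.
assert not positions_form_hypercube([(0, 0), (0, 1), (1, 0)])
```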
Test report output: