Fix multidim rechunking _invert_meshgrid error #658
Conversation
Thanks so much for opening this @tom-the-hill, this is a great start towards closing #520.
This seems to suggest that (at least in certain cases) the checks made by `_invert_meshgrid` are stricter than they need to be. Do you feel like you have a sense of what a fix might look like?
I can't say I fully grasp what `_invert_meshgrid` is doing either. My best guess at what this portion of the code is doing follows:

```python
import numpy as np


def _invert_meshgrid(*arrays):
    """Inverts the numpy.meshgrid function."""
    ndim = len(arrays)
    shape = arrays[0].shape
    assert all(a.shape == shape for a in arrays)
    selectors = [ndim * [0] for n in range(ndim)]
    for n in range(ndim):
        selectors[n][ndim - n - 1] = slice(None)
        selectors[n] = tuple(selectors[n])
    xi = [a[s] for a, s in zip(arrays, selectors)]
    assert all(
        np.equal(actual, expected).all()
        for actual, expected in zip(arrays, np.meshgrid(*xi))
    )
    return xi
```

With just brute-force testing, here are a few examples of the parameters where it passes and fails.

Columns:
(full size of time, [time size of each split input]), {chunking dimensions} , {size of other non-time dimensions of input dataset}
```python
[((4, [1]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [1]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [1]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [1]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [1, 2, 1]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [1, 2, 1]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [1, 2, 1]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [1, 2, 1]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2, 1, 1]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2, 1, 1]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2, 1, 1]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2, 1, 1]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 1, 2]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 1, 2]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 1, 2]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 1, 2]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3})]
```

Pass for <=3D chunking:

Columns:
(full size of time, [time size of each split input]), {chunking dimensions} , {size of other non-time dimensions of input dataset}
```python
[((4, [2]), {'time': 2, 'lat': 2, 'lon': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2, 'lon': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2, 'lon': 2, 'height': 1}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2, 'lon': 2, 'height': 1}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2, 'lon': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2, 'lon': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2, 'lon': 2, 'height': 1}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2, 'lon': 2, 'height': 1}, {'ny': 5, 'nx': 5, 'nz': 3})]
```
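As a standalone illustration of the assumption `_invert_meshgrid` makes (a sketch of NumPy behavior, not a claim about the fix itself): with the default `indexing="xy"`, a 2-D `numpy.meshgrid` output varies along axes `(1, 0)`, which matches the reversed-axis slice at position `ndim - n - 1`, but a 3-D output varies along axes `(1, 0, 2)`, so the same slicing pattern no longer recovers the inputs. All variable names below are hypothetical.

```python
import numpy as np

x = np.array([0, 1, 2])      # length 3
y = np.array([10, 20])       # length 2
z = np.array([5, 6, 7, 8])   # length 4

# 2-D: output shape is (len(y), len(x)); slicing along the
# "reversed" axis (ndim - n - 1) recovers each input exactly.
X2, Y2 = np.meshgrid(x, y)
assert X2.shape == (2, 3)
assert np.array_equal(X2[0, :], x)   # slice along axis 1 -> x
assert np.array_equal(Y2[:, 0], y)   # slice along axis 0 -> y

# 3-D: output shape is (len(y), len(x), len(z)), i.e. axis order
# (1, 0, 2), not the fully reversed order (2, 1, 0).
X3, Y3, Z3 = np.meshgrid(x, y, z)
assert X3.shape == (2, 3, 4)

# The reversed-axis slice used for the first array, [0, 0, :],
# runs along the z axis and yields x[0] repeated, not x itself.
assert np.array_equal(X3[0, 0, :], np.full(len(z), x[0]))
```

This round-trips cleanly in 2-D but breaks down in 3-D, which is consistent with the observation above that the error only shows up once three or more chunking dimensions are involved.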
Thanks so much @tom-the-hill. I talked through this bug with @rabernat today. I've got a bit more context now, so I'll have a go at pushing some contributions to your PR here, if that's alright.
Sounds good to me, thanks for the help on this.
@cisaacstern and I just came up with a fix for pangeo-forge/pangeo-forge-recipes#658 and I want to see if this alleviates some of the failures I have seen.
@jbusecke and I took a very deep dive on this today! (With thanks to @norlandrhagen for participating in the preamble earlier this week.) We have tentatively concluded that the change here resolves the error. Julius is going to run this branch against some of his production cases that were triggering this error as a further gut check. The proliferation of comments committed here so far reflects our notes as we went along, trying to understand how these functions actually work. We've left these in for now because, as part of this fix, we would also like to do some renaming/refactoring to make this area of the code easier to understand and maintain. Remaining TODO:
* Test fix for multidim rechunking _invert_meshgrid error

  @cisaacstern and I just came up with a fix for pangeo-forge/pangeo-forge-recipes#658 and I want to see if this alleviates some of the failures I have seen.
* Update iids_pr.yaml
* Try out MPI models that failed before.
* Update iids_pr.yaml
* Update iids.yaml
* Update recipe.py
Superseded by #708. Thanks all for your help on this!
Hello, I'd like to suggest adding the following test to test_rechunking.py. The purpose of the additional test cases is to check what happens when different numbers of dimensions are passed as "target_chunks" to split_fragment and the results are subsequently combined. In particular, when 3 or more chunking dimensions are used as target_chunks, several cases fail with the same error: "ValueError: Cannot combine fragments because they do not form a regular hypercube.". @cisaacstern confirmed that these cases aren't covered elsewhere. I've attached the report from running the tests. I tried testing these cases with the failing portion of combine_fragments commented out, and all of the tests passed. I also tested a full end-to-end case, and the output was identical to the input datasets.
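For intuition, the "regular hypercube" condition the error message refers to can be sketched independently of pangeo-forge-recipes. The helper below is hypothetical (it is not the library's actual check): the idea is that fragments can only be combined when their index positions form the full cross product of the positions seen along each dimension.

```python
import itertools


def positions_form_hypercube(positions):
    """Return True if a set of N-dim index tuples is exactly the cross
    product of the unique values seen along each dimension.

    Hypothetical stand-in for the regularity condition behind
    'Cannot combine fragments because they do not form a regular
    hypercube.'
    """
    dims = list(zip(*positions))
    expected = set(itertools.product(*(sorted(set(d)) for d in dims)))
    return set(positions) == expected


# A complete 2 x 2 grid of fragment positions combines cleanly.
assert positions_form_hypercube([(0, 0), (0, 1), (1, 0), (1, 1)])

# With a corner missing, the fragments cannot form a regular
# hypercube, mirroring the ValueError described above.
assert not positions_form_hypercube([(0, 0), (0, 1), (1, 0)])
```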
Test report output: