-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated hash key generation to handle missing encoding dtype. #167
Updated hash key generation to handle missing encoding dtype. #167
Conversation
…added tests to validate fix and updated the development change notes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue seems to be due to pydata/xarray#2436
@@ -1973,7 +1973,7 @@ def hash_geometry(self, hash: "hashlib._Hash") -> None: | |||
# Include the dtype of the data array. | |||
# A float array and an int array mean very different things, | |||
# but could have identical byte patterns. | |||
hash_string(hash, data_array.encoding['dtype'].name) | |||
hash_string(hash, data_array.encoding.get('dtype', data_array.values.dtype).name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dtype
encoding value should be present most of the time. It being missing is an unexpected exception to the norm. Without the context of this bug checking data_array.values.dtype
looks unnecessary.
Please add brief one or two line description of why we are doing this extra step, and include a link to the xarray bug report tracking this issue: pydata/xarray#2436
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a comment with issue link and a simple description.
tests/operations/test_cache.py
Outdated
@@ -200,3 +200,25 @@ def test_cache_key_cfgrid1d_sha1(datasets: pathlib.Path): | |||
assert result_cache_key_cf is not None | |||
|
|||
assert result_cache_key_cf == cache_key_hash_cf1d_sha1 | |||
|
|||
|
|||
def test_cache_key_with_missing_data_array_encoding_type(datasets: pathlib.Path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is too specific. What we actually care about is whether datasets opened via xarray.open_mfdataset()
produce a valid geometry hash and we still don't have a test for that. Please update this test to ensure that a dataset opened via xarray.open_mfdataset()
produces a valid hash. This means the test will continue to pass even if xarray fix the issue with data_array.encoding
being empty, and will fail if some other issue with xarray.open_mfdataset()
pops up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added multifile dataset tests. I've included both ugrid and cfgrid specific tests. The cfgrid fixture does lose the encoding as expected, but the test should handle both cases for if and when xarray fixes the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Updated hash key generation and added tests to validate the fix.