Updated hash key generation to handle missing encoding dtype. #167

david-sh-csiro · 2024-12-12T02:23:18Z

Updated hash key generation and added tests to validate the fix.

…added tests to validate fix and updated the development change notes.

mx-moth

This issue seems to be due to pydata/xarray#2436

docs/releases/development.rst

mx-moth · 2024-12-12T03:35:59Z

src/emsarray/conventions/_base.py

@@ -1973,7 +1973,7 @@ def hash_geometry(self, hash: "hashlib._Hash") -> None:
            # Include the dtype of the data array.
            # A float array and an int array mean very different things,
            # but could have identical byte patterns.
-            hash_string(hash, data_array.encoding['dtype'].name)
+            hash_string(hash, data_array.encoding.get('dtype', data_array.values.dtype).name)


The dtype encoding value should be present most of the time; a missing value is an unexpected exception to the norm. Without the context of this bug, checking data_array.values.dtype looks unnecessary.

Please add a brief one- or two-line description of why we are doing this extra step, and include a link to the xarray bug report tracking this issue: pydata/xarray#2436

I've added a comment with the issue link and a simple description.
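For readers without the surrounding code, here is a minimal, standard-library-only sketch of the idea discussed in this thread. It is not the emsarray implementation: `hash_string` below is a hypothetical stand-in for the helper named in the diff, `hash_values` is an invented function, and plain strings stand in for NumPy dtype objects. It shows why the dtype name is mixed into the hash, and how `dict.get` with a fallback avoids a `KeyError` when the encoding is empty.

```python
import hashlib
import struct


def hash_string(hash, value: str) -> None:
    # Hypothetical stand-in for the helper used in the diff:
    # feed the string length, then its UTF-8 bytes, into the hash.
    data = value.encode("utf-8")
    hash.update(struct.pack("<i", len(data)))
    hash.update(data)


def hash_values(raw: bytes, encoding: dict, in_memory_dtype: str) -> str:
    """Hash raw array bytes together with a dtype name (illustrative only)."""
    digest = hashlib.sha1()
    # Prefer the dtype recorded in the encoding; fall back to the
    # in-memory dtype when the encoding is empty (see pydata/xarray#2436).
    dtype_name = encoding.get("dtype", in_memory_dtype)
    hash_string(digest, dtype_name)
    digest.update(raw)
    return digest.hexdigest()


# The float32 value 1.0 and the int32 value 1065353216 share one byte pattern.
float_bytes = struct.pack("<f", 1.0)
int_bytes = struct.pack("<i", 1065353216)
assert float_bytes == int_bytes

with_encoding = hash_values(float_bytes, {"dtype": "float32"}, "float32")
without_encoding = hash_values(float_bytes, {}, "float32")
as_int = hash_values(int_bytes, {}, "int32")
```

In this sketch the empty encoding no longer raises, and the integer and float inputs produce different digests even though their bytes are identical.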

mx-moth · 2024-12-12T03:37:06Z

tests/operations/test_cache.py

@@ -200,3 +200,25 @@ def test_cache_key_cfgrid1d_sha1(datasets: pathlib.Path):
    assert result_cache_key_cf is not None

    assert result_cache_key_cf == cache_key_hash_cf1d_sha1
+
+
+def test_cache_key_with_missing_data_array_encoding_type(datasets: pathlib.Path):


This test is too specific. What we actually care about is whether datasets opened via xarray.open_mfdataset() produce a valid geometry hash, and we still don't have a test for that. Please update this test to ensure that a dataset opened via xarray.open_mfdataset() produces a valid hash. That way the test will continue to pass even if xarray fixes the issue with data_array.encoding being empty, and will fail if some other issue with xarray.open_mfdataset() pops up.

I've added multifile dataset tests, with separate ugrid- and cfgrid-specific tests. The cfgrid fixture does lose the encoding, as expected, but the tests should handle both cases if and when xarray fixes the issue.
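The testing approach requested here can be sketched without xarray. The following is an illustration under stated assumptions, not the test added in this pull request: `make_cache_key` and `check_cache_key` are invented names, the dictionaries are stand-ins for dataset variables, and `depth` is a hypothetical variable name. The point is that the check asserts only on behaviour (a valid SHA-1 hex digest is produced) and never on whether the encoding is present.

```python
import hashlib
import re

SHA1_HEX = re.compile(r"^[0-9a-f]{40}$")


def make_cache_key(variables: dict) -> str:
    # Hypothetical stand-in for a cache key function: hashes each
    # variable's name, dtype name, and raw bytes in a stable order.
    digest = hashlib.sha1()
    for name in sorted(variables):
        var = variables[name]
        dtype_name = var["encoding"].get("dtype", var["dtype"])
        digest.update(name.encode("utf-8"))
        digest.update(dtype_name.encode("utf-8"))
        digest.update(var["data"])
    return digest.hexdigest()


def check_cache_key(variables: dict) -> str:
    # Assert on behaviour only: a key exists and is a valid SHA-1 hex digest.
    key = make_cache_key(variables)
    assert key is not None
    assert SHA1_HEX.match(key)
    return key


data = b"\x00\x00\x80\x3f"
# Case 1: encoding lost, as currently observed with multifile datasets.
lost = {"depth": {"dtype": "float32", "encoding": {}, "data": data}}
# Case 2: encoding kept, as expected if the upstream issue is fixed.
kept = {"depth": {"dtype": "float32", "encoding": {"dtype": "float32"}, "data": data}}

keys = [check_cache_key(case) for case in (lost, kept)]
```

Both cases pass the same check. The two keys happen to match here because the stand-in dtypes are identical; with real data the recorded dtype may differ from the in-memory one, so a test written this way should not compare the two keys against each other.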

… cache key generation on mfdatasets.

mx-moth

Looks good!

Updated hash key generation to handling missing encoding dtype. Also …

e4f4218

…added tests to validate fix and updated the development change notes.

david-sh-csiro linked an issue Dec 12, 2024 that may be closed by this pull request

Multifile Datasets don't work with cache key generation #166

Closed

david-sh-csiro requested a review from mx-moth December 12, 2024 02:23

mx-moth requested changes Dec 12, 2024


Added comment to describe xarray bug mitigation.

3b85b89

david-sh-csiro changed the title ~~Updated hash key generation to handling missing encoding dtype.~~ Updated hash key generation to handle missing encoding dtype. Dec 12, 2024

david-sh-csiro added 4 commits December 12, 2024 17:14

Added multifile dataset caching test.

04b0cab

Added separate unit test for ugrid and cfgrid conventions for testing…

2b18684

… cache key generation on mfdatasets.

Updated development rst to include correct issue and pr.

f61c8a9

Added fixtures for multifile datasets.

38926af

mx-moth approved these changes Jan 8, 2025


david-sh-csiro merged commit 0579a7c into main Jan 21, 2025
15 checks passed

david-sh-csiro deleted the 166-multifile-datasets-dont-work-with-cache-key-generation branch January 21, 2025 03:01
