Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduction of Dask for all data manipulations #312

Merged
merged 61 commits into from
Jan 15, 2025

Conversation

davidhassell
Copy link
Contributor

@davidhassell davidhassell commented Dec 5, 2024

Fixes #317

Implements Dask for all storage underlying cfdm.Data objects, by porting the corresponding code from cf-python.

Aims to not change the API, other than as required for the new functionality.

@davidhassell davidhassell added the enhancement New feature or request label Dec 5, 2024
@davidhassell davidhassell added this to the Next release milestone Dec 5, 2024
@davidhassell
Copy link
Contributor Author

Note that the new cfdm/data/data.py has almost nothing in common with its predecessor!

Copy link
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am satisfied the functionality is all good here, and is preserved and inherited fine downstream in cf-python when using both this branch and the corresponding branch with removal from cf, NCAS-CMS/cf-python#836.

As usual I've raised some minor comments, mostly where docstrings explicitly reference the cf module which needs updating to {{package}} (else cfdm) now.

I was surprised we didn't need any packaging updates, but I double-checked and it seems we already added dask as a dependency to cfdm so that's sorted, and from running the command python -c "from setuptools import find_packages; from pprint import pprint; pprint(find_packages())" it looks like, despite many new data modules including a new __init__.py, there is nothing to add or remove from the setup.py packages key, so the setup.py doesn't need updating, as you likely determined yourself.

Overall all good, so once you've considered the minor comments in-line, feel free to merge. (Note I've been reviewing NCAS-CMS/cf-python#836 in cross-reference but still have some further checks to do on that side, so will submit the review for that shortly but not right away.)

cfdm/data/dask_utils.py Outdated Show resolved Hide resolved
cfdm/data/dask_utils.py Outdated Show resolved Hide resolved
cfdm/data/dask_utils.py Outdated Show resolved Hide resolved
Changelog.rst Outdated Show resolved Hide resolved
cfdm/data/mixin/arraymixin.py Outdated Show resolved Hide resolved
cfdm/data/utils.py Outdated Show resolved Hide resolved
cfdm/data/utils.py Outdated Show resolved Hide resolved
cfdm/data/utils.py Outdated Show resolved Hide resolved
cfdm/data/utils.py Outdated Show resolved Hide resolved
cfdm/data/data.py Outdated Show resolved Hide resolved
davidhassell and others added 5 commits January 2, 2025 13:39
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Co-authored-by: Sadie L. Bartholomew <[email protected]>
Copy link
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the (very) minor feedback items. Did a final sanity check and everything's good so feel free to merge (or I'll be doing the final review on the corresponding cf-python PR later today if you want to wait on that to merge them together).

@davidhassell
Copy link
Contributor Author

Great - thanks, Sadie!

I'll hold off merging until the corresponding cf-python PR is also ready to merge, in case anything arising over there implies a change over here.

@davidhassell
Copy link
Contributor Author

Following on from our off-line discussion on Data._binary_operation being a @classmethod, it occurs to me that we only need to comment on this in cfdm, and not also in cf-python, because cf-python is merely calling overloading it from an external library. Does that make sense?

    @classmethod
    def _binary_operation(cls, data, other, method):
        """Binary arithmetic and comparison operations.
        <snip>
        """
        # Note: This method is a classmethod to allow it's
        #       functionality to be used with a LHS operand that is
        #       not 'self'. This is not currently needed here, but
        #       could be useful in subclasses which overload this
        #       method, yet still want to access the parent method
        #       functionality via `super`. For instance, cf-python's
        #       `cf.Data._binary_operation` sometimes needs to apply
        #       the binary operation to a modified copy of its 'self'.
        inplace = method[2] == "i"

@sadielbartholomew
Copy link
Member

Does that make sense?

Does to me! Thanks, all good with that.

@davidhassell
Copy link
Contributor Author

Great. I'll add it in ...

Copy link
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing and discussing my concerns from the cf-python PR. I am happy with the explanatory comment here (one typo). Please merge (both PRs) once you feel ready!

cfdm/data/data.py Outdated Show resolved Hide resolved
davidhassell and others added 2 commits January 10, 2025 15:34
Co-authored-by: Sadie L. Bartholomew <[email protected]>
@davidhassell
Copy link
Contributor Author

Merging - thanks for the review, Sadie!

@davidhassell davidhassell merged commit 170ca84 into NCAS-CMS:main Jan 15, 2025
@davidhassell davidhassell deleted the dask2 branch January 15, 2025 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduction of Dask for all data manipulations
2 participants