Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(zep-6): Implement ZOM dataclasses for V2 and V3 objects #1526

Closed
wants to merge 3 commits into from

Conversation

jhamman
Copy link
Member

@jhamman jhamman commented Sep 19, 2023

This (experimental) PR adds a set of Zarr Object Models in the form of datalcasses. These models are intended to be independent representations of the core Zarr classes (e.g. Array), allowing for creation of metadata-only hierarchies (groups and arrays). These models are intended to be used internally to represent the metadata of groups and arrays.

See also: zarr-developers/zeps#46

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

cc @d-v-b

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Sep 19, 2023
@codecov
Copy link

codecov bot commented Sep 19, 2023

Codecov Report

Merging #1526 (72d9751) into main (6ec746e) will decrease coverage by 0.16%.
The diff coverage is 91.24%.

Additional details and impacted files
@@             Coverage Diff             @@
##              main    #1526      +/-   ##
===========================================
- Coverage   100.00%   99.84%   -0.16%     
===========================================
  Files           37       39       +2     
  Lines        14735    15009     +274     
===========================================
+ Hits         14735    14985     +250     
- Misses           0       24      +24     
Files Changed Coverage Δ
zarr/tests/test_zoms.py 89.86% <89.86%> (ø)
zarr/zoms.py 92.85% <92.85%> (ø)

@d-v-b
Copy link
Contributor

d-v-b commented Sep 21, 2023

this is exciting! what do you think of defining class methods on these for serializing to / from storage? Over in pydantic zarr I used the names to_zarr and from_zarr but I don't think that's a very good fit here, maybe to_store and from_store?

@jhamman
Copy link
Member Author

jhamman commented Sep 21, 2023

this is exciting! what do you think of defining class methods on these for serializing to / from storage? Over in pydantic zarr I used the names to_zarr and from_zarr but I don't think that's a very good fit here, maybe to_store and from_store?

Well, I've already started working on this! Currently named these serialize/deserialize but could change these easily. I'll get an update to this PR pushed today.

@tasansal
Copy link
Contributor

What is the future of pydantic-zarr in this case? Or is the plan to re-implement it here and archive pydantic-zarr?

Maybe I am biased, but I prefer Pydantic over Python native data classes, mainly due to validation. I noticed @jhamman you wrote some of the validation manually in post_init. Pydantic also allows JSON Schema, which can be turned into documentation automatically.

I can help with migrating your work to Pydantic if everyone agrees on that. @d-v-b's work would also is a good starting point. I vote we move to Pydantic 2; which is significantly faster and streamlines some of the validation boilerplate.

@d-v-b
Copy link
Contributor

d-v-b commented Oct 24, 2023

@tasansal there was an offline discussion about bringing pydantic in as a zarr dependency vs implementing the ZOM with dataclasses (or similar functionality). I argued against depending on pydantic, mostly because I don't think we need pydantic to achieve the basic functionality here -- for simply defining some classes and serializing them as JSON, dataclasses are fine, and with dataclasses we don't have to worry about keeping up with pydantic development.

With that in mind, I'm going to continue working on pydantic-zarr. The big to-dos are moving to pydantic v2, and zarr v3.

@tasansal
Copy link
Contributor

@tasansal there was an offline discussion about bringing pydantic in as a zarr dependency vs implementing the ZOM with dataclasses (or similar functionality). I argued against depending on pydantic, mostly because I don't think we need pydantic to achieve the basic functionality here -- for simply defining some classes and serializing them as JSON, dataclasses are fine, and with dataclasses we don't have to worry about keeping up with pydantic development.

With that in mind, I'm going to continue working on pydantic-zarr. The big to-dos are moving to pydantic v2, and zarr v3.

Reasonable. The reason I asked all this is we are refactoring our open-source energy data format MDIO to have version-controlled dataset schemas for the v1 spec and want to be proactive and make it play nice with ZOMs and pydantic-zarr :-) (same naming conventions etc). It will be a higher level abstraction of Zarr specifically for energy industry N-D scientific data.

The ZOM here vs. Pydantic Zarr is confusing to users, though. Pydantic is quite a small dependency if to be added to Zarr for ZOMs in the future maybe.

Cheers!

@d-v-b
Copy link
Contributor

d-v-b commented Oct 24, 2023

The ZOM here vs. Pydantic Zarr is confusing to users, though. Pydantic is quite a small dependency if to be added to Zarr for ZOMs in the future maybe.

Sorry about that. I started pydantic-zarr to solve my own problems, and only later then I realized that some of the central ideas would be better off specified in the zarr specs, which led to the ZOM idea, but I've been slow to push it forward.

Ultimately, I want pydantic-zarr to be "just" a pydantic implementation of the ZOM, which should be completely specified here, and this will be clear in the pydantic-zarr docs. I just need to finish the formal specification.

@jhamman
Copy link
Member Author

jhamman commented Dec 7, 2023

closing this as I'm not likely to return to it in its current state. Ideas here are likely to reappear in v3 in a slightly different form.

@jhamman jhamman closed this Dec 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs release notes Automatically applied to PRs which haven't added release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants