Add MDIO Schemas and Update Documentation #311

Merged
merged 125 commits into from
Jan 19, 2024

Conversation

tasansal
Collaborator

@tasansal tasansal commented Nov 29, 2023

  • Added base schemas for MDIO v0 and v1 variables, datasets, units etc.
  • Significantly improved documentation (still some work to do)
  • Switched from a huge stack of flake8+black+isort+plugins to ruff
  • With new ruff rules
    • Everything is type hinted now (still need to run mypy and fix errors though)
    • Fixed a lot of ugly code
    • Found some bugs and fixed them
  • Updated almost all dependencies to latest versions
  • Things should be a little faster because a lot of dict()/list() etc. calls were replaced with literals
  • Moved a lot of type hint imports to type-checking block, this should also speed up imports and help avoid circular dependencies
  • Improved some testing logic.
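
A minimal sketch of the two patterns mentioned in the list above (type-hint imports moved into a type-checking block, and literals instead of constructor calls). The function and variable names are hypothetical and are not taken from the MDIO code base.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime, so the
    # import costs nothing at startup and cannot create an import cycle.
    from collections.abc import Sequence


def mean(values: "Sequence[float]") -> float:
    """Return the arithmetic mean of a non-empty sequence."""
    # The annotation is a string, so Sequence is not needed at runtime.
    return sum(values) / len(values)


# Literals skip the global name lookup and call that dict() / list() need.
empty_mapping = {}   # instead of dict()
empty_sequence = []  # instead of list()
```

The string annotation is what makes the guarded import safe: the name is only resolved when a type checker reads the file.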

NOTE: Ignoring coverage right now because a lot of code was added. Before v1 release, will have a separate branch to add more tests.

@tasansal tasansal added documentation Improvements or additions to documentation enhancement New feature or request refactoring Refactoring labels Nov 29, 2023
@tasansal tasansal self-assigned this Nov 29, 2023
@tasansal tasansal added v1 and removed refactoring Refactoring labels Dec 5, 2023
@tasansal tasansal force-pushed the v1-new-schema branch 2 times, most recently from a30a60c to a638249 Compare January 5, 2024 16:41
Introduced new Pydantic models pertaining to different compression strategies like ZFP and Blosc. Added several schema files to support dimensions, datatypes, units, and metadata. In particular, these schemas allow parameterizing and validating different components such as numeric types, compression parameters, dimensions, and units.
The pyproject.toml has been updated by adding 'pydantic' and 'pint' dependencies. The added dependencies are expected to improve data validation and unit conversion respectively.
The commit introduces a new 'stats.py' file in each of the v0 and v1 schemas. In v0, the file contains two models, 'Statistics' and 'StatisticsMetadata'; in v1, it contains the 'SummaryStatistics', 'SummaryStatisticsMetadata', and 'Histogram' models. These models define the basic statistical indicators and related metadata for the MDIO framework.
A new class, UserAttributes, has been created within the metadata module. It captures user-defined attributes as key-value pairs using Pydantic, giving users control over the custom attributes attached to metadata instances.
The Dimension class in the mdio schema has been updated to not only include 'name' and 'size' fields with their respective descriptions, but also to introduce a new optional field 'chunksize'. This update enhances the data model, providing a clearer definition of a Dimension object and allowing for chunking specification.
The code modification imports classes CoordinateUnits and Units from the units module into the MDIO v1 schema. The '__all__' variable has been updated to expose these imported classes, making them publicly available for other modules.
The base schema's import paths for `DataType`, `StructuredDataType`, `UserAttributes`, and `Dimension` have been updated. This change involves updating the original source from `numeric` to `scalar` and adding `metadata` and `dimension` paths.
This commit introduces classes LabeledArray, Coordinate, and Variable to manage different types of arrays within the mdio.schemas.v1 package. Each class includes relevant properties, dimensions, and metadata attributes, supporting the MDIO format requirements.
The import statement for SummaryStatisticsMetadata in variable.py has been modified to be more direct. Additionally, SummaryStatisticsMetadata has been added to the __all__ list in __init__.py, ensuring it is properly exposed for import.
A new module is added that implements the core components of the MDIO schemas. The module includes a BaseModel subclass with Pascal Cased aliases, using the pydantic library.
This commit restructures the MDIO schemas. Where earlier implementations centered on summary statistics metadata and coordinate units, the core component is now the Dataset class in both the v0 and v1 schemas.
The commit introduces the BaseArray and NamedArray classes in mdio/schemas/base/array.py. This is a move towards a more modular and maintainable code structure that should make future schema updates easier.
The Blosc class in mdio/schemas/base/blosc.py has been refactored to extend StrictCamelBaseModel instead of pydantic's BaseModel. This enforces camel case for attribute names more strictly and aids code maintainability.
The Dimension class now extends StrictCamelBaseModel instead of BaseModel. This enforces a stricter camel-case convention for attribute names. Additionally, TypeAlias is now used for DimensionCollection, DimensionReference, and DimensionContext to improve code readability and maintainability.
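
To illustrate what a strict camel-case base model can look like, here is a hedged sketch using Pydantic v2. It is not the actual StrictCamelBaseModel from this PR: the class names, the `chunk_size` field, and the choice of `extra="forbid"` are assumptions made for the example.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError
from pydantic.alias_generators import to_camel


class StrictCamelModel(BaseModel):
    """Hypothetical base: camelCase aliases, unknown fields rejected."""

    model_config = ConfigDict(
        alias_generator=to_camel,  # chunk_size <-> chunkSize
        populate_by_name=True,     # also accept the snake_case field name
        extra="forbid",            # raise on fields the model does not declare
    )


class ExampleDimension(StrictCamelModel):
    """Illustrative stand-in for a dimension schema."""

    name: str
    size: int
    chunk_size: Optional[int] = None


dim = ExampleDimension(name="inline", size=512, chunk_size=64)
serialized = dim.model_dump(by_alias=True)

try:
    ExampleDimension(name="inline", size=512, colour="red")
    rejected_unknown_field = False
except ValidationError:
    rejected_unknown_field = True
```

Putting the configuration on a shared base class means every schema inherits the same aliasing and strictness without repeating it.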
The removed function validated metadata key conventions but is no longer needed. Its extensive documentation was removed along with it to improve code readability and maintainability.
Array-related classes, including `BaseArray` and `NamedArray`, were moved from `array.py` to `base.py` for better code organization. Corresponding import paths were also altered in `__init__.py` and `base.py`. This restructuring enhances the modularity and readability of the code.
The classes ZFP, Blosc, and NamedDimension have been moved to respective submodules 'compressors' and 'dimension'. This commit updates their import paths in the base.py, making the code more structured and modular.
The import statements for NamedArray, ScalarType, and StructuredType in variable.py files (v0 and v1) were updated to reflect their new locations. The code in dataset.py was also updated. These changes enhance modularity and readability of the code.
The `StrictCamelBaseModel` class has been moved from base.py to core.py to enhance modularity. All import statements across various files which were using this BaseModel subclass from the `base` module are now updated to import it from the `core` module.
Removed outdated files, updated DatasetModelV0, added CompressorMetadata, revised base model naming, updated core model imports, integrated Pydantic intersphinx mapping, and separated StrictCamelBaseModel into two classes for clarity and efficiency.
The 'coordinates' field of the Variable class in src/mdio/schemas/v1/variable.py has been updated. It now accepts either a list of Coordinate objects or a list of strings, providing greater flexibility in input types.
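
A field that accepts either a list of objects or a list of names can be expressed as a union in Pydantic v2. The sketch below is an assumption about the general shape only; the `ExampleCoordinate` and `ExampleVariable` classes are hypothetical and much smaller than the real models.

```python
from typing import Optional, Union

from pydantic import BaseModel, ValidationError


class ExampleCoordinate(BaseModel):
    """Illustrative coordinate with only a name."""

    name: str


class ExampleVariable(BaseModel):
    """Illustrative variable whose coordinates are objects or names."""

    name: str
    coordinates: Optional[Union[list[ExampleCoordinate], list[str]]] = None


# Coordinates referenced by name only.
by_name = ExampleVariable(name="amplitude", coordinates=["cdp_x", "cdp_y"])

# Coordinates given as full objects.
by_object = ExampleVariable(
    name="amplitude",
    coordinates=[ExampleCoordinate(name="cdp_x")],
)

# A list mixing both forms matches neither union member.
try:
    ExampleVariable(
        name="amplitude",
        coordinates=[ExampleCoordinate(name="cdp_x"), "cdp_y"],
    )
    mixed_rejected = False
except ValidationError:
    mixed_rejected = True
```

Declaring the union as two homogeneous list types, rather than a list of a union, is what rules out mixed lists in this sketch.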
The Dataset modeling in the MDIO schema is refactored and improved. A new DatasetInfo class is created to hold the dataset's fundamental information. The Dataset class's fields are updated, including replacing UserAttributes with the new DatasetInfo class for the dataset metadata field. This makes the management of dataset information more efficient and logical.
The DataSetMetadata class has been introduced and incorporated into the MDIO schema v1 documentation. This class is added to the autosummary and as an autopydantic_model under the 'DataSet' dropdown, streamlining and improving the data management of DataSet. The documentation structure is also updated, separating components like 'Variable', 'Units', 'Stats', and 'Enums' into individual dropdown sections for better readability and clarity.
…dMetadataConvention class

The model_dump and model_dump_json methods were removed from the VersionedMetadataConvention class in mdio/schemas/metadata.py. These methods overrode the default model dumping to use aliases. The overrides were deemed unnecessary, so removing them simplifies the class.
The dependencies in the pyproject.toml file have been updated to their latest versions. This includes changes to the versions of dask, numba, psutil, fsspec, pydantic, distributed, among others. Therefore, this update ensures compatibility with the latest packages and enhances the overall functionality.
The code changes replace the specific version of myst-nb from a GitHub repository with the stable version of myst-nb available for public use. This means we no longer need the previous workaround, as the issue with Sphinx 7 has been resolved.
The version of the multidimio project has been updated from 0.5.3 to 1.0.0-alpha.1. This moves the project from the 0.5 release line to the first alpha pre-release of 1.0, signaling that new features or significant changes have been introduced.
Updated the versions of 'dask' and 'distributed' dependencies. Several linting and formatting tools have been replaced with 'ruff', a single unified tool that covers similar functionality. Additionally, now 'ruff' configurations are defined in pyproject.toml.
…ate unit test.

This commit refactors the dimension checking logic in the 'coord_to_index' function to improve error handling, raising a KeyError for non-existent dimensions. Indentations for method parameter listings have been adjusted for better readability. The 'test_wrong_index' unit test was updated correspondingly to expect a KeyError instead of ValueError.
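
The error-handling change can be pictured with a small stand-alone sketch. This is not the project's 'coord_to_index' implementation: the function name, its signature, and the dictionary-based grid are hypothetical, chosen only to show an unknown dimension name surfacing as a KeyError.

```python
def coords_to_indices(dim_coords: dict, query: dict) -> dict:
    """Map coordinate values to positional indices, per dimension.

    dim_coords: dimension name -> list of coordinate values (hypothetical layout).
    query: dimension name -> coordinate value to look up.
    """
    indices = {}
    for dim_name, value in query.items():
        # A missing dimension is a lookup failure, so KeyError fits better
        # than ValueError, which is left for bad coordinate values.
        if dim_name not in dim_coords:
            msg = f"Dimension {dim_name!r} not found; available: {sorted(dim_coords)}"
            raise KeyError(msg)
        # list.index raises ValueError when the value itself is absent.
        indices[dim_name] = dim_coords[dim_name].index(value)
    return indices


grid = {"inline": [100, 101, 102], "crossline": [10, 20, 30]}
found = coords_to_indices(grid, {"inline": 101, "crossline": 30})

try:
    coords_to_indices(grid, {"offset": 5})
    raised_key_error = False
except KeyError:
    raised_key_error = True
```

A test for the wrong-dimension case would then expect KeyError, which matches the updated 'test_wrong_index' described above.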
This commit updates the versions of several dependencies in the pyproject.toml file. Specifically, it raises the versions of 'psutil' and 'ruff', and changes the version constraints for 'furo' and 'sphinx-autobuild' to start with a caret (^), which permits updates at or above the specified version that stay below the next breaking release.
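
As a worked illustration of caret constraints, the sketch below computes the allowed range for a three-part version, assuming Poetry-style semantics (the upper bound bumps the leftmost non-zero component). The helper names are hypothetical, and pre-release tags and shortened versions such as "^1.2" are deliberately not handled.

```python
def caret_range(version: str) -> tuple:
    """Return (inclusive lower, exclusive upper) bounds for ^version."""
    major, minor, patch = (int(part) for part in version.split("."))
    lower = (major, minor, patch)
    if major > 0:
        upper = (major + 1, 0, 0)  # ^1.2.3 -> <2.0.0
    elif minor > 0:
        upper = (0, minor + 1, 0)  # ^0.2.3 -> <0.3.0
    else:
        upper = (0, 0, patch + 1)  # ^0.0.3 -> <0.0.4
    return lower, upper


def satisfies(candidate: str, constraint: str) -> bool:
    """Check whether a three-part candidate version falls within ^constraint."""
    lower, upper = caret_range(constraint)
    parsed = tuple(int(part) for part in candidate.split("."))
    return lower <= parsed < upper
```

For example, `satisfies("1.9.0", "1.2.3")` holds while `satisfies("2.0.0", "1.2.3")` does not, which is why a caret constraint is narrower than a plain "greater than or equal" bound.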
The pre-commit configuration file has been updated so that the ruff-format and ruff hooks only include the python and pyi types. The jupyter type was removed to match the file types the repository mainly deals with and that these hooks should check.
@tasansal tasansal merged commit 488fc31 into v1 Jan 19, 2024
18 of 20 checks passed
@tasansal tasansal deleted the v1-new-schema branch January 19, 2024 23:40
tasansal added a commit that referenced this pull request Mar 8, 2024
tasansal added a commit that referenced this pull request Mar 8, 2024
tasansal added a commit that referenced this pull request Apr 10, 2024