Add MDIO Schemas and Update Documentation #311

Merged
merged 125 commits into from
Jan 19, 2024
Commits
857f652 Add various schema files and data models (tasansal, Nov 9, 2023)
46a3ddc Update dependencies in pyproject.toml (tasansal, Nov 9, 2023)
dea7870 rename numeric type to scalar type (tasansal, Nov 9, 2023)
1c9865a add type alias (tasansal, Nov 9, 2023)
bc4f87c Add statistics and their metadata models to MDIO schemas (tasansal, Nov 9, 2023)
0575ed5 Add UserAttributes class in metadata module (tasansal, Nov 9, 2023)
17091bb Extend Dimension data model in mdio schema (tasansal, Nov 9, 2023)
f55bb4a Import and expose Units in MDIO v1 schema (tasansal, Nov 9, 2023)
0c4eebc Refactor import paths in base schema (tasansal, Nov 9, 2023)
3e675f4 Add LabeledArray, Coordinate, and Variable classes (tasansal, Nov 9, 2023)
466b69b Refactor import of SummaryStatisticsMetadata (tasansal, Nov 9, 2023)
d6480bb Add core components of MDIO schemas (tasansal, Nov 29, 2023)
5ec7b61 Refactor MDIO schemas by switching core components (tasansal, Nov 29, 2023)
429c202 Add BaseArray and NamedArray classes to MDIO schemas (tasansal, Nov 29, 2023)
83d3fa1 Refactor Blosc class to use StrictCamelBaseModel (tasansal, Nov 29, 2023)
62dd68a Refactor Dimension class in dimension.py (tasansal, Nov 29, 2023)
af0be44 Refactor mdio metadata schema classes (tasansal, Nov 29, 2023)
e389a3d Refactor scalar.py to use StrictCamelBaseModel (tasansal, Nov 29, 2023)
a1b59c0 Refactor segy.py docstring for grid_overrides (tasansal, Nov 29, 2023)
ca1ee86 Replace BaseModel with StrictCamelBaseModel in stats.py (tasansal, Nov 29, 2023)
e21f17e Replace BaseModel with StrictCamelBaseModel in units.py (tasansal, Nov 29, 2023)
fd247da Update base model and clean-up in stats.py and units.py (tasansal, Nov 29, 2023)
c41cb89 Replace BaseModel with StrictCamelBaseModel in zfp.py (tasansal, Nov 29, 2023)
d0a0c17 Refactor and simplify mdio variable schema definitions (tasansal, Nov 29, 2023)
bd5568b Add autodoc-pydantic and sphinx-design to dependencies (tasansal, Nov 29, 2023)
c7ed4ec Update dependency versions in pyproject.toml (tasansal, Nov 29, 2023)
1d2be1e Update docs/requirements.txt dependencies (tasansal, Nov 29, 2023)
35c392f Update noxfile.py dependencies (tasansal, Nov 29, 2023)
dc41203 Remove unused imports from base __init__.py (tasansal, Nov 29, 2023)
fa226d3 Update installation document header (tasansal, Nov 29, 2023)
7a245c7 Update reference document header (tasansal, Nov 29, 2023)
106855a Update Usage document header (tasansal, Nov 29, 2023)
6cf4707 Add schema documentation and restructure tutorials (tasansal, Nov 29, 2023)
7849b27 Add new Variable and Dataset schemas in MDIO V0 (tasansal, Nov 29, 2023)
d7318b1 Add new Dataset model in MDIO V1 (tasansal, Nov 29, 2023)
05d5371 Update schemas to reference to new 'dtype' module (tasansal, Nov 30, 2023)
149644b Add urllib3 dependency to pyproject.toml (tasansal, Nov 30, 2023)
64f4f1a Update usage documentation's command line examples (tasansal, Nov 30, 2023)
bd719d8 Improve indentation in usage.md command line examples (tasansal, Nov 30, 2023)
c8d31c1 Correct indentation in usage.md for clarity (tasansal, Nov 30, 2023)
c2d70c2 Add chunk grid and metadata models to core schemas (tasansal, Nov 30, 2023)
2789ee6 Update class descriptions and add chunk grid metadata (tasansal, Nov 30, 2023)
f6f507a Add article info to tutorial notebooks (tasansal, Nov 30, 2023)
e6fdbe1 Refactor `model_fields` function for code reusability (tasansal, Nov 30, 2023)
5b607c4 Refactor VariableMetadata schema in v0 variable.py (tasansal, Nov 30, 2023)
8d43e17 Add new documentation for chunk grids and update Sphinx extensions (tasansal, Nov 30, 2023)
356f216 Update documentation in schema versions 0 and 1 (tasansal, Nov 30, 2023)
137d9a4 Rename and reorganize documentation files (tasansal, Dec 1, 2023)
07aaf86 Update documentation structure and naming (tasansal, Dec 1, 2023)
a03af9a Refactor location of Dimension schemas (tasansal, Dec 1, 2023)
2f79703 Refactor location of Dimension schemas (tasansal, Dec 1, 2023)
494d3ab Reorder sections in documentation index (tasansal, Dec 1, 2023)
a86e5a1 Refactor API reference links in documentation index (tasansal, Dec 1, 2023)
ad2218e Remove colon from 'Known Issues' heading in Contributing doc (tasansal, Dec 1, 2023)
0b1c369 Refactor compressor schemas into single module (tasansal, Dec 1, 2023)
066d729 Update import path for Compressors in array.py (tasansal, Dec 1, 2023)
f0e705f Update import path in data models documentation (tasansal, Dec 1, 2023)
f64ffed Disable listing validators in documentation (tasansal, Dec 1, 2023)
7635df3 Add new compressors documentation (tasansal, Dec 1, 2023)
afd7692 Update class references in compressor documentation (tasansal, Dec 1, 2023)
897188c Refactor import paths for ScalarType, StructuredType (tasansal, Dec 1, 2023)
9c46d39 Move data types content to separate documentation file (tasansal, Dec 1, 2023)
55bc6a5 Reorder topics in documentation index file (tasansal, Dec 1, 2023)
58a41a5 Add documentation to ScalarType enum (tasansal, Dec 1, 2023)
921bf91 Add new Sphinx configuration for validator members visibility (tasansal, Dec 1, 2023)
f01ce35 Add Compression auto summary (tasansal, Dec 1, 2023)
f1c9faa Add 'StructuredField' to the mdio schemas (tasansal, Dec 1, 2023)
57bd515 Update documentation for data types (tasansal, Dec 1, 2023)
872431c Update MDIO schemas in dimensions documentation (tasansal, Dec 1, 2023)
e8dfd7e Update reference to MDIO schemas in data types documentation (tasansal, Dec 1, 2023)
945f726 Update reference paths in compressors.md documentation (tasansal, Dec 1, 2023)
0712df0 Refactor chunk grid classes to a new module (tasansal, Dec 1, 2023)
951bd85 Update documentation to reflect module relocation for chunk grid classes (tasansal, Dec 1, 2023)
c9787ab Enhance dimensions.md with author info and section headers (tasansal, Dec 1, 2023)
7c24658 Add article metadata to compressor and data types docs (tasansal, Dec 1, 2023)
7a48e15 Reorder sections in MDIO data models document (tasansal, Dec 1, 2023)
56a9661 Disable 'autodoc_pydantic_field_show_alias' (tasansal, Dec 1, 2023)
31379ec Refactor import paths for NamedArray and relocate array.py (tasansal, Dec 1, 2023)
82ddb4b Simplify autosummary in dimensions.md documentation (tasansal, Dec 1, 2023)
6b3c28e Update and streamline 'version_1.md' data model documentation (tasansal, Dec 1, 2023)
8cb9ac3 Update sphinx-click and reorganize requirements.txt (tasansal, Dec 4, 2023)
16b923e Refactor dependencies and restructure in pyproject.toml (tasansal, Dec 4, 2023)
75ff872 Update dask and distributed dependencies (tasansal, Dec 4, 2023)
442421f Remove unused Compressors and Dimension aliases (tasansal, Dec 4, 2023)
6f6c7d3 Update import path for Dimension in utilities.py (tasansal, Dec 4, 2023)
11dff24 Update import path for Dimension in parsers.py (tasansal, Dec 4, 2023)
4924ad2 Correct import path for Dimension in grid.py (tasansal, Dec 4, 2023)
1ed1681 Refactor Dimension class and remove CoordinateUnits (tasansal, Dec 4, 2023)
de2ace2 Update ruamel.yaml.clib version in poetry.lock (tasansal, Dec 4, 2023)
009e632 Add VoltageUnitModel to data models (tasansal, Dec 4, 2023)
a4081a8 Add unique decorator to UnitEnum in units.py (tasansal, Dec 4, 2023)
58a2772 Correct the unit for density and streamline frequency units (tasansal, Dec 4, 2023)
ccedb7e Refactor unit model generation in units.py (tasansal, Dec 4, 2023)
57af368 Update usage of create_unit_model function (tasansal, Dec 4, 2023)
b771672 Make units_v1 optional in AllUnitModel. (tasansal, Dec 4, 2023)
b25199d Refactor unit creation in base units schema. (tasansal, Dec 5, 2023)
3003c10 Add BaseDataset class and refactor dataset models (tasansal, Dec 5, 2023)
6f178df Refactor file system and import path for base/core.py (tasansal, Dec 5, 2023)
61d94a8 Refactor metadata.py file location and update imports (tasansal, Dec 5, 2023)
4287b32 Refactor units.py file location and update import paths (tasansal, Dec 5, 2023)
4d67a78 Remove base/__init__.py file in mdio/schemas (tasansal, Dec 5, 2023)
f8339c7 Update 'numpy' link in intersphinx_mapping in conf.py (tasansal, Dec 5, 2023)
2118c85 Refactor schemas, move base and core classes to a new module (tasansal, Dec 5, 2023)
6ba31bb Remove unneeded function follows_metadata_key_convention (tasansal, Dec 5, 2023)
c547872 Refactor array classes to base module (tasansal, Dec 5, 2023)
400ce83 Refactor import paths in base.py (tasansal, Dec 5, 2023)
e34efb0 Update import paths in variable.py and dataset.py (tasansal, Dec 5, 2023)
2c7545c Refactor StrictCamelBaseModel from base to core module (tasansal, Dec 5, 2023)
b26ff55 Refactor MDIO schemas and documentation (tasansal, Dec 5, 2023)
ce87479 Updated 'coordinates' field to accept list of strings (tasansal, Dec 12, 2023)
6c3d19c Improved Dataset modeling in MDIO schema v1 (tasansal, Dec 12, 2023)
7d7bdd0 Add DatasetMetadata to MDIO schema in documentation (tasansal, Dec 12, 2023)
065145d Remove overriden model_dump and model_dump_json methods from Versione… (tasansal, Dec 12, 2023)
75e780e Update dependencies in pyproject.toml (tasansal, Jan 5, 2024)
0717d8b Update myst-nb to stable version in noxfile (tasansal, Jan 5, 2024)
f23e9f9 Bump major version to v1 alpha (tasansal, Jan 5, 2024)
72073e3 Allow Python 3.12 (tasansal, Jan 5, 2024)
21e70ae Add missing docs dependencies to noxfile. (tasansal, Jan 5, 2024)
1a6c6e3 enhance documentation for schemas - part 1 (tasansal, Jan 18, 2024)
a95cbbe Replace linters with Ruff and update dependencies. (tasansal, Jan 19, 2024)
6f11016 lint with ruff (tasansal, Jan 19, 2024)
d4f0d9f Refactor dimension checking logic, adjust method indentations and upd… (tasansal, Jan 19, 2024)
5034794 refactor mainly to use pathlib (tasansal, Jan 19, 2024)
a71e07a Update several dependencies in pyproject.toml (tasansal, Jan 19, 2024)
75d3e03 Update pre-commit config types (tasansal, Jan 19, 2024)
36 changes: 9 additions & 27 deletions .pre-commit-config.yaml
@@ -1,12 +1,15 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.1.13
hooks:
- id: ruff-format
types_or: [python, pyi]
- id: ruff
types_or: [python, pyi]
args: [--fix]

- repo: local
hooks:
- id: black
name: black
entry: black
language: system
types: [python]
require_serial: true
- id: check-added-large-files
name: Check for added large files
entry: check-added-large-files
@@ -34,27 +37,6 @@ repos:
language: system
types: [text]
stages: [commit, push, manual]
- id: flake8
name: flake8
entry: flake8
language: system
types: [python]
require_serial: true
args: [--darglint-ignore-regex, .*]
- id: isort
name: isort
entry: isort
require_serial: true
language: system
types_or: [cython, pyi, python]
args: ["--filter-files"]
- id: pyupgrade
name: pyupgrade
description: Automatically upgrade syntax for newer versions.
entry: pyupgrade
language: system
types: [python]
args: [--py38-plus]
- id: trailing-whitespace
name: Trim Trailing Whitespace
entry: trailing-whitespace-fixer
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -49,7 +49,7 @@ different systems.

This should seamlessly enable development for users of [VS Code] on systems with docker installed.

### Known Issues:
### Known Issues

- `git config --global --add safe.directory $(pwd)` might be needed inside the container.

2 changes: 1 addition & 1 deletion docs/reference.md → docs/api_reference.md
@@ -1,4 +1,4 @@
# Reference
# API Reference

## Readers / Writers

30 changes: 15 additions & 15 deletions docs/usage.md → docs/cli_usage.md
@@ -1,4 +1,4 @@
# Usage
# Command-Line Usage

## Ingestion and Export

@@ -9,19 +9,19 @@ There are many more options, please see the [CLI Reference](#cli-reference).

```shell
$ mdio segy import \
-i path_to_segy_file.segy \
-o path_to_mdio_file.mdio \
-loc 181,185 \
-names inline,crossline
-i path_to_segy_file.segy \
-o path_to_mdio_file.mdio \
-loc 181,185 \
-names inline,crossline
```

To export the same file back to SEG-Y format, the following command
should be executed.

```shell
$ mdio segy export \
-i path_to_mdio_file.mdio \
-o path_to_segy_file.segy
-i path_to_mdio_file.mdio \
-o path_to_segy_file.segy
```

## Cloud Connection Strings
@@ -78,7 +78,7 @@ checks them. If it is not pre-authenticated, you need to pass `--storage-options
Using UNIX:

```shell
mdio segy import \
$ mdio segy import \
--input-segy-path path/to/my.segy
--output-mdio-file s3://bucket/prefix/my.mdio
--header-locations 189,193
@@ -87,8 +87,8 @@ mdio segy import \

Using Windows (note the extra escape characters `\`):

```console
mdio segy import \
```shell
$ mdio segy import \
--input-segy-path path/to/my.segy
--output-mdio-file s3://bucket/prefix/my.mdio
--header-locations 189,193
@@ -113,7 +113,7 @@ authentication information to APIs.
Using a service account:

```shell
mdio segy import \
$ mdio segy import \
--input-segy-path path/to/my.segy
--output-mdio-file gs://bucket/prefix/my.mdio
--header-locations 189,193
@@ -123,7 +123,7 @@ mdio segy import \
Using browser to populate authentication:

```shell
mdio segy import \
$ mdio segy import \
--input-segy-path path/to/my.segy
--output-mdio-file gs://bucket/prefix/my.mdio
--header-locations 189,193
@@ -144,7 +144,7 @@ If ADL is not pre-authenticated, you need to pass `--storage-options`.
`account_key`: Azure Data Lake storage account access key

```shell
mdio segy import \
$ mdio segy import \
--input-segy-path path/to/my.segy
--output-mdio-file az://bucket/prefix/my.mdio
--header-locations 189,193
@@ -196,6 +196,6 @@ get information about usage.

```{eval-rst}
.. click:: mdio.__main__:main
:prog: mdio
:nested: full
:prog: mdio
:nested: full
```
50 changes: 48 additions & 2 deletions docs/conf.py
@@ -1,30 +1,76 @@
"""Sphinx configuration."""

# -- Project information -----------------------------------------------------

project = "MDIO"
author = "TGS"
copyright = "2023, TGS"
copyright = "2023, TGS" # noqa: A001

# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.

extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.napoleon",
"sphinx.ext.intersphinx",
"sphinx.ext.autosummary",
"sphinxcontrib.autodoc_pydantic",
"sphinx.ext.autosectionlabel",
"sphinx_click",
"sphinx_copybutton",
"myst_nb",
"sphinx_design",
]

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = [
"_build",
"Thumbs.db",
"jupyter_execute",
".DS_Store",
"**.ipynb_checkpoints",
]

intersphinx_mapping = {
"python": ("https://docs.python.org/3", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"pydantic": ("https://docs.pydantic.dev/latest/", None),
"zarr": ("https://zarr.readthedocs.io/en/stable/", None),
}

pygments_style = "vs"
pygments_dark_style = "material"

autodoc_typehints = "description"
autodoc_typehints_format = "short"
autodoc_member_order = "groupwise"
autoclass_content = "both"
autoclass_content = "class"
autosectionlabel_prefix_document = True

autodoc_pydantic_field_list_validators = False
autodoc_pydantic_field_swap_name_and_alias = True
autodoc_pydantic_field_show_alias = False
autodoc_pydantic_model_show_config_summary = False
autodoc_pydantic_model_show_validator_summary = False
autodoc_pydantic_model_show_validator_members = False
autodoc_pydantic_model_show_field_summary = False

html_theme = "furo"

myst_number_code_blocks = ["python"]
myst_heading_anchors = 2
myst_words_per_minute = 80
myst_enable_extensions = [
"colon_fence",
"linkify",
"replacements",
"smartquotes",
"attrs_inline",
]

# sphinx-copybutton configurations
154 changes: 154 additions & 0 deletions docs/data_models/chunk_grids.md
@@ -0,0 +1,154 @@
```{eval-rst}
:tocdepth: 3
```

```{currentModule} mdio.schemas.chunk_grid

```

# Chunk Grid Models

```{article-info}
:author: Altay Sansal
:date: "{sub-ref}`today`"
:read-time: "{sub-ref}`wordcount-minutes` min read"
:class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light
```

Variables in the MDIO data model can use different types of chunk grids.
A chunk grid defines how a multi-dimensional array is split into chunks, which is
essential for storing and accessing large arrays efficiently. This page describes four
data models in the MDIO schema: a grid model and a chunk-shape model for each of the
regular and rectilinear chunk grids.

MDIO implements these data models following the Zarr v3 specification and the related Zarr Enhancement Proposals (ZEPs):

- [Zarr core specification (version 3)](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html)
- [ZEP 1 — Zarr specification version 3](https://zarr.dev/zeps/accepted/ZEP0001.html)
- [ZEP 3 — Variable chunking](https://zarr.dev/zeps/draft/ZEP0003.html)

## Regular Grid

The regular grid models represent a rectangular, regularly spaced chunk grid: every
chunk has the same shape, except for chunks truncated at the array boundary.

```{eval-rst}
.. autosummary::
RegularChunkGrid
RegularChunkShape
```

For a 1D array with `size = 31`{l=python}, a chunk size of 7 divides it into 5 chunks.
The first four chunks hold 7 elements each, and the last chunk is truncated to 3
elements to match the size of the array.

`{ "name": "regular", "configuration": { "chunkShape": [7] } }`{l=json}

Using the above schema, the resulting array chunks look like this:

```text
 ←─ 7 ─→ ←─ 7 ─→ ←─ 7 ─→ ←─ 7 ─→ ↔ 3
┌───────┬───────┬───────┬───────┬───┐
└───────┴───────┴───────┴───────┴───┘
```
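The chunk lengths along one dimension of a regular grid follow from integer division of the array size by the chunk size. The sketch below parses the JSON configuration from the example above with Python's standard `json` module and derives the chunk lengths; `regular_chunk_sizes` is a hypothetical helper for illustration, not part of the MDIO API.

```python
from __future__ import annotations

import json

# The 1D regular chunk grid configuration from the example above.
config = json.loads('{ "name": "regular", "configuration": { "chunkShape": [7] } }')


def regular_chunk_sizes(size: int, chunk: int) -> list[int]:
    """Return chunk lengths along one dimension of a regular chunk grid.

    Every chunk has length ``chunk`` except possibly the last one, which is
    truncated so that the lengths add up to ``size``.
    """
    if size < 0 or chunk <= 0:
        raise ValueError("size must be >= 0 and chunk must be > 0")
    full, remainder = divmod(size, chunk)
    return [chunk] * full + ([remainder] if remainder else [])


(chunk,) = config["configuration"]["chunkShape"]  # unpack the single dimension
sizes = regular_chunk_sizes(31, chunk)
print(sizes)  # [7, 7, 7, 7, 3]
print(len(sizes))  # 5
```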

For a 2D array with shape `rows, cols = (7, 17)`{l=python}, a chunk shape of `[3, 7]`
divides it into 9 chunks (3 along each dimension). Chunks in the last row and the last
column are truncated.

`{ "name": "regular", "configuration": { "chunkShape": [3, 7] } }`{l=json}

Using the above schema, the resulting 2D array chunks look like this.
Note that the rows and columns are conceptual and visually not to scale.

```text
 ←─ 7 ─→ ←─ 7 ─→ ↔ 3
┌───────┬───────┬───┐
│       ╎       ╎   │ ↑
│       ╎       ╎   │ 3
│       ╎       ╎   │ ↓
├╶╶╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
│       ╎       ╎   │ ↑
│       ╎       ╎   │ 3
│       ╎       ╎   │ ↓
├╶╶╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
│       ╎       ╎   │ ↕ 1
└───────┴───────┴───┘
```
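In more than one dimension, a regular grid is the Cartesian product of the per-dimension chunk lengths. A minimal sketch, assuming row-major chunk ordering; `regular_chunk_shapes` is a hypothetical helper and not an MDIO function.

```python
from __future__ import annotations

from itertools import product


def regular_chunk_sizes(size: int, chunk: int) -> list[int]:
    """Chunk lengths along one dimension; the last chunk may be truncated."""
    full, remainder = divmod(size, chunk)
    return [chunk] * full + ([remainder] if remainder else [])


def regular_chunk_shapes(
    shape: tuple[int, ...], chunk_shape: tuple[int, ...]
) -> list[tuple[int, ...]]:
    """Return the shape of every chunk of a regular N-D grid in row-major order."""
    if len(shape) != len(chunk_shape):
        raise ValueError("shape and chunk_shape must have the same length")
    per_dim = [regular_chunk_sizes(s, c) for s, c in zip(shape, chunk_shape)]
    # itertools.product varies the last dimension fastest, i.e. row-major order.
    return list(product(*per_dim))


chunks = regular_chunk_shapes((7, 17), (3, 7))
print(len(chunks))  # 9
print(chunks[0], chunks[-1])  # (3, 7) (1, 3)
```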

## Rectilinear Grid

The [RectilinearChunkGrid](RectilinearChunkGrid) model extends
the concept of chunk grids to rectangular chunks with irregular spacing.
This model is useful for data structures that require non-uniform chunk sizes.
[RectilinearChunkShape](RectilinearChunkShape) specifies the chunk sizes for each
dimension as a list, allowing irregular intervals.

```{eval-rst}
.. autosummary::
RectilinearChunkGrid
RectilinearChunkShape
```
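A regular grid describes the same partition of an array as a rectilinear grid whose per-dimension chunk lengths are listed explicitly, including any truncated last chunk. The sketch below builds that configuration as a plain dictionary in the JSON form used on this page; `regular_to_rectilinear` is hypothetical, and the dictionary layout is assumed from the JSON examples rather than taken from the MDIO models.

```python
from __future__ import annotations

import json


def regular_to_rectilinear(shape: tuple[int, ...], chunk_shape: tuple[int, ...]) -> dict:
    """Express a regular chunk grid as an equivalent rectilinear chunk grid.

    The returned dict mirrors the JSON examples on this page; it is not an
    MDIO model instance.
    """
    if len(shape) != len(chunk_shape):
        raise ValueError("shape and chunk_shape must have the same length")
    sizes = []
    for size, chunk in zip(shape, chunk_shape):
        full, remainder = divmod(size, chunk)
        # Spell out every chunk length, including a truncated final chunk.
        sizes.append([chunk] * full + ([remainder] if remainder else []))
    return {"name": "rectilinear", "configuration": {"chunkShape": sizes}}


print(json.dumps(regular_to_rectilinear((7, 17), (3, 7))))
# {"name": "rectilinear", "configuration": {"chunkShape": [[3, 3, 1], [7, 7, 3]]}}
```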

:::{note}
For each dimension, the chunk sizes listed in `chunkShape` must sum to the size of
that array dimension.
:::
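The rule in the note above can be checked before any data is written. A minimal sketch, assuming the chunk shape is given as nested lists like the `chunkShape` JSON examples on this page; `validate_rectilinear` is a hypothetical helper and does not describe how MDIO itself validates its models.

```python
from __future__ import annotations


def validate_rectilinear(shape: tuple[int, ...], chunk_shape: list[list[int]]) -> None:
    """Raise ValueError unless ``chunk_shape`` tiles an array of ``shape`` exactly."""
    if len(shape) != len(chunk_shape):
        raise ValueError(f"expected {len(shape)} dimensions, got {len(chunk_shape)}")
    for axis, (size, sizes) in enumerate(zip(shape, chunk_shape)):
        if any(s <= 0 for s in sizes):
            raise ValueError(f"axis {axis}: chunk sizes must be positive, got {sizes}")
        if sum(sizes) != size:
            raise ValueError(
                f"axis {axis}: chunk sizes sum to {sum(sizes)}, expected {size}"
            )


validate_rectilinear((39,), [[10, 7, 5, 7, 10]])  # OK: 10 + 7 + 5 + 7 + 10 == 39
try:
    validate_rectilinear((40,), [[10, 7, 5, 7, 10]])
except ValueError as err:
    print(err)  # axis 0: chunk sizes sum to 39, expected 40
```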

For a 1D array with `size = 39`{l=python}, we can divide it into 5 irregularly sized
chunks.

`{ "name": "rectilinear", "configuration": { "chunkShape": [[10, 7, 5, 7, 10]] } }`{l=json}

Using the above schema, the resulting array chunks look like this:

```text
 ←── 10 ──→ ←─ 7 ─→ ← 5 → ←─ 7 ─→ ←── 10 ──→
┌──────────┬───────┬─────┬───────┬──────────┐
└──────────┴───────┴─────┴───────┴──────────┘
```
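With irregular chunk sizes, the chunk that holds a given element can no longer be found with a single integer division. One common approach, sketched here with the standard library, is to precompute cumulative chunk offsets with `itertools.accumulate` and search them with `bisect`; the helper names are hypothetical and not part of MDIO.

```python
from __future__ import annotations

from bisect import bisect_right
from itertools import accumulate


def chunk_boundaries(sizes: list[int]) -> list[int]:
    """Return the start offset of every chunk followed by the total length."""
    return [0, *accumulate(sizes)]


def chunk_index(boundaries: list[int], i: int) -> int:
    """Return the index of the chunk that contains element ``i``."""
    if not 0 <= i < boundaries[-1]:
        raise IndexError(f"element {i} is outside [0, {boundaries[-1]})")
    # bisect_right finds the first boundary strictly greater than i.
    return bisect_right(boundaries, i) - 1


bounds = chunk_boundaries([10, 7, 5, 7, 10])
print(bounds)  # [0, 10, 17, 22, 29, 39]
print(chunk_index(bounds, 0), chunk_index(bounds, 17), chunk_index(bounds, 38))  # 0 2 4
```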

For a 2D array with shape `rows, cols = (7, 25)`{l=python}, we can divide it into 12
rectilinear (rectangular but irregular) chunks: 3 along the rows and 4 along the
columns. Note that the rows and columns are conceptual and visually not to scale.

`{ "name": "rectilinear", "configuration": { "chunkShape": [[3, 1, 3], [10, 5, 7, 3]] } }`{l=json}

```text
 ←── 10 ──→ ← 5 → ←─ 7 ─→ ↔ 3
┌──────────┬─────┬───────┬───┐
│          ╎     ╎       ╎   │ ↑
│          ╎     ╎       ╎   │ 3
│          ╎     ╎       ╎   │ ↓
├╶╶╶╶╶╶╶╶╶╶┼╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
│          ╎     ╎       ╎   │ ↕ 1
├╶╶╶╶╶╶╶╶╶╶┼╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
│          ╎     ╎       ╎   │ ↑
│          ╎     ╎       ╎   │ 3
│          ╎     ╎       ╎   │ ↓
└──────────┴─────┴───────┴───┘
```
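The same cumulative offsets give the index region of every chunk in N dimensions. A minimal sketch, assuming row-major chunk ordering; `chunk_slices` is a hypothetical helper, and each returned tuple of slices could index a NumPy array directly (for example `data[regions[5]]`).

```python
from __future__ import annotations

from itertools import accumulate, product


def chunk_slices(chunk_shape: list[list[int]]) -> list[tuple[slice, ...]]:
    """Return the index region of every chunk, as a tuple of slices, in row-major order."""
    per_axis = []
    for sizes in chunk_shape:
        stops = list(accumulate(sizes))  # exclusive end offset of each chunk
        starts = [0, *stops[:-1]]  # each chunk starts where the previous one ends
        per_axis.append([slice(a, b) for a, b in zip(starts, stops)])
    # Combine one slice per axis; the last axis varies fastest (row-major).
    return list(product(*per_axis))


regions = chunk_slices([[3, 1, 3], [10, 5, 7, 3]])
print(len(regions))  # 12
print(regions[5])  # (slice(3, 4, None), slice(10, 15, None))
```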

## Model Reference

:::{dropdown} RegularChunkGrid
:animate: fade-in-slide-down

```{eval-rst}
.. autopydantic_model:: RegularChunkGrid

----------

.. autopydantic_model:: RegularChunkShape
```

:::
:::{dropdown} RectilinearChunkGrid
:animate: fade-in-slide-down

```{eval-rst}
.. autopydantic_model:: RectilinearChunkGrid

----------

.. autopydantic_model:: RectilinearChunkShape
```

:::