Experiment: How HDFReferenceRecipe would look as a Beam Pipeline.
This is a prototype for using Apache Beam for the internal (and external?) data model of Pangeo Forge Recipes. Here, I demo how HDFReferenceRecipe could be structured into modular components via composite Beam transforms.

xref: pangeo-forge#256
alxmrs committed Apr 12, 2022
2 parents 910f740 + e8468df commit d2dd4ed
Showing 71 changed files with 30,082 additions and 9,345 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/tutorials.yaml
@@ -46,7 +46,7 @@ jobs:
# can't use default token
# https://github.community/t/create-a-check-run-details-url-is-not-being-set/166002/4?u=bkwhite
# https://github.com/LouisBrunner/checks-action/issues/18#issuecomment-970312052
token: ${{ secrets.ACTIONS_BOT_TOKEN }}
token: ${{ secrets.GITHUB_TOKEN }}
name: Test Notebook ${{ matrix.nb-path }}
status: in_progress
# this seems to be broken
@@ -105,7 +105,7 @@ jobs:
if: ${{ always() && steps.start-check.outputs.check_id }}
with:
sha: ${{ github.event.client_payload.pull_request.head.sha }}
token: ${{ secrets.ACTIONS_BOT_TOKEN }}
token: ${{ secrets.GITHUB_TOKEN }}
check_id: ${{ steps.start-check.outputs.check_id }}
status: completed
conclusion: ${{ job.status }}
13 changes: 8 additions & 5 deletions ci/py3.8.yml
@@ -12,13 +12,14 @@ dependencies:
- codecov
- dask
- distributed
- fsspec>=2021.6.0
- gcsfs
- fsspec>=2022.1.0
- gcsfs>=2022.1.0
- h5netcdf
# can't solve environment; moved to pip
# - h5py>=3.3.0 - hdf5
- intake
- intake-xarray
- kerchunk>=0.0.6
- lxml
- netcdf4
- numcodecs
@@ -27,14 +28,17 @@ dependencies:
- pip
- prefect
- pydap
- pynio
# bring back eventually once pynio conda-forge package does not conflict
# with ujson, which is a depencency of kerchunk's conda-forge feedstock.
# See: https://github.com/conda-forge/pynio-feedstock/issues/114
# - pynio
- pytest
- pytest-cov
- pytest-lazy-fixture
- rasterio
- requests
- rechunker>=0.4.2
- s3fs
- s3fs>=2022.1.0
- scipy
- setuptools
- toolz
@@ -45,4 +49,3 @@ dependencies:
- pip:
- h5py>=3.3.0
- pytest-timeout
- fsspec-reference-maker>=0.0.4
16 changes: 9 additions & 7 deletions ci/py3.9.yml
@@ -11,14 +11,15 @@ dependencies:
- codecov
- dask
- distributed
- fsspec>=2021.6.0
- gcsfs
- fsspec>=2022.1.0
- gcsfs>=2022.1.0
- graphviz # needed for building tutorial notebooks
- h5netcdf
- h5py>=3.3.0
- hdf5
- intake
- intake-xarray
- kerchunk>=0.0.6
- lxml # Optional dep of pydap
- matplotlib # needed for building tutorial notebooks
- netcdf4
@@ -28,8 +29,10 @@ dependencies:
- pip
- prefect
- pydap
# bring back eventually once pynio conda-forge package supports py3.9
# - pynio
# bring back eventually once pynio conda-forge package supports py3.9 and does not
# conflict with ujson, which is a depencency of kerchunk's conda-forge feedstock.
# See: https://github.com/conda-forge/pynio-feedstock/issues/114
# - pynio
- pytest
- pytest-cov
- pytest-lazy-fixture
@@ -38,12 +41,11 @@ dependencies:
- requests
- rechunker>=0.4.2
- scipy
- s3fs
- s3fs>=2022.1.0
- setuptools
- toolz
- xarray>=0.18.0
- zarr>=2.6.0
- pip:
- nbmake # used in tutorial nb worklow
- nbmake>=1.3.0 # used in tutorial nb worklow
- pytest-timeout
- fsspec-reference-maker>=0.0.4
1 change: 1 addition & 0 deletions ci/upstream-dev.yml
@@ -7,3 +7,4 @@ dependencies:
- "git+https://github.com/fsspec/filesystem_spec.git"
- "git+https://github.com/PrefectHQ/prefect.git"
- "git+https://github.com/pydata/xarray.git"
- "git+https://github.com/fsspec/kerchunk.git"
Binary file added docs/_static/human-maintainer.png
Binary file added docs/_static/pangeo-forge-bot.png
Binary file added docs/_static/recipe-contributor.png
17 changes: 0 additions & 17 deletions docs/cloud_automation_user_guide/bakeries.md

This file was deleted.

24 changes: 0 additions & 24 deletions docs/cloud_automation_user_guide/index.md

This file was deleted.

31 changes: 0 additions & 31 deletions docs/cloud_automation_user_guide/recipe_box.md

This file was deleted.

42 changes: 39 additions & 3 deletions docs/conf.py
@@ -1,7 +1,7 @@
# -- Project information -----------------------------------------------------

project = "Pangeo Forge"
copyright = "2020, Pangeo Community"
copyright = "2021, Pangeo Community"
author = "Pangeo Community"

# -- General configuration ---------------------------------------------------
@@ -10,9 +10,13 @@
"myst_nb",
"sphinx.ext.autodoc",
"sphinx.ext.extlinks",
"sphinx.ext.graphviz",
# "numpydoc",
"sphinx_autodoc_typehints",
"sphinx_copybutton",
"sphinx_togglebutton",
"sphinxext.opengraph",
"sphinx_panels",
]

extlinks = {
@@ -25,7 +29,11 @@

# we always have to manually run the notebooks because they are slow / expensive
jupyter_execute_notebooks = "auto"
execution_excludepatterns = ["tutorials/xarray_zarr/*", "tutorials/hdf_reference/*"]
execution_excludepatterns = [
"tutorials/xarray_zarr/*",
"tutorials/hdf_reference/*",
"introduction_tutorial/*",
]

# -- Options for HTML output -------------------------------------------------

@@ -42,4 +50,32 @@
html_logo = "_static/pangeo-forge-logo-blue.png"
html_static_path = ["_static"]

myst_heading_anchors = 2
myst_heading_anchors = 3
myst_enable_extensions = ["substitution"]

github_comment_header = (
"<img "
'style="background: white; border: 1px solid rgba(0,0,0,0.25); border-radius: 50%; width:2em" '
'src="../_static/{username}.png" '
'alt="{username}"/> '
"<span "
'style="font-size: 1.1em; font-weight: 600">'
"{username}"
"</span>"
"<span "
'style="font-size: 1.1em; font-weight: 400; color: rgba(0,0,0,0.5)">'
" commented"
"</span>"
)
myst_substitutions = {
"pangeo_forge_bot_header": github_comment_header.format(username="pangeo-forge-bot"),
"human_maintainer_header": github_comment_header.format(username="human-maintainer"),
"recipe_contributor_header": github_comment_header.format(username="recipe-contributor"),
}

autodoc_mock_imports = ["apache_beam"]

# should be set automatically on RTD based on html_baseurl
# ogp_site_url = "https://pangeo-forge.readthedocs.io/"
ogp_image = "_static/pangeo-forge-logo-blue.png"
ogp_use_first_image = True
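
As a rough illustration of what the `github_comment_header` template and `myst_substitutions` mapping in this `conf.py` change produce, here is a minimal sketch using only `str.format`. The template is abbreviated (the inline `style` attributes are dropped for readability), and `build_substitutions` is a hypothetical helper introduced here for illustration; it is not part of the commit.

```python
# Abbreviated copy of the template from docs/conf.py (inline styles omitted).
github_comment_header = (
    '<img src="../_static/{username}.png" alt="{username}"/> '
    "<span>{username}</span>"
    "<span> commented</span>"
)


def build_substitutions(usernames):
    """Hypothetical helper: map each username to a rendered header.

    Keys follow the naming used in the diff: dashes become underscores
    and "_header" is appended.
    """
    return {
        name.replace("-", "_") + "_header": github_comment_header.format(username=name)
        for name in usernames
    }


myst_substitutions = build_substitutions(
    ["pangeo-forge-bot", "human-maintainer", "recipe-contributor"]
)

for key, html in myst_substitutions.items():
    print(key, "->", html)
```

With the `substitution` extension enabled in `myst_enable_extensions`, a MyST page should be able to reference one of these keys as `{{ pangeo_forge_bot_header }}`. Each username also needs a matching image under `docs/_static/`, which this commit adds.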
Binary file added docs/images/Format_function.png
Binary file added docs/images/OISST_URL_structure.png
Binary file added docs/images/OISST_structure_conversion.png
Binary file added docs/images/SLA_Format_function.png
Binary file added docs/images/SLA_URL_structure.png
Binary file added docs/images/SLA_structure_conversion.png
Binary file added docs/images/SSS_Format_function.png
Binary file added docs/images/SSS_URL_structure.png
Binary file added docs/images/SSS_structure_conversion.png
Binary file added docs/images/XarrayZarrRecipe Syntax Recap.png
64 changes: 26 additions & 38 deletions docs/index.md
@@ -1,62 +1,50 @@
# Pangeo Forge Documentation

Resources for understanding and using Pangeo Forge
Pangeo Forge is an open source framework for Extraction, Transformation, and Loading (ETL) of scientific data.

## First Steps

New to Pangeo Forge? Start here!
New to Pangeo Forge? You are in the right spot!

- {doc}`what_is_pangeo_forge` - Read more about Pangeo Forge and how it works!
- {doc}`intro_tutorial` - Ready to code? Walk through creating and deploying your first Recipe.
- {doc}`what_is_pangeo_forge` - Read more about Pangeo Forge and how it works.
- {doc}`introduction_tutorial/index` - Ready to code? Walk through creating, running, and staging your first Recipe.

## How the documentation is organized

There are a number of places to access resources when working with components of Pangeo Forge.
Here is an overview of what you will find:
There are a number of resources available when working with Pangeo Forge:

- The {doc}`intro_tutorial` is the place to start with Pangeo Forge.
It walks the user through the process of getting set up with their first Recipe.
- The **User Guides** explain core Pangeo Forge concepts in detail. They provide
- **Introduction Tutorial**: {doc}`introduction_tutorial/index` - Walks you through creating, running, and staging your first Recipe.
- **User Guides** explain core Pangeo Forge concepts in detail. They provide
background information to aid in gaining a depth of understanding:
- {doc}`recipe_user_guide/index` - For learning about how to create Recipes.
- {doc}`development/index` - For developers seeking to contribute to Pangeo Forge core functionality.
- {doc}`cloud_automation_user_guide/index` - For digging deeper into the automation systems that
power Pangeo Forge in the cloud.
- **Reference Pages** are the complete technical documentation of all Pangeo Forge features.
They are useful when you want to review a particular functionality in depth,
but assume you already have a working knowledge of the code base
- {doc}`pangeo_forge_recipes/recipe_user_guide/index` - For learning about how to create Recipes.
- {doc}`pangeo_forge_cloud/recipe_contribution` - For learning how to contribute recipes to Pangeo Forge Cloud.
- {doc}`pangeo_forge_recipes/development/development_guide` - For developers seeking to contribute to Pangeo Forge core functionality.
- **Advanced Examples** walk through examples of using Pangeo Forge Recipes:
- {doc}`pangeo_forge_recipes/tutorials/index` - Are in-depth demonstrations of using Pangeo Forge Recipes with real world datasets. They are a good next step after the Introduction Tutorial.
- **References** are the complete technical documentation of Pangeo Forge features. They are useful when you want to review a particular functionality in depth,
but assume you already have a working knowledge of the framework:
- {doc}`pangeo_forge_recipes/api_reference`
- {doc}`pangeo_forge_cloud/pr_checks_reference`

## Repository Reference

There are many respositories that make up Pangeo Forge. Here are links to the different documentation pages:
## Connecting with the Community

- pangeo-forge-recipes
- pangeo-forge-azure-bakery
- pangeo-forge-aws-bakery
Pangeo Forge is a community run effort with a variety of roles:

## Connecting the Community
- **Recipe contributors** — contributors who write recipes to define the data conversions. This can be anyone with a desire to create analysis ready cloud-optimized (ARCO) data. To get involved, see {doc}`pangeo_forge_cloud/recipe_contribution`.
- **Bakery operators** — individuals or instituations who deploy bakeries on cloud infrastructure to process and host the transformed data. This is typically an organization with a grant to fund the infrastructure. For more information, see the Bakeries section of {doc}`pangeo_forge_cloud/core_concepts`.
- **Pangeo Forge developers** - scientists and software developers who maintain and enhance the open-source code base which makes Pangeo Forge run. See {doc}`pangeo_forge_recipes/development/index`.

Pangeo Forge is a community run effort. There are different roles that people play to support the effort:

- Recipe contributors — contributors who write recipes to define the data conversions. This can be anyone with a desire to create analysis ready cloud-optimized (ARCO) data
- Bakery operators — individuals or instituations who deploy bakeries on cloud infrastructure to process and host the transformed data. This is typically an organization with a grant to fund the infrastructure
- Pangeo forge developers - scientists and software developers who contribute to maintaining and enhancing the open-source code base which makes Pangeo Forge run.

If you are new to Pangeo Forge and looking to get involved, we suggest getting started with recipe contribution. You can do in two ways:

- Open a ticket with a dataset request (no code required!) - Get started here (link)
- Write a recipe for a dataset you'd like to see transformed - See recipe creation docs
If you are new to Pangeo Forge and looking to get involved, we suggest starting with {doc}`pangeo_forge_cloud/recipe_contribution`.


## Site Contents

```{toctree}
:maxdepth: 2
:maxdepth: 3
what_is_pangeo_forge
intro_tutorial
recipe_user_guide/index
cloud_automation_user_guide/index
tutorials/index
development/index
introduction_tutorial/index
pangeo_forge_recipes/index
pangeo_forge_cloud/index
```