docs: add examples of replacing providers (#101)
* docs: add examples of replacing providers

* fix

* update index

* put in recipes and split into two

* merge recipes

* remove from user-guide

* split recipes to separate files

* fix

* fix headings

* Update docs/recipes/continue-from-intermediate-results.ipynb

Co-authored-by: Simon Heybrock <[email protected]>

* fix

---------

Co-authored-by: Simon Heybrock <[email protected]>
jokasimr and SimonHeybrock authored Jan 17, 2024
1 parent 7951e09 commit da1da06
Showing 5 changed files with 355 additions and 12 deletions.
2 changes: 1 addition & 1 deletion docs/index.md
@@ -108,7 +108,7 @@ hidden:
---
user-guide/index
-recipes/recipes
+recipes/index
api-reference/index
developer/index
about/index
164 changes: 164 additions & 0 deletions docs/recipes/continue-from-intermediate-results.ipynb
@@ -0,0 +1,164 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1b29e65b-73cb-4fc0-b9ad-1d7384a16578",
"metadata": {},
"source": [
"# Continue from intermediate results\n",
"\n",
"It is often useful to continue a pipeline from an intermediate result that was computed earlier.\n",
"\n",
"TL;DR\n",
"```python\n",
"# Pipeline: Input -> CleanData -> Result\n",
"data = pipeline.compute(CleanData)\n",
"pipeline[CleanData] = data\n",
"result = pipeline.compute(Result)\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "f239f707-bc9d-4f6f-997c-fb1c73e68223",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"Let's look at a situation where we have some \"raw\" data files and the workflow consists of three steps:\n",
" * loading the raw data\n",
" * cleaning the raw data\n",
" * computing a sum of the cleaned data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2c46df9-43ad-4422-816a-a402df169587",
"metadata": {},
"outputs": [],
"source": [
"from typing import NewType\n",
"\n",
"Filename = NewType('Filename', str)\n",
"RawData = NewType('RawData', list)\n",
"CleanData = NewType('CleanData', list)\n",
"Result = NewType('Result', float)\n",
"\n",
"filesystem = {'raw.txt': list(map(str, range(10)))}\n",
"\n",
"def load(filename: Filename) -> RawData:\n",
"    \"\"\"Load the data from the filename.\"\"\"\n",
"    data = filesystem[filename]\n",
"    return RawData(data)\n",
"\n",
"def clean(raw_data: RawData) -> CleanData:\n",
"    \"\"\"Clean the data, convert from str.\"\"\"\n",
"    return CleanData(list(map(float, raw_data)))\n",
"\n",
"def process(clean_data: CleanData) -> Result:\n",
"    \"\"\"Compute the sum of the clean data.\"\"\"\n",
"    return Result(sum(clean_data))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7ae3a94-3259-4be1-bffc-720da04df9ed",
"metadata": {},
"outputs": [],
"source": [
"import sciline\n",
"\n",
"pipeline = sciline.Pipeline(\n",
"    [load, clean, process],\n",
"    params={Filename: 'raw.txt'},\n",
")\n",
"pipeline"
]
},
{
"cell_type": "markdown",
"id": "f5c4d999-6a85-4d7b-9b2f-751e261690e7",
"metadata": {},
"source": [
"## Setting intermediate results"
]
},
{
"cell_type": "markdown",
"id": "91f0cfac-1440-4fbb-98f6-c6c9451f3275",
"metadata": {},
"source": [
"Given a pipeline, we may want to compute an intermediate result for inspection:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "affba12d-dcc5-45b1-83c4-bcc61a6bbc92",
"metadata": {},
"outputs": [],
"source": [
"data = pipeline.compute(CleanData)"
]
},
{
"cell_type": "markdown",
"id": "ca41d0e1-19c5-4e08-bd56-db7ec7e437e2",
"metadata": {},
"source": [
"If we later compute a result further down the pipeline (one derived from `CleanData`), `CleanData` is re-computed, which may be costly, since Sciline does not perform any caching:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a54bf2e6-e00a-4dc3-a442-b16dd55c0031",
"metadata": {},
"outputs": [],
"source": [
"result = pipeline.compute(Result) # re-computes CleanData"
]
},
{
"cell_type": "markdown",
"id": "b62ab5df-3010-4f0e-87a9-4dc416680929",
"metadata": {},
"source": [
"To avoid this, we can use `Pipeline.__setitem__` to replace the provider of `CleanData` with the previously computed data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "23fe22b8-b59e-4d63-9255-f510bbd8bec7",
"metadata": {},
"outputs": [],
"source": [
"pipeline[CleanData] = data\n",
"result = pipeline.compute(Result)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
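The idea behind the notebook above can be sketched without Sciline. The following is a minimal toy model, assuming that providers are looked up by their return type and that setting a value registers a provider with no dependencies. `ToyPipeline` and the `calls` counter are hypothetical names used only for illustration; this is not Sciline's implementation.

```python
from typing import NewType, get_type_hints

Filename = NewType('Filename', str)
RawData = NewType('RawData', list)
CleanData = NewType('CleanData', list)
Result = NewType('Result', float)

filesystem = {'raw.txt': [str(i) for i in range(10)]}
calls = {'clean': 0}  # counts how often the cleaning step runs


def load(filename: Filename) -> RawData:
    return RawData(filesystem[filename])


def clean(raw_data: RawData) -> CleanData:
    calls['clean'] += 1
    return CleanData([float(x) for x in raw_data])


def process(clean_data: CleanData) -> Result:
    return Result(sum(clean_data))


class ToyPipeline:
    """Hypothetical stand-in for a pipeline, for illustration only."""

    def __init__(self, providers, params):
        self._providers = {}
        for func in providers:
            self.insert(func)
        for key, value in params.items():
            self[key] = value

    def insert(self, func):
        # Key each provider by its return annotation.
        self._providers[get_type_hints(func)['return']] = func

    def __setitem__(self, key, value):
        # A fixed value acts as a provider without dependencies.
        self._providers[key] = lambda: value

    def compute(self, key):
        # Resolve every argument recursively; nothing is cached.
        func = self._providers[key]
        hints = get_type_hints(func)
        hints.pop('return', None)
        return func(**{name: self.compute(tp) for name, tp in hints.items()})


pipeline = ToyPipeline([load, clean, process], params={Filename: 'raw.txt'})
data = pipeline.compute(CleanData)  # runs clean once
pipeline[CleanData] = data          # pin the intermediate result
result = pipeline.compute(Result)   # clean is not run again
```

In this sketch, pinning `CleanData` cuts the graph at that node, so `load` and `clean` are no longer reached when `Result` is computed.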
11 changes: 11 additions & 0 deletions docs/recipes/index.md
@@ -0,0 +1,11 @@
# Recipes

```{toctree}
---
maxdepth: 2
---
side-effects-and-file-writing
continue-from-intermediate-results
replacing-providers
```
176 changes: 176 additions & 0 deletions docs/recipes/replacing-providers.ipynb
@@ -0,0 +1,176 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f839e644-4535-4a3d-8066-01a65e6b7f84",
"metadata": {},
"source": [
"# Replacing providers\n",
"\n",
"This example shows how to replace a provider in the pipeline using the `Pipeline.insert` method."
]
},
{
"cell_type": "markdown",
"id": "a7117710-24cb-4c0c-be0c-3980898dc508",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"Let's look at a situation where we have some \"raw\" data files and the workflow consists of three steps:\n",
" * loading the raw data\n",
" * cleaning the raw data\n",
" * computing a sum of the cleaned data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5f43e5d7-94c9-4e96-9587-03bd063f1f24",
"metadata": {},
"outputs": [],
"source": [
"from typing import NewType\n",
"import sciline\n",
"\n",
"Filename = NewType('Filename', str)\n",
"RawData = NewType('RawData', list)\n",
"CleanData = NewType('CleanData', list)\n",
"Result = NewType('Result', float)\n",
"\n",
"filesystem = {'raw.txt': list(map(str, range(10)))}\n",
"\n",
"def load(filename: Filename) -> RawData:\n",
"    \"\"\"Load the data from the filename.\"\"\"\n",
"    data = filesystem[filename]\n",
"    return RawData(data)\n",
"\n",
"def clean(raw_data: RawData) -> CleanData:\n",
"    \"\"\"Clean the data, convert from str.\"\"\"\n",
"    return CleanData(list(map(float, raw_data)))\n",
"\n",
"def process(clean_data: CleanData) -> Result:\n",
"    \"\"\"Compute the sum of the clean data.\"\"\"\n",
"    return Result(sum(clean_data))\n",
"\n",
"pipeline = sciline.Pipeline(\n",
"    [load, clean, process],\n",
"    params={Filename: 'raw.txt'},\n",
")\n",
"pipeline"
]
},
{
"cell_type": "markdown",
"id": "589842f1-24f4-4cab-87a1-0018949facaf",
"metadata": {},
"source": [
"## Replacing a provider using `Pipeline.insert`"
]
},
{
"cell_type": "markdown",
"id": "a5454985-9b45-4416-95ca-0ba9d2603e79",
"metadata": {},
"source": [
"Let's say the `clean` provider doesn't do all the preprocessing that we want. We also want to keep only the odd or only the even numbers (removing the rest) before processing:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "872863d7-4919-4e7a-855a-e7df5f86d488",
"metadata": {},
"outputs": [],
"source": [
"from typing import NewType\n",
"\n",
"Target = NewType('Target', str)\n",
"\n",
"def clean_and_remove_some(raw_data: RawData, target: Target) -> CleanData:\n",
"    \"\"\"Convert from str and keep only the numbers selected by target.\"\"\"\n",
"    if target == 'odd':\n",
"        return CleanData([n for n in map(float, raw_data) if n % 2 == 1])\n",
"    if target == 'even':\n",
"        return CleanData([n for n in map(float, raw_data) if n % 2 == 0])\n",
"    raise ValueError(f\"target must be 'odd' or 'even', got {target!r}\")"
]
},
{
"cell_type": "markdown",
"id": "751f7f91-0b9e-4da9-92b8-4376c9317e93",
"metadata": {},
"source": [
"To replace the old `CleanData` provider we need to use `Pipeline.insert`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f7f38468-7194-4735-bc4b-7e4de5866a3a",
"metadata": {},
"outputs": [],
"source": [
"pipeline.insert(clean_and_remove_some)\n",
"pipeline[Target] = 'odd'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56033cd9-6a99-40e3-b2ec-920de65b11bc",
"metadata": {},
"outputs": [],
"source": [
"pipeline"
]
},
{
"cell_type": "markdown",
"id": "85cc2c84-cda2-49ee-83ea-0155e6f54f26",
"metadata": {},
"source": [
"If we now select `Result`, we see that the new provider is used in the computation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "41ed69dd-dd2a-41bf-9e1a-0d3f83d55a35",
"metadata": {},
"outputs": [],
"source": [
"pipeline.get(Result)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3ff92573-3944-4911-b32e-76774bef0c4d",
"metadata": {},
"outputs": [],
"source": [
"pipeline.compute(Result)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
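Why inserting `clean_and_remove_some` displaces `clean` can be illustrated with a small sketch. It assumes a registry that holds one provider per return type, so a second provider with the same return annotation overwrites the first. The `registry` dict and the `insert` helper below are hypothetical and are not Sciline's API; only the provider functions mirror the notebook.

```python
from typing import Callable, Dict, NewType, get_type_hints

RawData = NewType('RawData', list)
CleanData = NewType('CleanData', list)
Target = NewType('Target', str)


def clean(raw_data: RawData) -> CleanData:
    return CleanData([float(x) for x in raw_data])


def clean_and_remove_some(raw_data: RawData, target: Target) -> CleanData:
    # Remainder (mod 2) of the numbers to keep for each target.
    keep_remainder = {'odd': 1, 'even': 0}
    if target not in keep_remainder:
        raise ValueError(f"target must be 'odd' or 'even', got {target!r}")
    numbers = map(float, raw_data)
    return CleanData([n for n in numbers if n % 2 == keep_remainder[target]])


# Hypothetical registry: one provider per return type (an assumption).
registry: Dict[object, Callable] = {}


def insert(provider: Callable) -> None:
    registry[get_type_hints(provider)['return']] = provider


insert(clean)
insert(clean_and_remove_some)  # same return type, so it takes the place of clean

raw = RawData([str(i) for i in range(10)])
odd = registry[CleanData](raw, Target('odd'))
even = registry[CleanData](raw, Target('even'))
```

Because the replacement provider takes an extra `Target` argument, the pipeline also needs a value for `Target`, which is why the notebook sets `pipeline[Target] = 'odd'` right after the insert.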