docs: add examples of replacing providers (#101)
* docs: add examples of replacing providers
* fix
* update index
* put in recipes and split into two
* merge recipes
* remove from user-guide
* split recipes to separate files
* fix
* fix headings
* Update docs/recipes/continue-from-intermediate-results.ipynb
  Co-authored-by: Simon Heybrock <[email protected]>
* fix

---------

Co-authored-by: Simon Heybrock <[email protected]>
1 parent
7951e09
commit da1da06
Showing 5 changed files with 355 additions and 12 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,164 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "1b29e65b-73cb-4fc0-b9ad-1d7384a16578", | ||
"metadata": {}, | ||
"source": [ | ||
"# Continue from intermediate results\n", | ||
"\n", | ||
"A common need is to continue a pipeline from an intermediate result that was computed earlier.\n", | ||
"\n", | ||
"TL;DR:\n", | ||
"```python\n", | ||
"# Pipeline: Input -> CleanData -> Result\n", | ||
"data = pipeline.compute(CleanData)\n", | ||
"pipeline[CleanData] = data\n", | ||
"result = pipeline.compute(Result)\n", | ||
"```\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "f239f707-bc9d-4f6f-997c-fb1c73e68223", | ||
"metadata": {}, | ||
"source": [ | ||
"## Setup\n", | ||
"\n", | ||
"Let's look at a situation where we have some \"raw\" data files and the workflow consists of three steps:\n", | ||
" * loading the raw data\n", | ||
" * cleaning the raw data\n", | ||
" * computing a sum of the cleaned data." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "d2c46df9-43ad-4422-816a-a402df169587", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from typing import NewType\n", | ||
"\n", | ||
"Filename = NewType('Filename', str)\n", | ||
"RawData = NewType('RawData', list)\n", | ||
"CleanData = NewType('CleanData', list)\n", | ||
"Result = NewType('Result', float)\n", | ||
"\n", | ||
"filesystem = {'raw.txt': list(map(str, range(10)))}\n", | ||
"\n", | ||
"def load(filename: Filename) -> RawData:\n", | ||
"    \"\"\"Load the data from the filename.\"\"\"\n", | ||
"    data = filesystem[filename]\n", | ||
"    return RawData(data)\n", | ||
"\n", | ||
"def clean(raw_data: RawData) -> CleanData:\n", | ||
"    \"\"\"Clean the data, convert from str.\"\"\"\n", | ||
"    return CleanData(list(map(float, raw_data)))\n", | ||
"\n", | ||
"def process(clean_data: CleanData) -> Result:\n", | ||
"    \"\"\"Compute the sum of the clean data.\"\"\"\n", | ||
"    return Result(sum(clean_data))\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "c7ae3a94-3259-4be1-bffc-720da04df9ed", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import sciline\n", | ||
"\n", | ||
"pipeline = sciline.Pipeline(\n", | ||
"    [load, clean, process],\n", | ||
"    params={Filename: 'raw.txt'},\n", | ||
")\n", | ||
"pipeline" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "f5c4d999-6a85-4d7b-9b2f-751e261690e7", | ||
"metadata": {}, | ||
"source": [ | ||
"## Setting intermediate results" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "91f0cfac-1440-4fbb-98f6-c6c9451f3275", | ||
"metadata": {}, | ||
"source": [ | ||
"Given a pipeline, we may want to compute an intermediate result for inspection:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "affba12d-dcc5-45b1-83c4-bcc61a6bbc92", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"data = pipeline.compute(CleanData)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "ca41d0e1-19c5-4e08-bd56-db7ec7e437e2", | ||
"metadata": {}, | ||
"source": [ | ||
"If we later compute a result further down the pipeline (one derived from `CleanData`), `CleanData` is computed again, which can be costly, because Sciline does not perform any caching:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "a54bf2e6-e00a-4dc3-a442-b16dd55c0031", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"result = pipeline.compute(Result) # re-computes CleanData" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "b62ab5df-3010-4f0e-87a9-4dc416680929", | ||
"metadata": {}, | ||
"source": [ | ||
"To avoid this, we can use `Pipeline.__setitem__` to replace the provider of `CleanData` with the previously computed data:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "23fe22b8-b59e-4d63-9255-f510bbd8bec7", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"pipeline[CleanData] = data\n", | ||
"result = pipeline.compute(Result)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.12" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
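
The re-computation described in this recipe can be made visible without Sciline installed. The following is a minimal sketch, assuming a hypothetical `ToyPipeline` class that looks up providers by their return-type annotation and, like the recipe says of Sciline, does no caching. It is not Sciline's implementation, and the `calls` counter is added purely for illustration.

```python
from typing import NewType, get_type_hints

RawData = NewType('RawData', list)
CleanData = NewType('CleanData', list)
Result = NewType('Result', float)

calls = {'clean': 0}  # illustration only: counts how often the costly step runs


def load() -> RawData:
    return RawData([str(i) for i in range(10)])


def clean(raw_data: RawData) -> CleanData:
    calls['clean'] += 1
    return CleanData([float(x) for x in raw_data])


def process(clean_data: CleanData) -> Result:
    return Result(sum(clean_data))


class ToyPipeline:
    """Hypothetical stand-in for a pipeline; NOT Sciline's implementation."""

    def __init__(self, providers):
        # Register each provider under its annotated return type.
        self._providers = {get_type_hints(p)['return']: p for p in providers}

    def __setitem__(self, key, value):
        # Replace whatever provides `key` with a constant value.
        self._providers[key] = lambda: value

    def compute(self, key):
        provider = self._providers[key]
        hints = get_type_hints(provider)
        hints.pop('return', None)
        # Resolve every argument recursively; nothing is cached.
        args = {name: self.compute(tp) for name, tp in hints.items()}
        return provider(**args)


pipeline = ToyPipeline([load, clean, process])
data = pipeline.compute(CleanData)      # runs clean once
pipeline.compute(Result)                # runs clean a second time
runs_without_reuse = calls['clean']

pipeline[CleanData] = data              # continue from the stored result
result = pipeline.compute(Result)       # clean is not run again
runs_with_reuse = calls['clean']
```

After the assignment, the counter no longer increases, which is the effect the recipe relies on when it sets `pipeline[CleanData] = data`.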
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Recipes | ||
|
||
```{toctree} | ||
--- | ||
maxdepth: 2 | ||
--- | ||
side-effects-and-file-writing | ||
continue-from-intermediate-results | ||
replacing-providers | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,176 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "f839e644-4535-4a3d-8066-01a65e6b7f84", | ||
"metadata": {}, | ||
"source": [ | ||
"# Replacing providers\n", | ||
"\n", | ||
"This example shows how to replace a provider in the pipeline using the `Pipeline.insert` method." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "a7117710-24cb-4c0c-be0c-3980898dc508", | ||
"metadata": {}, | ||
"source": [ | ||
"## Setup\n", | ||
"\n", | ||
"Let's look at a situation where we have some \"raw\" data files and the workflow consists of three steps:\n", | ||
" * loading the raw data\n", | ||
" * cleaning the raw data\n", | ||
" * computing a sum of the cleaned data." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "5f43e5d7-94c9-4e96-9587-03bd063f1f24", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from typing import NewType\n", | ||
"import sciline\n", | ||
"\n", | ||
"Filename = NewType('Filename', str)\n", | ||
"RawData = NewType('RawData', list)\n", | ||
"CleanData = NewType('CleanData', list)\n", | ||
"Result = NewType('Result', float)\n", | ||
"\n", | ||
"filesystem = {'raw.txt': list(map(str, range(10)))}\n", | ||
"\n", | ||
"def load(filename: Filename) -> RawData:\n", | ||
"    \"\"\"Load the data from the filename.\"\"\"\n", | ||
"    data = filesystem[filename]\n", | ||
"    return RawData(data)\n", | ||
"\n", | ||
"def clean(raw_data: RawData) -> CleanData:\n", | ||
"    \"\"\"Clean the data, convert from str.\"\"\"\n", | ||
"    return CleanData(list(map(float, raw_data)))\n", | ||
"\n", | ||
"def process(clean_data: CleanData) -> Result:\n", | ||
"    \"\"\"Compute the sum of the clean data.\"\"\"\n", | ||
"    return Result(sum(clean_data))\n", | ||
"\n", | ||
"pipeline = sciline.Pipeline(\n", | ||
"    [load, clean, process],\n", | ||
"    params={Filename: 'raw.txt'},\n", | ||
")\n", | ||
"pipeline" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "589842f1-24f4-4cab-87a1-0018949facaf", | ||
"metadata": {}, | ||
"source": [ | ||
"## Replacing a provider using `Pipeline.insert`" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "a5454985-9b45-4416-95ca-0ba9d2603e79", | ||
"metadata": {}, | ||
"source": [ | ||
"Suppose the `clean` provider doesn't do all the preprocessing we need, and we also want to filter the data down to either the odd or the even numbers before processing:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "872863d7-4919-4e7a-855a-e7df5f86d488", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from typing import NewType\n", | ||
"\n", | ||
"Target = NewType('Target', str)\n", | ||
"\n", | ||
"def clean_and_remove_some(raw_data: RawData, target: Target) -> CleanData:\n", | ||
"    \"\"\"Clean the data and keep only the odd or only the even numbers.\"\"\"\n", | ||
"    if target == 'odd':\n", | ||
"        return CleanData([n for n in map(float, raw_data) if n % 2 == 1])\n", | ||
"    if target == 'even':\n", | ||
"        return CleanData([n for n in map(float, raw_data) if n % 2 == 0])\n", | ||
"    raise ValueError(f\"target must be 'odd' or 'even', got {target!r}\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "751f7f91-0b9e-4da9-92b8-4376c9317e93", | ||
"metadata": {}, | ||
"source": [ | ||
"To replace the old `CleanData` provider we need to use `Pipeline.insert`:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f7f38468-7194-4735-bc4b-7e4de5866a3a", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"pipeline.insert(clean_and_remove_some)\n", | ||
"pipeline[Target] = 'odd'" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "56033cd9-6a99-40e3-b2ec-920de65b11bc", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"pipeline" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "85cc2c84-cda2-49ee-83ea-0155e6f54f26", | ||
"metadata": {}, | ||
"source": [ | ||
"Now, if we request `Result` with `Pipeline.get`, we see that the new provider is used in the computation:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "41ed69dd-dd2a-41bf-9e1a-0d3f83d55a35", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"pipeline.get(Result)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "3ff92573-3944-4911-b32e-76774bef0c4d", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"pipeline.compute(Result)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.12" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
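
Why inserting `clean_and_remove_some` displaces `clean` can be sketched with the standard library alone. The example below assumes a hypothetical `providers` registry and a toy `insert` function keyed on the return-type annotation. It only illustrates the idea of one provider per output type and is not Sciline's actual code. The filter is written in a condensed form but keeps the same values as the notebook's version.

```python
from typing import NewType, get_type_hints

RawData = NewType('RawData', list)
CleanData = NewType('CleanData', list)
Target = NewType('Target', str)


def clean(raw_data: RawData) -> CleanData:
    return CleanData([float(x) for x in raw_data])


def clean_and_remove_some(raw_data: RawData, target: Target) -> CleanData:
    if target not in ('odd', 'even'):
        raise ValueError(f"target must be 'odd' or 'even', got {target!r}")
    remainder = 1 if target == 'odd' else 0
    return CleanData([n for n in map(float, raw_data) if n % 2 == remainder])


providers = {}  # hypothetical registry: return type -> provider


def insert(provider):
    """Toy insert: file the provider under its annotated return type."""
    providers[get_type_hints(provider)['return']] = provider


insert(clean)
insert(clean_and_remove_some)  # same return type, so `clean` is replaced

current = providers[CleanData]
# Input types the current provider needs, in signature order.
needed = [tp for name, tp in get_type_hints(current).items() if name != 'return']
odd_only = current(RawData([str(i) for i in range(10)]), Target('odd'))
```

Because `needed` now contains `Target` in addition to `RawData`, a value for `Target` has to be supplied as well, which is why the recipe sets `pipeline[Target] = 'odd'` right after the insert.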