docs: add examples of replacing providers #101

Merged
merged 11 commits into from
Jan 17, 2024
2 changes: 1 addition & 1 deletion docs/index.md
@@ -108,7 +108,7 @@ hidden:
 ---

 user-guide/index
-recipes/recipes
+recipes/index
 api-reference/index
 developer/index
 about/index
164 changes: 164 additions & 0 deletions docs/recipes/continue-from-intermediate-results.ipynb
@@ -0,0 +1,164 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1b29e65b-73cb-4fc0-b9ad-1d7384a16578",
"metadata": {},
"source": [
"# Continue from intermediate results\n",
"\n",
"It is often useful to continue a pipeline from an intermediate result that was computed earlier, without recomputing it.\n",
"\n",
"TL;DR\n",
"```python\n",
"# Pipeline: Input -> CleanData -> Result\n",
"data = pipeline.compute(CleanData)\n",
"pipeline[CleanData] = data\n",
"result = pipeline.compute(Result)\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "f239f707-bc9d-4f6f-997c-fb1c73e68223",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"Let's look at a situation where we have some \"raw\" data files and the workflow consists of three steps:\n",
" * loading the raw data\n",
" * cleaning the raw data\n",
" * computing a sum of the cleaned data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2c46df9-43ad-4422-816a-a402df169587",
"metadata": {},
"outputs": [],
"source": [
"from typing import NewType\n",
"\n",
"Filename = NewType('Filename', str)\n",
"RawData = NewType('RawData', list)\n",
"CleanData = NewType('CleanData', list)\n",
"Result = NewType('Result', float)  # `process` returns a sum, not a list\n",
"\n",
"filesystem = {'raw.txt': list(map(str, range(10)))}\n",
"\n",
"def load(filename: Filename) -> RawData:\n",
"    \"\"\"Load the data from the filename.\"\"\"\n",
"    data = filesystem[filename]\n",
"    return RawData(data)\n",
"\n",
"def clean(raw_data: RawData) -> CleanData:\n",
"    \"\"\"Clean the data, convert from str.\"\"\"\n",
"    return CleanData(list(map(float, raw_data)))\n",
"\n",
"def process(clean_data: CleanData) -> Result:\n",
"    \"\"\"Compute the sum of the clean data.\"\"\"\n",
"    return Result(sum(clean_data))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7ae3a94-3259-4be1-bffc-720da04df9ed",
"metadata": {},
"outputs": [],
"source": [
"import sciline\n",
"\n",
"pipeline = sciline.Pipeline(\n",
"    [load, clean, process],\n",
"    params={Filename: 'raw.txt'},\n",
")\n",
"pipeline"
]
},
{
"cell_type": "markdown",
"id": "f5c4d999-6a85-4d7b-9b2f-751e261690e7",
"metadata": {},
"source": [
"## Setting intermediate results"
]
},
{
"cell_type": "markdown",
"id": "91f0cfac-1440-4fbb-98f6-c6c9451f3275",
"metadata": {},
"source": [
"Given a pipeline, we may want to compute an intermediate result for inspection:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "affba12d-dcc5-45b1-83c4-bcc61a6bbc92",
"metadata": {},
"outputs": [],
"source": [
"data = pipeline.compute(CleanData)"
]
},
{
"cell_type": "markdown",
"id": "ca41d0e1-19c5-4e08-bd56-db7ec7e437e2",
"metadata": {},
"source": [
"If later on we wish to compute a result further down the pipeline (derived from `CleanData`), this would cause potentially costly re-computation of `CleanData`, since Sciline does not perform any caching:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a54bf2e6-e00a-4dc3-a442-b16dd55c0031",
"metadata": {},
"outputs": [],
"source": [
"result = pipeline.compute(Result) # re-computes CleanData"
]
},
{
"cell_type": "markdown",
"id": "b62ab5df-3010-4f0e-87a9-4dc416680929",
"metadata": {},
"source": [
"To avoid this, we can use `Pipeline.__setitem__` to replace the provider of `CleanData` with the previously computed data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "23fe22b8-b59e-4d63-9255-f510bbd8bec7",
"metadata": {},
"outputs": [],
"source": [
"pipeline[CleanData] = data\n",
"result = pipeline.compute(Result)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
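
The effect of `pipeline[CleanData] = data` in the notebook above can be illustrated without Sciline installed. The following is a minimal sketch, assuming a hypothetical `ToyPipeline` class that looks up providers by their return-type annotation and never caches results. It is not Sciline's implementation, and `clean_calls` is an illustrative counter that is not part of the recipe.

```python
from typing import NewType, get_type_hints

Filename = NewType('Filename', str)
RawData = NewType('RawData', list)
CleanData = NewType('CleanData', list)
Result = NewType('Result', float)


class ToyPipeline:
    """Hypothetical stand-in for a type-keyed pipeline (not Sciline itself)."""

    def __init__(self, providers, params):
        self._providers = {}
        self._values = dict(params)
        for provider in providers:
            self.insert(provider)

    def insert(self, provider):
        # Register the provider under its return type, replacing any
        # provider or fixed value previously stored for that type.
        hints = get_type_hints(provider)
        key = hints.pop('return')
        self._values.pop(key, None)
        self._providers[key] = (provider, hints)

    def __setitem__(self, key, value):
        # A fixed value takes the place of the provider for `key`.
        self._providers.pop(key, None)
        self._values[key] = value

    def compute(self, key):
        if key in self._values:
            return self._values[key]
        provider, hints = self._providers[key]
        # Resolve every argument recursively; nothing is cached.
        kwargs = {name: self.compute(tp) for name, tp in hints.items()}
        return provider(**kwargs)


filesystem = {'raw.txt': list(map(str, range(10)))}
clean_calls = []  # illustrative counter of `clean` invocations


def load(filename: Filename) -> RawData:
    return RawData(filesystem[filename])


def clean(raw_data: RawData) -> CleanData:
    clean_calls.append(1)
    return CleanData(list(map(float, raw_data)))


def process(clean_data: CleanData) -> Result:
    return Result(sum(clean_data))


pipeline = ToyPipeline([load, clean, process], params={Filename: 'raw.txt'})
data = pipeline.compute(CleanData)   # first call to `clean`
pipeline.compute(Result)             # `clean` runs again: no caching
calls_before = len(clean_calls)

pipeline[CleanData] = data           # replace the provider with the value
result = pipeline.compute(Result)    # `clean` is not called this time
calls_after = len(clean_calls)
```

In this toy model `clean` has run twice before the assignment and is not invoked again afterwards, which is the saving the recipe is after.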
11 changes: 11 additions & 0 deletions docs/recipes/index.md
@@ -0,0 +1,11 @@
# Recipes

```{toctree}
---
maxdepth: 2
---

side-effects-and-file-writing
continue-from-intermediate-results
replacing-providers
```
176 changes: 176 additions & 0 deletions docs/recipes/replacing-providers.ipynb
@@ -0,0 +1,176 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f839e644-4535-4a3d-8066-01a65e6b7f84",
"metadata": {},
"source": [
"# Replacing providers\n",
"\n",
"This example shows how to replace a provider in the pipeline using the `Pipeline.insert` method."
]
},
{
"cell_type": "markdown",
"id": "a7117710-24cb-4c0c-be0c-3980898dc508",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"Let's look at a situation where we have some \"raw\" data files and the workflow consists of three steps:\n",
" * loading the raw data\n",
" * cleaning the raw data\n",
" * computing a sum of the cleaned data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5f43e5d7-94c9-4e96-9587-03bd063f1f24",
"metadata": {},
"outputs": [],
"source": [
"from typing import NewType\n",
"import sciline\n",
"\n",
"Filename = NewType('Filename', str)\n",
"RawData = NewType('RawData', list)\n",
"CleanData = NewType('CleanData', list)\n",
"Result = NewType('Result', float)  # `process` returns a sum, not a list\n",
"\n",
"filesystem = {'raw.txt': list(map(str, range(10)))}\n",
"\n",
"def load(filename: Filename) -> RawData:\n",
"    \"\"\"Load the data from the filename.\"\"\"\n",
"    data = filesystem[filename]\n",
"    return RawData(data)\n",
"\n",
"def clean(raw_data: RawData) -> CleanData:\n",
"    \"\"\"Clean the data, convert from str.\"\"\"\n",
"    return CleanData(list(map(float, raw_data)))\n",
"\n",
"def process(clean_data: CleanData) -> Result:\n",
"    \"\"\"Compute the sum of the clean data.\"\"\"\n",
"    return Result(sum(clean_data))\n",
"\n",
"pipeline = sciline.Pipeline(\n",
"    [load, clean, process],\n",
"    params={Filename: 'raw.txt'},\n",
")\n",
"pipeline"
]
},
{
"cell_type": "markdown",
"id": "589842f1-24f4-4cab-87a1-0018949facaf",
"metadata": {},
"source": [
"## Replacing a provider using `Pipeline.insert`"
]
},
{
"cell_type": "markdown",
"id": "a5454985-9b45-4416-95ca-0ba9d2603e79",
"metadata": {},
"source": [
"Suppose the `clean` provider doesn't do all the preprocessing we need, and we also want to drop either the odd or the even numbers before processing:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "872863d7-4919-4e7a-855a-e7df5f86d488",
"metadata": {},
"outputs": [],
"source": [
"from typing import NewType\n",
"\n",
"Target = NewType('Target', str)\n",
"\n",
"def clean_and_remove_some(raw_data: RawData, target: Target) -> CleanData:\n",
"    \"\"\"Convert from str and keep only the numbers selected by `target`.\"\"\"\n",
"    if target == 'odd':\n",
"        return CleanData([n for n in map(float, raw_data) if n % 2 == 1])\n",
"    if target == 'even':\n",
"        return CleanData([n for n in map(float, raw_data) if n % 2 == 0])\n",
"    raise ValueError(f\"target must be 'odd' or 'even', got {target!r}\")"
]
},
{
"cell_type": "markdown",
"id": "751f7f91-0b9e-4da9-92b8-4376c9317e93",
"metadata": {},
"source": [
"To replace the old `CleanData` provider we need to use `Pipeline.insert`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f7f38468-7194-4735-bc4b-7e4de5866a3a",
"metadata": {},
"outputs": [],
"source": [
"pipeline.insert(clean_and_remove_some)\n",
"pipeline[Target] = 'odd'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56033cd9-6a99-40e3-b2ec-920de65b11bc",
"metadata": {},
"outputs": [],
"source": [
"pipeline"
]
},
{
"cell_type": "markdown",
"id": "85cc2c84-cda2-49ee-83ea-0155e6f54f26",
"metadata": {},
"source": [
"Now, if we select `Result`, we see that the new provider is used in the computation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "41ed69dd-dd2a-41bf-9e1a-0d3f83d55a35",
"metadata": {},
"outputs": [],
"source": [
"pipeline.get(Result)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3ff92573-3944-4911-b32e-76774bef0c4d",
"metadata": {},
"outputs": [],
"source": [
"pipeline.compute(Result)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
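
As a cross-check that does not depend on Sciline, the replacement provider from the notebook above can be chained by hand. This is a sketch under the assumption that the pipeline simply passes `RawData` and `Target` into `clean_and_remove_some` and its output into `process`; the names `raw`, `odd_result`, and `even_result` are illustrative and not part of the recipe.

```python
from typing import NewType

RawData = NewType('RawData', list)
CleanData = NewType('CleanData', list)
Result = NewType('Result', float)
Target = NewType('Target', str)


def clean_and_remove_some(raw_data: RawData, target: Target) -> CleanData:
    """Convert from str and keep only the numbers selected by `target`."""
    if target == 'odd':
        return CleanData([n for n in map(float, raw_data) if n % 2 == 1])
    if target == 'even':
        return CleanData([n for n in map(float, raw_data) if n % 2 == 0])
    raise ValueError(f"target must be 'odd' or 'even', got {target!r}")


def process(clean_data: CleanData) -> Result:
    """Compute the sum of the clean data."""
    return Result(sum(clean_data))


# Same raw input as in the notebook: the strings '0' to '9'.
raw = RawData(list(map(str, range(10))))

odd_result = process(clean_and_remove_some(raw, Target('odd')))    # 1+3+5+7+9
even_result = process(clean_and_remove_some(raw, Target('even')))  # 0+2+4+6+8
print(odd_result, even_result)  # prints: 25.0 20.0
```

With `Target` set to `'odd'`, as in the notebook, the hand-chained result is the value one would expect `pipeline.compute(Result)` to return after the provider has been replaced.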