docs: add examples of replacing providers #101

Merged
merged 11 commits into from
Jan 17, 2024
Changes from 3 commits
1 change: 1 addition & 0 deletions docs/user-guide/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,4 +8,5 @@ maxdepth: 2
getting-started
parameter-tables
generic-providers
replacing-providers
Member


I suggest we add this in Recipes, instead of User Guide

277 changes: 277 additions & 0 deletions docs/user-guide/replacing-providers.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,277 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "86de99b0-3170-45d6-84eb-adbd622af936",
"metadata": {},
"source": [
"# Replacing providers\n",
Member


This is not exactly what the issue was about. I think `insert` and `__setitem__` can be found in the API docs. The issue was about documenting how to continue from intermediate results. Someone who wants to do that might not look for "replacing providers" (if they did, they would already know how to do it). I think we should make this clear in the title, and maybe stick to the first example on this page (linking to `insert` can be useful).

If you want to keep the "replacing provider" example, I'd suggest making it into another recipe.

Contributor Author


Sounds good!

"\n",
"## Overview\n",
"\n",
"It is a common need to be able to replace a provider, either with another provider or with a specific value.\n",
"\n",
"Lets look at a situation where we have some \"raw\" data files and the workflow consists of three steps\n",
" * loading the raw data\n",
" * cleaning the raw data\n",
" * computing a sum of the cleaned data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb0ecea3-5b0d-44da-a363-2a0e861b0235",
"metadata": {},
"outputs": [],
"source": [
"from typing import NewType\n",
"\n",
"Filename = NewType('Filename', str)\n",
"RawData = NewType('RawData', list)\n",
"CleanData = NewType('CleanData', list)\n",
"Result = NewType('Result', list)\n",
"\n",
"filesystem = {'raw.txt': list(map(str, range(10)))}\n",
"\n",
"def load(filename: Filename) -> RawData:\n",
" \"\"\"Load the data from the filename.\"\"\"\n",
" data = filesystem[filename]\n",
" return RawData(data)\n",
"\n",
"def clean(raw_data: RawData) -> CleanData:\n",
" \"\"\"Clean the data, convert from str.\"\"\"\n",
" return CleanData(list(map(float, raw_data)))\n",
"\n",
"def process(clean_data: CleanData) -> Result:\n",
" \"\"\"Compute the sum of the clean data.\"\"\"\n",
" return Result(sum(clean_data))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a4d59ab-4022-4eef-8566-d1b37f9cea7f",
"metadata": {},
"outputs": [],
"source": [
"import sciline\n",
"\n",
"pipeline = sciline.Pipeline(\n",
" [load, clean, process,],\n",
" params={ Filename: 'raw.txt', })\n",
"pipeline"
]
},
{
"cell_type": "markdown",
"id": "8fa7a168-be19-419a-b7b0-dfe6e150134b",
"metadata": {},
"source": [
"## Replacing a provider with a value"
]
},
{
"cell_type": "markdown",
"id": "e68f8022-0369-4abd-a516-ac99432812f3",
"metadata": {},
"source": [
"Select `Result`, the task graph will use the `Filename` input because it needs to read the data from the file system:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "837135eb-c858-484e-91b5-b47917aefe57",
"metadata": {},
"outputs": [],
"source": [
"pipeline.get(Result)"
]
},
{
"cell_type": "markdown",
"id": "3f504103-daa3-427d-9e8c-a4aa332b1f72",
"metadata": {},
"source": [
"But if the cleaned data has already been produced it is unnecessary to \"re-clean\" it, in that case we can proceed directly from the clean data to the compute sum step.\n",
"To do this we replace the `CleanData` provider with the data that was loaded and cleaned:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0b54414b-19d0-43aa-b44a-05ae94bd8086",
"metadata": {},
"outputs": [],
"source": [
"data = pipeline.compute(CleanData)\n",
"pipeline[CleanData] = data\n",
"pipeline"
]
},
{
"cell_type": "markdown",
"id": "0596304b-d38a-4b35-b85d-41a7a4dc2605",
"metadata": {},
"source": [
"Then if we select `Result` the task graph will no longer use the `Filename` input and instead it will proceed directly from the `CleanData` as input:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "71a0c735-b095-4fb0-bf92-1e424e7ea744",
"metadata": {},
"outputs": [],
"source": [
"pipeline.get(Result)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2594ab08-03cd-474a-8690-9b54978d8cf0",
"metadata": {},
"outputs": [],
"source": [
"pipeline.compute(Result)"
]
},
{
"cell_type": "markdown",
"id": "e9b190a1-cca3-4c12-aef0-2169a9a90f55",
"metadata": {},
"source": [
"## Replacing a provider with another provider"
]
},
{
"cell_type": "markdown",
"id": "9ecd237c-c70c-41e1-aebb-5bae714f5031",
"metadata": {},
"source": [
"If the current provider doesn't do what we want it to do we can replace it with another provider."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a12634a5-072a-4587-8fb0-fc531b54bfc7",
"metadata": {},
"outputs": [],
"source": [
"import sciline\n",
"\n",
"pipeline = sciline.Pipeline(\n",
" [load, clean, process,],\n",
" params={ Filename: 'raw.txt', })\n",
"pipeline"
]
},
{
"cell_type": "markdown",
"id": "c9ff03a4-726a-4686-a31c-99fa420f57b2",
"metadata": {},
"source": [
"Let's say the `clean` provider doesn't do all the preprocessing that we want it to do, we also want to remove either the odd or even numbers before processing:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d56d330b-955c-4778-96e0-9201212b341f",
"metadata": {},
"outputs": [],
"source": [
"from typing import Literal, Union\n",
"\n",
"Target = NewType('Target', str)\n",
"\n",
"def clean_and_remove_some(raw_data: RawData, target: Target) -> CleanData:\n",
" if target == 'odd':\n",
" return [n for n in map(float, raw_data) if n % 2 == 1]\n",
" if target == 'even':\n",
" return [n for n in map(float, raw_data) if n % 2 == 0]\n",
" raise ValueError"
]
},
{
"cell_type": "markdown",
"id": "fb76644b-d81a-4866-aba8-5e644050439c",
"metadata": {},
"source": [
"To replace the old `CleanData` provider we need to use `Pipeline.insert`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5fcc69e2-9617-4025-99dc-3ce9badfa16a",
"metadata": {},
"outputs": [],
"source": [
"pipeline.insert(clean_and_remove_some)\n",
"pipeline[Target] = 'odd'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "39e740e6-89f8-4693-85e7-8443071da426",
"metadata": {},
"outputs": [],
"source": [
"pipeline"
]
},
{
"cell_type": "markdown",
"id": "e9c9006a-76e2-42d2-b35c-8a8e3abf8323",
"metadata": {},
"source": [
"Now if we select the `Result` we see that the new provider will be used in the computation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "deba51d9-a348-4f6c-9e06-df30c8c8ca38",
"metadata": {},
"outputs": [],
"source": [
"pipeline.get(Result)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "30bb4ff5-5fc0-4095-b4a2-a2e5e94824e8",
"metadata": {},
"outputs": [],
"source": [
"pipeline.compute(Result)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
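
A footnote on the review discussion about continuing from intermediate results: the mechanism can be sketched in plain Python, without sciline. This is an illustration under assumptions, not sciline's implementation or API. The `resolve` helper, the `providers` and `values` dictionaries, and the `calls` log are hypothetical names introduced here, and `load` takes no filename to keep the example short. The sketch shows that once a value is stored for `CleanData`, the upstream `load` and `clean` steps no longer run.

```python
from typing import Any, Callable, Dict, NewType, get_type_hints

RawData = NewType('RawData', list)
CleanData = NewType('CleanData', list)
Result = NewType('Result', float)

calls = []  # hypothetical log recording which providers actually ran


def load() -> RawData:
    calls.append('load')
    return RawData([str(i) for i in range(10)])


def clean(raw_data: RawData) -> CleanData:
    calls.append('clean')
    return CleanData([float(x) for x in raw_data])


def process(clean_data: CleanData) -> Result:
    calls.append('process')
    return Result(sum(clean_data))


def resolve(key: Any, providers: Dict[Any, Callable], values: Dict[Any, Any]) -> Any:
    """Hypothetical resolver: prefer a stored value, otherwise call the provider."""
    if key in values:
        return values[key]
    func = providers[key]
    hints = get_type_hints(func)
    # Recursively resolve every argument from its type annotation.
    args = {
        name: resolve(tp, providers, values)
        for name, tp in hints.items()
        if name != 'return'
    }
    return func(**args)


# Index providers by their return type, as declared in the annotations.
providers = {get_type_hints(f)['return']: f for f in (load, clean, process)}

# Full run: every provider is called.
full = resolve(Result, providers, {})
print(full, calls)  # 45.0 ['load', 'clean', 'process']

# Store the intermediate result, then continue from it.
values = {CleanData: resolve(CleanData, providers, {})}
calls.clear()
resumed = resolve(Result, providers, values)
print(resumed, calls)  # 45.0 ['process']
```

In the notebook above, `pipeline[CleanData] = data` plays the role that `values[CleanData] = ...` plays in this sketch.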