docs: add examples of replacing providers (#101)

* docs: add examples of replacing providers
* fix
* update index
* put in recipes and split into two
* merge recipes
* remove from user-guide
* split recipes to separate files
* fix
* fix headings
* Update docs/recipes/continue-from-intermediate-results.ipynb
  Co-authored-by: Simon Heybrock <[email protected]>
* fix

---------

Co-authored-by: Simon Heybrock <[email protected]>
scipp · Jan 17, 2024 · da1da06
1 parent 7951e09

Showing 5 changed files with 355 additions and 12 deletions.
diff --git a/docs/index.md b/docs/index.md
@@ -108,7 +108,7 @@ hidden:
 ---
 
 user-guide/index
-recipes/recipes
+recipes/index
 api-reference/index
 developer/index
 about/index

diff --git a/docs/recipes/continue-from-intermediate-results.ipynb b/docs/recipes/continue-from-intermediate-results.ipynb
@@ -0,0 +1,164 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "1b29e65b-73cb-4fc0-b9ad-1d7384a16578",
+   "metadata": {},
+   "source": [
+    "# Continue from intermediate results\n",
+    "\n",
+    "A common need is to continue a pipeline from an intermediate result that was computed earlier, instead of computing it again.\n",
+    "\n",
+    "TL;DR:\n",
+    "```python\n",
+    "# Pipeline: Input -> CleanData -> Result\n",
+    "data = pipeline.compute(CleanData)\n",
+    "pipeline[CleanData] = data\n",
+    "result = pipeline.compute(Result)\n",
+    "```\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f239f707-bc9d-4f6f-997c-fb1c73e68223",
+   "metadata": {},
+   "source": [
+    "## Setup\n",
+    "\n",
+    "Let's look at a situation where we have some \"raw\" data files and the workflow consists of three steps:\n",
+    "  * loading the raw data\n",
+    "  * cleaning the raw data\n",
+    "  * computing a sum of the cleaned data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d2c46df9-43ad-4422-816a-a402df169587",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from typing import NewType\n",
+    "\n",
+    "Filename = NewType('Filename', str)\n",
+    "RawData = NewType('RawData', list)\n",
+    "CleanData = NewType('CleanData', list)\n",
+    "Result = NewType('Result', float)\n",
+    "\n",
+    "filesystem = {'raw.txt': list(map(str, range(10)))}\n",
+    "\n",
+    "def load(filename: Filename) -> RawData:\n",
+    "    \"\"\"Load the data from the filename.\"\"\"\n",
+    "    data = filesystem[filename]\n",
+    "    return RawData(data)\n",
+    "\n",
+    "def clean(raw_data: RawData) -> CleanData:\n",
+    "    \"\"\"Clean the data, convert from str.\"\"\"\n",
+    "    return CleanData(list(map(float, raw_data)))\n",
+    "\n",
+    "def process(clean_data: CleanData) -> Result:\n",
+    "    \"\"\"Compute the sum of the clean data.\"\"\"\n",
+    "    return Result(sum(clean_data))\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c7ae3a94-3259-4be1-bffc-720da04df9ed",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sciline\n",
+    "\n",
+    "pipeline = sciline.Pipeline(\n",
+    "    [load, clean, process],\n",
+    "    params={Filename: 'raw.txt'})\n",
+    "pipeline"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5c4d999-6a85-4d7b-9b2f-751e261690e7",
+   "metadata": {},
+   "source": [
+    "## Setting intermediate results"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "91f0cfac-1440-4fbb-98f6-c6c9451f3275",
+   "metadata": {},
+   "source": [
+    "Given a pipeline, we may want to compute an intermediate result for inspection:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "affba12d-dcc5-45b1-83c4-bcc61a6bbc92",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data = pipeline.compute(CleanData)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ca41d0e1-19c5-4e08-bd56-db7ec7e437e2",
+   "metadata": {},
+   "source": [
+    "If we later want to compute a result further down the pipeline (one derived from `CleanData`), `CleanData` is computed again, which can be costly, since Sciline does not perform any caching:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a54bf2e6-e00a-4dc3-a442-b16dd55c0031",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "result = pipeline.compute(Result)  # re-computes CleanData"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b62ab5df-3010-4f0e-87a9-4dc416680929",
+   "metadata": {},
+   "source": [
+    "To avoid this, we can use `Pipeline.__setitem__` to replace the provider of `CleanData` with the previously computed data:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "23fe22b8-b59e-4d63-9255-f510bbd8bec7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pipeline[CleanData] = data\n",
+    "result = pipeline.compute(Result)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
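You can check the saving from the last cell without Sciline installed. The sketch below is not Sciline code. `ToyPipeline` is a hypothetical, minimal resolver written only for this illustration. Like Sciline, it looks up providers by their return type and does no caching. It also assumes that every provider is fully type-annotated. A counter records how often `clean` runs.

```python
from typing import Any, Callable, NewType, get_type_hints

Filename = NewType('Filename', str)
RawData = NewType('RawData', list)
CleanData = NewType('CleanData', list)
Result = NewType('Result', float)

filesystem = {'raw.txt': list(map(str, range(10)))}
calls = {'clean': 0}  # how often the `clean` provider has run


def load(filename: Filename) -> RawData:
    return RawData(filesystem[filename])


def clean(raw_data: RawData) -> CleanData:
    calls['clean'] += 1
    return CleanData(list(map(float, raw_data)))


def process(clean_data: CleanData) -> Result:
    return Result(sum(clean_data))


class ToyPipeline:
    """Hypothetical toy resolver for illustration; not Sciline's implementation.

    Providers are keyed by their return type and nothing is cached.
    """

    def __init__(self, providers: list[Callable[..., Any]], params: dict[Any, Any]) -> None:
        self._providers = {get_type_hints(p)['return']: p for p in providers}
        self._params = dict(params)

    def __setitem__(self, key: Any, value: Any) -> None:
        # A fixed value takes the place of any provider for the same type.
        self._providers.pop(key, None)
        self._params[key] = value

    def compute(self, key: Any) -> Any:
        if key in self._params:
            return self._params[key]
        provider = self._providers[key]
        hints = get_type_hints(provider)
        hints.pop('return')
        # Every argument is resolved recursively; nothing is cached between calls.
        args = {name: self.compute(tp) for name, tp in hints.items()}
        return provider(**args)


pipeline = ToyPipeline([load, clean, process], params={Filename: 'raw.txt'})

data = pipeline.compute(CleanData)  # runs `clean` once
result = pipeline.compute(Result)  # runs `clean` a second time
print(calls['clean'])  # 2

pipeline[CleanData] = data  # replace the provider of CleanData by its value
result = pipeline.compute(Result)  # `clean` is not called this time
print(calls['clean'], result)  # 2 45.0
```

In this toy version, assigning to `CleanData` drops the provider and stores the value. This mirrors the behaviour the recipe describes for `Pipeline.__setitem__`, but Sciline's internals may differ.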
diff --git a/docs/recipes/index.md b/docs/recipes/index.md
@@ -0,0 +1,11 @@
+# Recipes
+
+```{toctree}
+---
+maxdepth: 2
+---
+
+side-effects-and-file-writing
+continue-from-intermediate-results
+replacing-providers
+```
diff --git a/docs/recipes/replacing-providers.ipynb b/docs/recipes/replacing-providers.ipynb
@@ -0,0 +1,176 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "f839e644-4535-4a3d-8066-01a65e6b7f84",
+   "metadata": {},
+   "source": [
+    "# Replacing providers\n",
+    "\n",
+    "This example shows how to replace a provider in the pipeline using the `Pipeline.insert` method."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a7117710-24cb-4c0c-be0c-3980898dc508",
+   "metadata": {},
+   "source": [
+    "## Setup\n",
+    "\n",
+    "Let's look at a situation where we have some \"raw\" data files and the workflow consists of three steps:\n",
+    "  * loading the raw data\n",
+    "  * cleaning the raw data\n",
+    "  * computing a sum of the cleaned data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5f43e5d7-94c9-4e96-9587-03bd063f1f24",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from typing import NewType\n",
+    "import sciline\n",
+    "\n",
+    "Filename = NewType('Filename', str)\n",
+    "RawData = NewType('RawData', list)\n",
+    "CleanData = NewType('CleanData', list)\n",
+    "Result = NewType('Result', float)\n",
+    "\n",
+    "filesystem = {'raw.txt': list(map(str, range(10)))}\n",
+    "\n",
+    "def load(filename: Filename) -> RawData:\n",
+    "    \"\"\"Load the data from the filename.\"\"\"\n",
+    "    data = filesystem[filename]\n",
+    "    return RawData(data)\n",
+    "\n",
+    "def clean(raw_data: RawData) -> CleanData:\n",
+    "    \"\"\"Clean the data, convert from str.\"\"\"\n",
+    "    return CleanData(list(map(float, raw_data)))\n",
+    "\n",
+    "def process(clean_data: CleanData) -> Result:\n",
+    "    \"\"\"Compute the sum of the clean data.\"\"\"\n",
+    "    return Result(sum(clean_data))\n",
+    "\n",
+    "pipeline = sciline.Pipeline(\n",
+    "    [load, clean, process],\n",
+    "    params={Filename: 'raw.txt'})\n",
+    "pipeline"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "589842f1-24f4-4cab-87a1-0018949facaf",
+   "metadata": {},
+   "source": [
+    "## Replacing a provider using `Pipeline.insert`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a5454985-9b45-4416-95ca-0ba9d2603e79",
+   "metadata": {},
+   "source": [
+    "Suppose the `clean` provider does not do all the preprocessing we need: we also want to remove either the odd or the even numbers before processing. We define a new provider for `CleanData` that does this:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "872863d7-4919-4e7a-855a-e7df5f86d488",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from typing import NewType\n",
+    "\n",
+    "Target = NewType('Target', str)\n",
+    "\n",
+    "def clean_and_remove_some(raw_data: RawData, target: Target) -> CleanData:\n",
+    "    if target == 'odd':\n",
+    "        return CleanData([n for n in map(float, raw_data) if n % 2 == 1])\n",
+    "    if target == 'even':\n",
+    "        return CleanData([n for n in map(float, raw_data) if n % 2 == 0])\n",
+    "    raise ValueError(f\"Target must be 'odd' or 'even', got {target!r}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "751f7f91-0b9e-4da9-92b8-4376c9317e93",
+   "metadata": {},
+   "source": [
+    "To replace the existing `CleanData` provider, we use `Pipeline.insert`. The new provider depends on `Target`, so we also set that parameter:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f7f38468-7194-4735-bc4b-7e4de5866a3a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pipeline.insert(clean_and_remove_some)\n",
+    "pipeline[Target] = 'odd'"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "56033cd9-6a99-40e3-b2ec-920de65b11bc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pipeline"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "85cc2c84-cda2-49ee-83ea-0155e6f54f26",
+   "metadata": {},
+   "source": [
+    "If we now request the task graph for `Result`, we see that the new provider is used in the computation:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "41ed69dd-dd2a-41bf-9e1a-0d3f83d55a35",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pipeline.get(Result)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3ff92573-3944-4911-b32e-76774bef0c4d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pipeline.compute(Result)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
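The replacement mechanism can also be illustrated without Sciline. In the sketch below, `ToyPipeline` and its `insert` method are hypothetical stand-ins written for this illustration, not Sciline's implementation. Providers are stored in a dictionary keyed by return type. Because `clean_and_remove_some` also returns `CleanData`, inserting it overwrites `clean`.

```python
from typing import Any, Callable, NewType, get_type_hints

Filename = NewType('Filename', str)
RawData = NewType('RawData', list)
CleanData = NewType('CleanData', list)
Result = NewType('Result', float)
Target = NewType('Target', str)

filesystem = {'raw.txt': list(map(str, range(10)))}


def load(filename: Filename) -> RawData:
    return RawData(filesystem[filename])


def clean(raw_data: RawData) -> CleanData:
    return CleanData(list(map(float, raw_data)))


def process(clean_data: CleanData) -> Result:
    return Result(sum(clean_data))


def clean_and_remove_some(raw_data: RawData, target: Target) -> CleanData:
    numbers = map(float, raw_data)
    if target == 'odd':
        return CleanData([n for n in numbers if n % 2 == 1])
    if target == 'even':
        return CleanData([n for n in numbers if n % 2 == 0])
    raise ValueError(f"Target must be 'odd' or 'even', got {target!r}")


class ToyPipeline:
    """Hypothetical toy resolver for illustration; not Sciline's implementation."""

    def __init__(self, providers: list[Callable[..., Any]], params: dict[Any, Any]) -> None:
        self._providers: dict[Any, Callable[..., Any]] = {}
        self._params = dict(params)
        for provider in providers:
            self.insert(provider)

    def insert(self, provider: Callable[..., Any]) -> None:
        # Providers are keyed by their return type, so inserting a second
        # provider for CleanData overwrites the first one.
        self._providers[get_type_hints(provider)['return']] = provider

    def __setitem__(self, key: Any, value: Any) -> None:
        self._params[key] = value

    def compute(self, key: Any) -> Any:
        if key in self._params:
            return self._params[key]
        provider = self._providers[key]
        hints = get_type_hints(provider)
        hints.pop('return')
        # Resolve each argument recursively from params or other providers.
        args = {name: self.compute(tp) for name, tp in hints.items()}
        return provider(**args)


pipeline = ToyPipeline([load, clean, process], params={Filename: 'raw.txt'})
before = pipeline.compute(Result)  # uses `clean`: 0 + 1 + ... + 9

pipeline.insert(clean_and_remove_some)  # replaces `clean`
pipeline[Target] = 'odd'
odd = pipeline.compute(Result)  # 1 + 3 + 5 + 7 + 9

pipeline[Target] = 'even'
even = pipeline.compute(Result)  # 0 + 2 + 4 + 6 + 8
print(before, odd, even)  # 45.0 25.0 20.0
```

In this toy version, keying providers by return type makes the replacement implicit. No separate removal step is needed, because at most one provider per type can be registered.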