Feature/getting started (#105)

experimental-design · May 5, 2023 · 094eae1
1 parent 47981e5
commit 094eae1

Showing 25 changed files with 14,121 additions and 485 deletions.
diff --git a/.github/workflows/docs-build.yaml b/.github/workflows/docs-build.yaml
@@ -25,4 +25,6 @@ jobs:
       - name: Install dependencies
         run: pip install .[optimization,cheminfo,docs]
       - name: Build docs
-        run: mkdocs build
+        run: |
+          ls -l docs/*
+          mkdocs build
diff --git a/.github/workflows/docs.yaml b/.github/workflows/docs.yaml
@@ -25,7 +25,9 @@ jobs:
       - name: Install dependencies
         run: |
           pip install .[optimization,cheminfo,docs]
-      - run: mkdocs build
+
+      - name: Build docs
+        run: mkdocs build
 
       - name: Deploy
         uses: peaceiris/actions-gh-pages@v3

diff --git a/.github/workflows/test.yaml b/.github/workflows/test.yaml
@@ -46,4 +46,21 @@ jobs:
         run: pip install cyipopt
       - name: Run tests
         run: pytest -ra --cov=bofire --cov-report term-missing tests
+
+
+  testing_tutorials:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: actions/setup-python@v2
+        with:
+          python-version: 3.9
+      - name: Install Bofire
+        run: pip install ".[optimization,tests,cheminfo]"
+      - name: Install ipopt
+        run: sudo apt install build-essential pkg-config coinor-libipopt1v5 coinor-libipopt-dev
+      - name: Install cyipopt
+        run: pip install cyipopt
+      - name: Run notebooks
+        run: python scripts/run_tutorials.py -p "$(pwd)"
 
diff --git a/docs/data_models_functionals.md b/docs/data_models_functionals.md
@@ -1,19 +1,19 @@
 # Data Models vs. Functional Components
 
-Data models in bofire hold static data of an optimization problem. These are input and output features as well as constraints making up the domain. They further include possible optimization objectives, acquisition functions, and kernels.
+Data models in BoFire hold static data of an optimization problem. These are input and output features as well as constraints making up the domain. They further include possible optimization objectives, acquisition functions, and kernels.
 
 All data models in ```bofire.data_models``` are specified as pydantic models and inherit from ```bofire.data_models.base.BaseModel```. These data models can be (de)serialized via ```.dict()``` and ```.json()``` (provided by pydantic). A json schema of each data model can be obtained using ```.schema()```.
 
 For surrogates and strategies, all functional parts are located in ```bofire.surrogates``` and ```bofire.strategies```. These functionalities include the ```ask``` and ```tell``` as well as ```fit``` and ```predict``` methods. All class attributes (used by these methods) are also removed from the data models. Each functional entity is initialized using the corresponding data model. As an example, consider the following data model of a ```RandomStrategy```:
 
-```
+```python
 import bofire.data_models.domain.api as dm_domain
 import bofire.data_models.features.api as dm_features
 import bofire.data_models.strategies.api as dm_strategies
 
-in1 = dm_features.ContinuousInput(key="in1", lower_bound=0.0, upper_bound=1.0)
-in2 = dm_features.ContinuousInput(key="in2", lower_bound=0.0, upper_bound=2.0)
-in3 = dm_features.ContinuousInput(key="in3", lower_bound=0.0, upper_bound=3.0)
+in1 = dm_features.ContinuousInput(key="in1", bounds=(0.0,1.0))
+in2 = dm_features.ContinuousInput(key="in2", bounds=(0.0,2.0))
+in3 = dm_features.ContinuousInput(key="in3", bounds=(0.0,3.0))
 
 out1 = dm_features.ContinuousOutput(key="out1")
 
@@ -32,7 +32,7 @@ data_model = dm_strategies.RandomStrategy(domain=domain)
 
 Such a data model can be (de)serialized as follows:
 
-```
+```python
 import json
 from pydantic import parse_obj_as
 from bofire.data_models.strategies.api import AnyStrategy
@@ -46,13 +46,13 @@ assert data_model_ == data_model
 
 Using this data model of a strategy, we can create an instance of a (functional) strategy:
 
-```
+```python
 import bofire.strategies.api as strategies
 strategy = strategies.RandomStrategy(data_model=data_model)
 ```
 
 As each strategy data model should be mapped to a specific (functional) strategy, we provide such a mapping:
 
-```
+```python
 strategy = strategies.map(data_model)
-```
+```
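
Note on the data model / functional split described in `docs/data_models_functionals.md`: the pattern can be illustrated without BoFire at all. The following is a minimal sketch using only the standard library. The names `RandomStrategyModel`, `RandomStrategy`, `STRATEGY_MAP`, and `map_strategy` are hypothetical stand-ins chosen for illustration and are not BoFire's API; BoFire itself uses pydantic models and `strategies.map`, as shown in the diff above.

```python
import json
import random
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RandomStrategyModel:
    """Hypothetical data model: static, serializable configuration only."""

    bounds: Dict[str, Tuple[float, float]]
    seed: int = 0


class RandomStrategy:
    """Hypothetical functional counterpart: holds behavior, built from a data model."""

    def __init__(self, data_model: RandomStrategyModel):
        self.bounds = data_model.bounds
        self.rng = random.Random(data_model.seed)

    def ask(self, n: int) -> List[Dict[str, float]]:
        # Draw n candidates uniformly within each input's bounds.
        return [
            {key: self.rng.uniform(lo, hi) for key, (lo, hi) in self.bounds.items()}
            for _ in range(n)
        ]


# Registry from data model class to functional class (illustrative only).
STRATEGY_MAP = {RandomStrategyModel: RandomStrategy}


def map_strategy(data_model):
    """Look up and instantiate the functional class for a data model."""
    return STRATEGY_MAP[type(data_model)](data_model=data_model)


data_model = RandomStrategyModel(bounds={"in1": (0.0, 1.0), "in2": (0.0, 2.0)})

# Round trip through JSON; tuples come back as lists, so convert them again.
payload = json.loads(json.dumps(asdict(data_model)))
restored = RandomStrategyModel(
    bounds={k: tuple(v) for k, v in payload["bounds"].items()},
    seed=payload["seed"],
)

strategy = map_strategy(restored)
candidates = strategy.ask(3)
```

Keeping the configuration free of behavior is what makes the round trip trivial here: only plain values cross the serialization boundary, and the functional object is rebuilt from them on the other side.
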
diff --git a/docs/getting_started.ipynb b/docs/getting_started.ipynb
@@ -0,0 +1 @@
+../tutorials/getting_started.ipynb
diff --git a/docs/index.md b/docs/index.md
@@ -1,11 +1,14 @@
 # Introduction
 
 BoFire is a framework to define and solve black-box optimization problems. 
-These problems can arise in a number of closely related fields including experimental design, multiobjective optimization and active learning.
+These problems can arise in a number of closely related fields including experimental design, multi-objective optimization and active learning.
 
 BoFire problem specifications are json serializable for use in RESTful APIs and are to a large extent agnostic to the specific methods and frameworks in which the problems are solved.
 
+You can find code examples in the Getting Started section of this documentation, as well as fully worked-out usage examples in the /tutorials directory of this repository.
+
 ## Experimental design
+
 In the context of experimental design BoFire allows you to define a design space
 
 $$
@@ -25,8 +28,9 @@ and a set of equations define additional experimental constraints, e.g.
 * non-linear inequality: $\sum x_i^2 \leq 1$
 * n-choose-k: only $k$ out of $n$ parameters can take non-zero values.
 
-## Multiobjective optimization
-In the context of multiobjective optimization BoFire allows to define a vector-valued optimization problem
+## Multi-objective optimization
+
+In the context of multi-objective optimization BoFire allows you to define a vector-valued optimization problem
 
 $$
 \min_{x \in \mathbb{X}} s(y(x))
@@ -38,18 +42,21 @@ where
 * $y = \{y_1, \ldots y_M\}$ are known functions describing your experimental outputs and
 * $s = \{s_1, \ldots s_M\}$ are the objectives to be minimized, e.g. $s_1$ is the identity function if $y_1$ is to be minimized.
 
-Since the objectives are in general conflicting, there is no point $x$ that simulataneously optimizes all objectives.
+Since the objectives are in general conflicting, there is no point $x$ that simultaneously optimizes all objectives.
 Instead the goal is to find the Pareto front of all optimal compromises.
+
 A decision maker can then explore these compromises to get a deep understanding of the problem and make the best informed decision.
 
 ## Bayesian optimization
+
 In the context of Bayesian optimization we want to simultaneously learn the unknown function $y(x)$ (exploration), while focusing the experimental effort on promising regions (exploitation).
-This is done by using the experimental data to fit a probabilistic model $p(y|x, {data})$ that estimates the distribution of posible outcomes for $y$.
+This is done by using the experimental data to fit a probabilistic model $p(y|x, {data})$ that estimates the distribution of possible outcomes for $y$.
 An acquisition function $a$ then formulates the desired trade-off between exploration and exploitation
 
 $$
 \min_{x \in \mathbb{X}} a(s(p_y(x)))
 $$
 
-and the minimizer $x_\mathrm{opt}$ of this acquisition function. determines the next experiment $y(x)$ to run.
-When are multiple competing objectives, the task is again to find a suitable approximation of the Pareto front.
+and the minimizer $x_\mathrm{opt}$ of this acquisition function determines the next experiment $y(x)$ to run.
+
+When there are multiple competing objectives, the task is again to find a suitable approximation of the Pareto front.
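
Note on the Pareto front mentioned in `docs/index.md`: for a finite set of already evaluated objective vectors, the front is the subset of points that no other point dominates. The following is a minimal sketch for minimization, assuming plain tuples of objective values; the helper names `dominates` and `pareto_front` are illustrative and not part of BoFire.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the points not dominated by any other point (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Four candidate experiments with two objectives each, both to be minimized.
objectives = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = pareto_front(objectives)
```

Here `(3.0, 3.0)` drops out because `(2.0, 2.0)` is better in both objectives, while the remaining three points are mutually non-dominated compromises. The pairwise check is quadratic in the number of points, which is fine for small experimental data sets.
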
diff --git a/mkdocs.yaml b/mkdocs.yaml
@@ -6,6 +6,7 @@ repo_url: https://github.com/experimental-design/bofire
 nav:
   - index.md
   - Install: install.md
+  - Notebook page: getting_started.ipynb
   - Package Architecture:
     - Data Models vs Functional Components: data_models_functionals.md
   - API Reference:
@@ -45,6 +46,7 @@ watch:
 
 plugins:
   - search
+  - mkdocs-jupyter
   # https://mkdocstrings.github.io/
   - mkdocstrings:
   # https://github.com/jimporter/mike

diff --git a/scripts/playground.py b/scripts/playground.py
diff --git a/scripts/run_tutorials.py b/scripts/run_tutorials.py
@@ -0,0 +1,140 @@
+from __future__ import annotations
+
+import argparse
+import os
+import subprocess
+import time
+from pathlib import Path
+from typing import Dict, Optional
+
+import pandas as pd
+
+# This script is based on the corresponding one from botorch: https://github.com/pytorch/botorch/blob/main/scripts/run_tutorials.py
+
+
+def run_script(
+    tutorial: Path, timeout_minutes: int = 20, env: Optional[Dict[str, str]] = None
+):
+    utils_path = {"PYTHONPATH": str(tutorial.parent)}
+    if env is not None:
+        env = {**os.environ, **env, **utils_path}
+    else:
+        env = {**os.environ, **utils_path}
+    try:
+        run_out = subprocess.run(
+            ["papermill", str(tutorial.absolute()), "temp.ipynb"],  # , "|"
+            capture_output=True,
+            text=True,
+            env=env,
+            encoding="utf-8",
+            timeout=timeout_minutes * 60,
+        )
+    except subprocess.TimeoutExpired:
+        print(f"{tutorial} exceeded max. runtime ({timeout_minutes*60} s)... ")
+        return None
+    return run_out
+
+
+def run_tutorials(
+    name: Optional[str] = None,
+    smoke_test=False,
+) -> None:
+    """
+    Run each tutorial, print statements on how it ran, and write a data set as a csv
+    to a directory.
+    """
+
+    timeout_minutes = 30 if smoke_test is False else 2
+
+    print(f"Running Tutorials, smoke_test_flag = {smoke_test}")
+
+    tutorial_dir = Path(os.getcwd()).joinpath("tutorials")
+    num_runs = 0
+    num_errors = 0
+
+    tutorials = sorted(t for t in tutorial_dir.rglob("*.ipynb") if t.is_file())
+    env = {"SMOKE_TEST": "True"} if smoke_test else None
+    if name is not None:
+        tutorials = [t for t in tutorials if t.name == name]
+        if len(tutorials) == 0:
+            raise RuntimeError(f"Specified tutorial {name} not found in directory.")
+
+    df = pd.DataFrame(
+        {
+            "name": [t.name for t in tutorials],
+            "ran_successfully": False,
+            "message": "",
+            "runtime": float("nan"),
+        }
+    ).set_index("name")
+
+    for tutorial in tutorials:
+        print(42 * "#", tutorial)
+        # # for now we skip all tutorials but the one for which we have implemented SMOKE_TEST. This will change soon!
+        # if str(tutorial).split("/")[-1] not in running_tutorials:
+        #     print("Skipping", str(tutorial))
+        #     continue
+        num_runs += 1
+        t1 = time.time()
+        run_out = run_script(tutorial, env=env, timeout_minutes=timeout_minutes)
+        elapsed_time = time.time() - t1
+        print(f"time elapsed:{elapsed_time:.2f}")
+        if run_out is None:  # in this case it bumped against max wall time
+            df.loc[tutorial.name, "ran_successfully"] = False
+            df.loc[tutorial.name, "message"] = "walltime exceeded"
+            continue
+        print(f"statuscode: {run_out.returncode}")
+
+        if run_out.returncode != 0:
+            num_errors += 1
+            df.loc[tutorial.name, "message"] = run_out.stderr
+            print(run_out.stderr)
+
+        else:
+            print(
+                f"Running tutorial {tutorial.name} took " f"{elapsed_time:.2f} seconds."
+            )
+            df.loc[tutorial.name, "ran_successfully"] = True
+
+    df.to_csv("notebook_test_stats.csv")
+
+    # delete temporary test notebook file
+    if os.path.exists("temp.ipynb"):
+        os.remove("temp.ipynb")
+
+    if num_errors > 0:
+        raise RuntimeError(
+            f"Running {num_runs} tutorials resulted in {num_errors} errors."
+        )
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Runs tutorials.")
+
+    parser.add_argument(
+        "-n",
+        "--name",
+        help="Run a specific tutorial by name.",
+    )
+
+    parser.add_argument(
+        "-l",
+        "--long",
+        action="store_true",
+        help="Run the full version of the notebook. Will take a long time",
+    )
+
+    # parser.add_argument(
+    #     "-s", "--smoke", action="store_true", help="Run in smoke test (quick) mode."
+    # )
+
+    parser.add_argument(
+        "-p", "--path", metavar="path", required=False, help="bofire repo directory."
+    )
+
+    args = parser.parse_args()
+
+    run_tutorials(
+        name=args.name,
+        smoke_test=not args.long,
+    )
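
Note on notebook discovery in `scripts/run_tutorials.py`: the `-p/--path` option is parsed, but the diff above does not show it being passed to `run_tutorials`, which resolves the tutorials directory from the current working directory. The following is a minimal sketch of how the repo directory could be honored, assuming a hypothetical helper `find_notebooks` that is not part of the commit. It also calls `is_file()`, since the uncalled method object would always be truthy.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def find_notebooks(repo_dir: Optional[str] = None) -> List[Path]:
    """Collect notebook files under <repo_dir>/tutorials, sorted by path."""
    base = Path(repo_dir) if repo_dir is not None else Path.cwd()
    tutorial_dir = base / "tutorials"
    # is_file() must be called; the bare attribute `t.is_file` is always truthy.
    return sorted(t for t in tutorial_dir.rglob("*.ipynb") if t.is_file())


# Small self-check in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    tutorials = Path(tmp) / "tutorials"
    (tutorials / "basic").mkdir(parents=True)
    (tutorials / "basic" / "a.ipynb").write_text("{}")
    # A directory with a notebook-like name should be skipped.
    (tutorials / "not_a_notebook.ipynb").mkdir()
    found = [p.name for p in find_notebooks(tmp)]
```

Falling back to `Path.cwd()` when no directory is given keeps the behavior of the committed script for callers that do not pass a path.
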