Feature/getting started (#105)
jkleinekorte authored May 5, 2023
1 parent 47981e5 commit 094eae1
Showing 25 changed files with 14,121 additions and 485 deletions.
4 changes: 3 additions & 1 deletion .github/workflows/docs-build.yaml
@@ -25,4 +25,6 @@ jobs:
       - name: Install dependencies
         run: pip install .[optimization,cheminfo,docs]
       - name: Build docs
-        run: mkdocs build
+        run: |
+          ls -l docs/*
+          mkdocs build
4 changes: 3 additions & 1 deletion .github/workflows/docs.yaml
@@ -25,7 +25,9 @@ jobs:
       - name: Install dependencies
         run: |
           pip install .[optimization,cheminfo,docs]
-      - run: mkdocs build
+      - name: Build docs
+        run: mkdocs build

       - name: Deploy
         uses: peaceiris/actions-gh-pages@v3
17 changes: 17 additions & 0 deletions .github/workflows/test.yaml
@@ -46,4 +46,21 @@ jobs:
         run: pip install cyipopt
       - name: Run tests
         run: pytest -ra --cov=bofire --cov-report term-missing tests

+
+  testing_tutorials:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: actions/setup-python@v2
+        with:
+          python-version: 3.9
+      - name: Install Bofire
+        run: pip install ".[optimization,tests,cheminfo]"
+      - name: Install ipopt
+        run: sudo apt install build-essential pkg-config coinor-libipopt1v5 coinor-libipopt-dev
+      - name: Install cyipopt
+        run: pip install cyipopt
+      - name: Run notebooks
+        run: python scripts/run_tutorials.py -p "$(pwd)"

18 changes: 9 additions & 9 deletions docs/data_models_functionals.md
@@ -1,19 +1,19 @@
 # Data Models vs. Functional Components

-Data models in bofire hold static data of an optimization problem. These are input and output features as well as constraints making up the domain. They further include possible optimization objectives, acquisition functions, and kernels.
+Data models in BoFire hold static data of an optimization problem. These are input and output features as well as constraints making up the domain. They further include possible optimization objectives, acquisition functions, and kernels.

 All data models in ```bofire.data_models```, are specified as pydantic models and inherit from ```bofire.data_models.base.BaseModel```. These data models can be (de)serialized via ```.dict()``` and ```.json()``` (provided by pydantic). A json schema of each data model can be obtained using ```.schema()```.

 For surrogates and strategies, all functional parts are located in ```bofire.surrogates``` and ```bofire.strategies```. These functionalities include the ```ask``` and ```tell``` as well as ```fit``` and ```predict``` methods. All class attributes (used by these method) are also removed from the data models. Each functional entity is initialized using the corresponding data model. As an example, consider the following data model of a ```RandomStrategy```:

-```
+```python
 import bofire.data_models.domain.api as dm_domain
 import bofire.data_models.features.api as dm_features
 import bofire.data_models.strategies.api as dm_strategies

-in1 = dm_features.ContinuousInput(key="in1", lower_bound=0.0, upper_bound=1.0)
-in2 = dm_features.ContinuousInput(key="in2", lower_bound=0.0, upper_bound=2.0)
-in3 = dm_features.ContinuousInput(key="in3", lower_bound=0.0, upper_bound=3.0)
+in1 = dm_features.ContinuousInput(key="in1", bounds=(0.0,1.0))
+in2 = dm_features.ContinuousInput(key="in2", bounds=(0.0,2.0))
+in3 = dm_features.ContinuousInput(key="in3", bounds=(0.0,3.0))

 out1 = dm_features.ContinuousOutput(key="out1")

@@ -32,7 +32,7 @@ data_model = dm_strategies.RandomStrategy(domain=domain)

 Such a data model can be (de)serialized as follows:

-```
+```python
 import json
 from pydantic import parse_obj_as
 from bofire.data_models.strategies.api import AnyStrategy
@@ -46,13 +46,13 @@ assert data_model_ == data_model

 Using this data model of a strategy, we can create an instance of a (functional) strategy:

-```
+```python
 import bofire.strategies.api as strategies
 strategy = strategies.RandomStrategy(data_model=data_model)
 ```

 As each strategy data model should be mapped to a specific (functional) strategy, we provide such a mapping:

-```
+```python
 strategy = strategies.map(data_model)
-```
+```
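Note on the documented pattern: the split between a serializable data model and a functional class built from it can be sketched without BoFire or pydantic. The snippet below is only an illustration under stated assumptions: `RandomStrategyModel`, `STRATEGY_MAP` and `map_strategy` are hypothetical names, and standard-library dataclasses stand in for pydantic models. It is not BoFire's implementation.

```python
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RandomStrategyModel:
    """Static, JSON-serializable description (stand-in for a pydantic data model)."""

    type: str
    lower: float
    upper: float
    seed: int


class RandomStrategy:
    """Functional counterpart: holds behaviour and is built from the data model."""

    def __init__(self, data_model: RandomStrategyModel):
        self.data_model = data_model
        self._rng = random.Random(data_model.seed)

    def ask(self, n: int) -> list:
        # Draw n candidates uniformly within the bounds stored in the data model.
        m = self.data_model
        return [self._rng.uniform(m.lower, m.upper) for _ in range(n)]


# Hypothetical registry playing the role of a data-model-to-strategy mapping.
STRATEGY_MAP = {RandomStrategyModel: RandomStrategy}


def map_strategy(data_model):
    """Look up the functional class for a data model and instantiate it."""
    return STRATEGY_MAP[type(data_model)](data_model)


model = RandomStrategyModel(type="RandomStrategy", lower=0.0, upper=1.0, seed=42)

# Round trip through JSON: only the data model is serialized, never the strategy.
restored = RandomStrategyModel(**json.loads(json.dumps(asdict(model))))
assert restored == model

strategy = map_strategy(restored)
candidates = strategy.ask(3)
```

Keeping all state in the data model means the functional object can be rebuilt from JSON alone, which is the property the documentation relies on for RESTful use.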
1 change: 1 addition & 0 deletions docs/getting_started.ipynb
21 changes: 14 additions & 7 deletions docs/index.md
@@ -1,11 +1,14 @@
 # Introduction

 BoFire is a framework to define and solve black-box optimization problems.
-These problems can arise in a number of closely related fields including experimental design, multiobjective optimization and active learning.
+These problems can arise in a number of closely related fields including experimental design, multi-objective optimization and active learning.

 BoFire problem specifications are json serializable for use in RESTful APIs and are to a large extent agnostic to the specific methods and frameworks in which the problems are solved.

+You can find code-examples in the Getting Started section of this document, as well as full worked-out examples of code-usage in the /tutorials section of this repository!
+
 ## Experimental design
+
 In the context of experimental design BoFire allows to define a design space

 $$
@@ -25,8 +28,9 @@ and a set of equations define additional experimental constraints, e.g.
 * non-linear inequality: $\sum x_i^2 \leq 1$
 * n-choose-k: only $k$ out of $n$ parameters can take non-zero values.

-## Multiobjective optimization
-In the context of multiobjective optimization BoFire allows to define a vector-valued optimization problem
+## Multi-objective optimization
+
+In the context of multi-objective optimization BoFire allows to define a vector-valued optimization problem

 $$
 \min_{x \in \mathbb{X}} s(y(x))
@@ -38,18 +42,21 @@ where
 * $y = \{y_1, \ldots y_M\}$ are known functions describing your experimental outputs and
 * $s = \{s_1, \ldots s_M\}$ are the objectives to be minimized, e.g. $s_1$ is the identity function if $y_1$ is to be minimized.

-Since the objectives are in general conflicting, there is no point $x$ that simulataneously optimizes all objectives.
+Since the objectives are in general conflicting, there is no point $x$ that simultaneously optimizes all objectives.
 Instead the goal is to find the Pareto front of all optimal compromises.
+
 A decision maker can then explore these compromises to get a deep understanding of the problem and make the best informed decision.

 ## Bayesian optimization
+
 In the context of Bayesian optimization we want to simultaneously learn the unknown function $y(x)$ (exploration), while focusing the experimental effort on promising regions (exploitation).
-This is done by using the experimental data to fit a probabilistic model $p(y|x, {data})$ that estimates the distribution of posible outcomes for $y$.
+This is done by using the experimental data to fit a probabilistic model $p(y|x, {data})$ that estimates the distribution of possible outcomes for $y$.
 An acquisition function $a$ then formulates the desired trade-off between exploration and exploitation

 $$
 \min_{x \in \mathbb{X}} a(s(p_y(x)))
 $$

-and the minimizer $x_\mathrm{opt}$ of this acquisition function. determines the next experiment $y(x)$ to run.
-When are multiple competing objectives, the task is again to find a suitable approximation of the Pareto front.
+and the minimizer $x_\mathrm{opt}$ of this acquisition function determines the next experiment $y(x)$ to run.
+
+When there are multiple competing objectives, the task is again to find a suitable approximation of the Pareto front.
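Note on the Pareto front mentioned in the index page: for a finite set of evaluated points, the front is the subset that no other point dominates. The following is a minimal sketch for minimization, assuming each point is a tuple of objective values $s(y(x))$; the helper names `dominates` and `pareto_front` are illustrative and are not part of BoFire.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that are not dominated by any other point (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Five hypothetical experiments with two conflicting objectives each.
points = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 2.0)]
front = pareto_front(points)
print(front)  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

Here `(3.0, 4.0)` is dominated by `(2.0, 3.0)` and `(5.0, 2.0)` by `(4.0, 1.0)`, so three compromises remain. The pairwise check is quadratic in the number of points, which is adequate for small experimental data sets.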
2 changes: 2 additions & 0 deletions mkdocs.yaml
@@ -6,6 +6,7 @@ repo_url: https://github.com/experimental-design/bofire
 nav:
   - index.md
   - Install: install.md
+  - Notebook page: getting_started.ipynb
   - Package Architecture:
     - Data Models vs Functional Components: data_models_functionals.md
   - API Reference:
@@ -45,6 +46,7 @@ watch:

 plugins:
   - search
+  - mkdocs-jupyter
   # https://mkdocstrings.github.io/
   - mkdocstrings:
   # https://github.com/jimporter/mike
76 changes: 0 additions & 76 deletions scripts/playground.py

This file was deleted.

140 changes: 140 additions & 0 deletions scripts/run_tutorials.py
@@ -0,0 +1,140 @@
+from __future__ import annotations
+
+import argparse
+import os
+import subprocess
+import time
+from pathlib import Path
+from typing import Dict, Optional
+
+import pandas as pd
+
+# This script is based on the corresponding one from botorch: https://github.com/pytorch/botorch/blob/main/scripts/run_tutorials.py
+
+
+def run_script(
+    tutorial: Path, timeout_minutes: int = 20, env: Optional[Dict[str, str]] = None
+):
+    utils_path = {"PYTHONPATH": str(tutorial.parent)}
+    if env is not None:
+        env = {**os.environ, **env, **utils_path}
+    else:
+        env = {**os.environ, **utils_path}
+    try:
+        run_out = subprocess.run(
+            ["papermill", str(tutorial.absolute()), "temp.ipynb"], # , "|"
+            capture_output=True,
+            text=True,
+            env=env,
+            encoding="utf-8",
+            timeout=timeout_minutes * 60,
+        )
+    except subprocess.TimeoutExpired:
+        print(f"{tutorial} exceeded max. runtime ({timeout_minutes*60} s)... ")
+        return None
+    return run_out
+
+
+def run_tutorials(
+    name: Optional[str] = None,
+    smoke_test=False,
+) -> None:
+    """
+    Run each tutorial, print statements on how it ran, and write a data set as a csv
+    to a directory.
+    """
+
+    timeout_minutes = 30 if smoke_test is False else 2
+
+    print(f"Running Tutorials, smoke_test_flag = {smoke_test}")
+
+    tutorial_dir = Path(os.getcwd()).joinpath("tutorials")
+    num_runs = 0
+    num_errors = 0
+
+    tutorials = sorted(t for t in tutorial_dir.rglob("*.ipynb") if t.is_file())
+    env = {"SMOKE_TEST": "True"} if smoke_test else None
+    if name is not None:
+        tutorials = [t for t in tutorials if t.name == name]
+        if len(tutorials) == 0:
+            raise RuntimeError(f"Specified tutorial {name} not found in directory.")
+
+    df = pd.DataFrame(
+        {
+            "name": [t.name for t in tutorials],
+            "ran_successfully": False,
+            "message": "",
+            "runtime": float("nan"),
+        }
+    ).set_index("name")
+
+    for tutorial in tutorials:
+        print(42 * "#", tutorial)
+        # # for now we skip all tutorials but the one for which we have implemented SMOKE_TEST. This will change soon!
+        # if str(tutorial).split("/")[-1] not in running_tutorials:
+        # print("Skipping", str(tutorial))
+        # continue
+        num_runs += 1
+        t1 = time.time()
+        run_out = run_script(tutorial, env=env, timeout_minutes=timeout_minutes)
+        elapsed_time = time.time() - t1
+        print(f"time elapsed:{elapsed_time:.2f}")
+        if run_out is None: # in this case it bumped against max wall time
+            df.loc[tutorial.name, "ran_successfully"] = False
+            df.loc[tutorial.name, "message"] = "walltime exceeded"
+            continue
+        print(f"statuscode: {run_out.returncode}")
+
+        if run_out.returncode != 0:
+            num_errors += 1
+            df.loc[tutorial.name, "message"] = run_out.stderr
+            print(run_out.stderr)
+
+        else:
+            print(
+                f"Running tutorial {tutorial.name} took " f"{elapsed_time:.2f} seconds."
+            )
+            df.loc[tutorial.name, "ran_successfully"] = True
+
+    df.to_csv("notebook_test_stats.csv")
+
+    # delete temporary test notebook file
+    if os.path.exists("temp.ipynb"):
+        os.remove("temp.ipynb")
+
+    if num_errors > 0:
+        raise RuntimeError(
+            f"Running {num_runs} tutorials resulted in {num_errors} errors."
+        )
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Runs tutorials.")
+
+    parser.add_argument(
+        "-n",
+        "--name",
+        help="Run a specific tutorial by name.",
+    )
+
+    parser.add_argument(
+        "-l",
+        "--long",
+        action="store_true",
+        help="Run the full version of the notebook. Will take a long time",
+    )
+
+    # parser.add_argument(
+    # "-s", "--smoke", action="store_true", help="Run in smoke test (quick) mode."
+    # )
+
+    parser.add_argument(
+        "-p", "--path", metavar="path", required=False, help="bofire repo directory."
+    )
+
+    args = parser.parse_args()
+
+    run_tutorials(
+        name=args.name,
+        smoke_test=not args.long,
+    )
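Note on `SMOKE_TEST`: the script exports `SMOKE_TEST="True"` to each notebook unless `--long` is passed, but this diff does not show how a tutorial consumes it. One plausible way, sketched here as an assumption rather than as what the BoFire tutorials do, is for the notebook to read the variable and shrink its workload. The helper names and the budget values are hypothetical.

```python
import os
from typing import Mapping, Optional


def is_smoke_test(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Treat any non-empty SMOKE_TEST value as a request for a quick run."""
    env = os.environ if environ is None else environ
    return bool(env.get("SMOKE_TEST"))


def pick_budget(
    environ: Optional[Mapping[str, str]] = None, full: int = 50, quick: int = 2
) -> int:
    """Return a small iteration budget in smoke-test mode, the full one otherwise."""
    return quick if is_smoke_test(environ) else full


# In a tutorial notebook this would typically be called without arguments,
# so that it reads the environment prepared by run_tutorials.py.
n_iterations = pick_budget()
print(f"Running {n_iterations} optimization iterations")
```

With this convention the 2-minute timeout used in smoke-test mode only has to cover a handful of iterations per notebook.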