πŸ”€ MERGE: Improve Notebook Execution (#236)
1. Standardise auto/cache execution

Both now call the same underlying function (from jupyter-cache) and behave the same.
This improves `auto` by making it output error reports rather than raising an exception on an error.

Additional config has also been added: `execution_allow_errors` and `execution_in_temp`.

As with the timeout, `allow_errors` can also be set in the notebook's `metadata.execution.allow_errors`.

This introduces one breaking change: `auto` will now, by default, execute with a temporary folder as the cwd. (We could set temp to `False` by default, but I think this is safer?)

2. For both methods, execution data is captured into:

```python
env.nb_execution_data[env.docname] = {
    "mtime": datetime.datetime.utcnow().isoformat(),
    "runtime": runtime,
    "method": execution_method,
    "succeeded": succeeded,
}
```

and a directive `nb-exec-table` has been added, to create a table of these results.
chrisjsewell authored Aug 20, 2020
2 parents f98fa54 + d186389 commit 2bc0c11
Showing 27 changed files with 634 additions and 95 deletions.
130 changes: 78 additions & 52 deletions docs/use/execute.md
@@ -16,104 +16,100 @@ kernelspec:
# Executing and caching your content

MyST-NB can automatically run and cache notebooks contained in your project using [jupyter-cache].
Notebooks can either be run each time the documentation is built, or cached locally so that re-runs occur only when code cells have changed.

Caching behaviour is controlled with configuration in your `conf.py` file.
See the sections below for each configuration option and its effect.

(execute/config)=

## Triggering notebook execution

To trigger the execution of notebook pages, use the following configuration in `conf.py`:

```python
jupyter_execute_notebooks = "auto"
```

By default, this will only execute notebooks that are missing at least one output.
If a notebook has *all* of its outputs populated, then it will not be executed.

**To force the execution of all notebooks, regardless of their outputs**, change the above configuration value to:

```python
jupyter_execute_notebooks = "force"
```

**To cache execution outputs with [jupyter-cache]**, change the above configuration value to:

```python
jupyter_execute_notebooks = "cache"
```

See {ref}`execute/cache` for more information.

**To turn off notebook execution**, change the above configuration value to:

```python
jupyter_execute_notebooks = "off"
```

**To exclude certain file patterns from execution**, use the following configuration:

```python
execution_excludepatterns = ['list', 'of', '*patterns']
```

Any file that matches one of the items in `execution_excludepatterns` will not be executed.
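As a quick illustration of how glob-style exclude patterns behave, the sketch below uses Python's `fnmatch` — this is a conceptual illustration of pattern matching, not necessarily MyST-NB's exact implementation:

```python
from fnmatch import fnmatch

# Illustration only: glob-style matching of document paths against
# exclude patterns (literal names match exactly, "*" is a wildcard).
patterns = ["list", "of", "*patterns"]
docs = ["intro", "of", "my_patterns"]

excluded = [d for d in docs if any(fnmatch(d, p) for p in patterns)]
print(excluded)  # "of" matches literally, "my_patterns" matches "*patterns"
```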

(execute/cache)=
## Caching the notebook execution

As mentioned above, you can **cache the results of executing a notebook page** by setting:

```python
jupyter_execute_notebooks = "cache"
```

in your conf.py file.

In this case, when a page is executed, its outputs will be stored in a local database.
This allows you to be sure that the outputs in your documentation are up-to-date, while saving time avoiding unnecessary re-execution.
It also allows you to store your `.ipynb` files (or their `.md` equivalent) in your `git` repository *without their outputs*, but still leverage a cache to save time when building your site.

When you re-build your site, the following will happen:

* Notebooks that have not seen changes to their **code cells** or **metadata** since the last build will not be re-executed.
Instead, their outputs will be pulled from the cache and inserted into your site.
* Notebooks that **have any change to their code cells** will be re-executed, and the cache will be updated with the new outputs.
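The invalidation rule in the bullets above can be sketched as hashing only the code cells, so that output-only differences do not trigger re-execution. This is a conceptual illustration, not jupyter-cache's actual implementation:

```python
import hashlib
import json

def code_hash(nb_cells):
    """Hash only code-cell sources, ignoring outputs, so that
    output-only changes do not invalidate the cache (illustrative)."""
    code = [c["source"] for c in nb_cells if c["cell_type"] == "code"]
    return hashlib.sha256(json.dumps(code).encode()).hexdigest()

a = [{"cell_type": "code", "source": "print(1)", "outputs": []}]
b = [{"cell_type": "code", "source": "print(1)", "outputs": ["1"]}]  # only outputs differ
c = [{"cell_type": "code", "source": "print(2)", "outputs": []}]     # code changed

print(code_hash(a) == code_hash(b))  # outputs ignored -> same hash
print(code_hash(a) == code_hash(c))  # code changed -> different hash
```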

By default, the cache will be placed in the parent of your build folder.
Generally, this is in `_build/.jupyter_cache`.

You may also specify a path to the location of a jupyter cache you'd like to use:

```python
jupyter_cache = "path/to/mycache"
```

The path should point to an **empty folder**, or a folder where a **jupyter cache already exists**.

[jupyter-cache]: https://github.com/executablebooks/jupyter-cache "the Jupyter Cache Project"

## Executing in temporary folders

By default, the current working directory (cwd) that a notebook runs in will be its parent directory.
However, you can set `execution_in_temp = True` in your `conf.py` so that, for each execution, a temporary directory is created and used as the cwd.
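For example, a `conf.py` enabling this behaviour might contain (a minimal sketch):

```python
# conf.py -- minimal sketch: execute notebooks with a fresh
# temporary directory as the cwd for each run
jupyter_execute_notebooks = "auto"
execution_in_temp = True
```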

(execute/timeout)=
## Execution Timeout

The execution of notebooks is managed by {doc}`nbclient <nbclient:client>`.

The `execution_timeout` sphinx option defines the maximum time (in seconds) each notebook cell is allowed to run; if the execution takes longer, an exception will be raised.
The default is 30 seconds, so for long-running cells you may want to specify a higher value.
The timeout option can also be set to `None` or -1 to remove any restriction on execution time.
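For instance, in `conf.py` (a sketch):

```python
# conf.py -- sketch: allow long-running cells
execution_timeout = 120   # seconds per cell
# execution_timeout = -1  # or None: no time limit
```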

This global value can also be overridden per notebook by adding this to your notebook's metadata:

@@ -126,19 +122,32 @@

```json
{
  "metadata": {
    "execution": {
      "timeout": 30
    }
  }
}
```

(execute/allow_errors)=
## Dealing with code that raises errors

In some cases, you may want to intentionally show code that doesn't work (e.g., to show the error message).
You can achieve this at "three levels":

Globally, by setting `execution_allow_errors=True` in your `conf.py`.

Per notebook (overriding the global setting), by adding this to your notebook's metadata:

```json
{
  "metadata": {
    "execution": {
      "allow_errors": true
    }
  }
}

Per cell, by adding a `raises-exception` tag to your code cell.
This can be done via a Jupyter interface, or via the `{code-cell}` directive like so:

````md
```{code-cell}
:tags: [raises-exception]

print(thisvariabledoesntexist)
```
````
@@ -151,3 +160,20 @@
```{code-cell}
---
tags: [raises-exception]
---
print(thisvariabledoesntexist)
```
(execute/statistics)=
## Execution Statistics

As notebooks are executed, certain statistics are stored in a dictionary (`{docname: data}`), and saved on the [sphinx environment object](https://www.sphinx-doc.org/en/master/extdev/envapi.html#sphinx.environment.BuildEnvironment) as `env.nb_execution_data`.
You can access this in a post-transform in your own sphinx extensions, or use the built-in `nb-exec-table` directive:

````md
```{nb-exec-table}
```
````

which produces:

```{nb-exec-table}
```
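For extension authors, the `{docname: data}` shape described in the commit message can also be consumed directly. The `report` helper below is a hypothetical illustration of formatting those statistics; it is not part of MyST-NB:

```python
from datetime import datetime, timezone

def report(nb_execution_data):
    """Format one summary line per executed document.

    Assumes the {docname: data} shape described above; this helper
    is illustrative only, not part of MyST-NB.
    """
    lines = []
    for docname, data in sorted(nb_execution_data.items()):
        status = "succeeded" if data["succeeded"] else "FAILED"
        runtime = "-" if data["runtime"] is None else f"{data['runtime']:.2f}s"
        lines.append(f"{docname}: {status} ({data['method']}, {runtime})")
    return lines

sample = {
    "use/execute": {
        "mtime": datetime.now(timezone.utc).isoformat(),
        "runtime": 1.234,
        "method": "cache",
        "succeeded": True,
    },
}
for line in report(sample):
    print(line)
```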
9 changes: 8 additions & 1 deletion docs/use/start.md
@@ -52,10 +52,17 @@ MyST-NB then adds some additional configuration, specific to notebooks:
* - `jupyter_execute_notebooks`
- "auto"
- The logic for executing notebooks, [see here](execute/config) for details.
* - `execution_in_temp`
- `False`
- If `True`, a temporary directory will be created and used as the current working directory (cwd); if `False`, the notebook's parent directory will be the cwd.
* - `execution_allow_errors`
- `False`
- If `False`, execution stops when a code cell raises an error; if `True`, all cells are always run.
This can also be overridden by metadata in a notebook, [see here](execute/allow_errors) for details.
* - `execution_timeout`
- 30
- The maximum time (in seconds) each notebook cell is allowed to run.
This can also be overridden by metadata in a notebook, [see here](execute/timeout) for details.
* - `execution_show_tb`
- `False`
- Show failed notebook tracebacks in stdout (in addition to writing to file).
10 changes: 9 additions & 1 deletion myst_nb/__init__.py
@@ -16,7 +16,7 @@
JupyterCell,
)

from .execution import update_execution_cache
from .parser import (
NotebookParser,
CellNode,
@@ -35,6 +35,7 @@
PasteInlineNode,
)
from .nb_glue.transform import PasteNodesToDocutils
from .exec_table import setup_exec_table

LOGGER = logging.getLogger(__name__)

@@ -104,6 +105,8 @@ def visit_element_html(self, node):
app.add_config_value("execution_excludepatterns", [], "env")
app.add_config_value("jupyter_execute_notebooks", "auto", "env")
app.add_config_value("execution_timeout", 30, "env")
app.add_config_value("execution_allow_errors", False, "env")
app.add_config_value("execution_in_temp", False, "env")
# show traceback in stdout (in addition to writing to file)
# this is useful in e.g. RTD where one cannot inspect a file
app.add_config_value("execution_show_tb", False, "")
@@ -130,6 +133,9 @@ def visit_element_html(self, node):
app.add_domain(NbGlueDomain)
app.add_directive("code-cell", CodeCell)

# execution statistics table
setup_exec_table(app)

# TODO need to deal with key clashes in NbGlueDomain.merge_domaindata
# before this is parallel_read_safe
return {"version": __version__, "parallel_read_safe": False}
@@ -178,6 +184,8 @@ def set_valid_execution_paths(app):
for suffix, parser_type in app.config["source_suffix"].items()
if parser_type in ("myst-nb",)
}
if not hasattr(app.env, "nb_execution_data"):
app.env.nb_execution_data = {}


def add_exclude_patterns(app, config):
94 changes: 94 additions & 0 deletions myst_nb/exec_table.py
@@ -0,0 +1,94 @@
"""A directive to create a table of executed notebooks, and related statistics."""
from datetime import datetime

from docutils import nodes
from sphinx.transforms.post_transforms import SphinxPostTransform
from sphinx.util.docutils import SphinxDirective


def setup_exec_table(app):
"""execution statistics table."""
app.add_node(ExecutionStatsNode)
app.add_directive("nb-exec-table", ExecutionStatsTable)
app.add_post_transform(ExecutionStatsPostTransform)


class ExecutionStatsNode(nodes.General, nodes.Element):
"""A placeholder node, for adding a notebook execution statistics table."""


class ExecutionStatsTable(SphinxDirective):
"""Add a notebook execution statistics table."""

has_content = True
final_argument_whitespace = True

def run(self):

return [ExecutionStatsNode()]


class ExecutionStatsPostTransform(SphinxPostTransform):
"""Replace the placeholder node with the final table."""

default_priority = 400

def run(self, **kwargs) -> None:
for node in self.document.traverse(ExecutionStatsNode):
node.replace_self(make_stat_table(self.env.nb_execution_data))


def make_stat_table(nb_execution_data):

key2header = {
"mtime": "Modified",
"method": "Method",
"runtime": "Run Time (s)",
"succeeded": "Status",
}

key2transform = {
"mtime": lambda x: datetime.fromtimestamp(x).strftime("%Y-%m-%d %H:%M")
if x
else "",
"method": str,
"runtime": lambda x: "-" if x is None else str(round(x, 2)),
"succeeded": lambda x: "βœ…" if x is True else "❌",
}

# top-level element
table = nodes.table()
table["classes"] += ["colwidths-auto"]
# self.set_source_info(table)

# column settings element
ncols = len(key2header) + 1
tgroup = nodes.tgroup(cols=ncols)
table += tgroup
colwidths = [round(100 / ncols, 2)] * ncols
for colwidth in colwidths:
colspec = nodes.colspec(colwidth=colwidth)
tgroup += colspec

# header
thead = nodes.thead()
tgroup += thead
row = nodes.row()
thead += row

for name in ["Document"] + list(key2header.values()):
row.append(nodes.entry("", nodes.paragraph(text=name)))

# body
tbody = nodes.tbody()
tgroup += tbody

for doc, data in nb_execution_data.items():
row = nodes.row()
tbody += row
row.append(nodes.entry("", nodes.paragraph(text=doc)))
for name in key2header.keys():
text = key2transform[name](data[name])
row.append(nodes.entry("", nodes.paragraph(text=text)))

return table
