πŸ”€ MERGE: Improve Notebook Execution (#236)
1. Standardise auto/cache execution

Both now call the same underlying function (from jupyter-cache) and behave the same.
This improves `auto` by making it output error reports rather than raising an exception on an error.

Additional config has also been added: `execution_allow_errors` and `execution_in_temp`.

As with the timeout, `allow_errors` can also be set in the notebook's `metadata.execution.allow_errors`.

This introduces one breaking change: `auto` will now, by default, execute with a temporary folder as the cwd. (We could set temp to `False` by default, but I think this is safer?)

2. For both methods, execution data is captured into:

```python
env.nb_execution_data[env.docname] = {
    "mtime": datetime.datetime.utcnow().isoformat(),
    "runtime": runtime,
    "method": execution_method,
    "succeeded": succeeded,
}
```

and a directive `nb-exec-table` has been added, to create a table of these results.
chrisjsewell authored Aug 20, 2020
2 parents f98fa54 + d186389 commit 2bc0c11
Showing 27 changed files with 634 additions and 95 deletions.
130 changes: 78 additions & 52 deletions docs/use/execute.md
@@ -16,104 +16,100 @@ kernelspec:
# Executing and caching your content

MyST-NB can automatically run and cache notebooks contained in your project using [jupyter-cache].
Notebooks can either be run each time the documentation is built, or cached locally so that re-runs occur only when code cells have changed.

Caching behaviour is controlled with configuration in your `conf.py` file.
See the sections below for each configuration option and its effect.

(execute/config)=

## Triggering notebook execution

To trigger the execution of notebook pages, use the following configuration in `conf.py`:

```python
jupyter_execute_notebooks = "auto"
```

By default, this will only execute notebooks that are missing at least one output.
If a notebook has *all* of its outputs populated, then it will not be executed.

**To force the execution of all notebooks, regardless of their outputs**, change the above configuration value to:

```python
jupyter_execute_notebooks = "force"
```

**To cache execution outputs with [jupyter-cache]**, change the above configuration value to:

```python
jupyter_execute_notebooks = "cache"
```

See {ref}`execute/cache` for more information.

**To turn off notebook execution**, change the above configuration value to:

```python
jupyter_execute_notebooks = "off"
```

**To exclude certain file patterns from execution**, use the following configuration:

```python
execution_excludepatterns = ['list', 'of', '*patterns']
```

Any file that matches one of the items in `execution_excludepatterns` will not be executed.
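As a quick illustration of how glob-style exclude patterns behave, the sketch below uses Python's `fnmatch` — this is a conceptual illustration of pattern matching, not necessarily MyST-NB's exact implementation:

```python
from fnmatch import fnmatch

# Illustration only: glob-style matching of document paths against
# exclude patterns (literal names match exactly, "*" is a wildcard).
patterns = ["list", "of", "*patterns"]
docs = ["intro", "of", "my_patterns"]

excluded = [d for d in docs if any(fnmatch(d, p) for p in patterns)]
print(excluded)  # "of" matches literally, "my_patterns" matches "*patterns"
```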

(execute/cache)=
## Caching the notebook execution

As mentioned above, you can **cache the results of executing a notebook page** by setting:

```python
jupyter_execute_notebooks = "cache"
```

in your conf.py file.

In this case, when a page is executed, its outputs will be stored in a local database.
This allows you to be sure that the outputs in your documentation are up-to-date, while saving time avoiding unnecessary re-execution.
It also allows you to store your `.ipynb` files (or their `.md` equivalent) in your `git` repository *without their outputs*, but still leverage a cache to save time when building your site.

When you re-build your site, the following will happen:

* Notebooks that have not seen changes to their **code cells** or **metadata** since the last build will not be re-executed.
Instead, their outputs will be pulled from the cache and inserted into your site.
* Notebooks that **have any change to their code cells** will be re-executed, and the cache will be updated with the new outputs.
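The invalidation rule in the bullets above can be sketched as hashing only the code cells, so that output-only differences do not trigger re-execution. This is a conceptual illustration, not jupyter-cache's actual implementation:

```python
import hashlib
import json

def code_hash(nb_cells):
    """Hash only code-cell sources, ignoring outputs, so that
    output-only changes do not invalidate the cache (illustrative)."""
    code = [c["source"] for c in nb_cells if c["cell_type"] == "code"]
    return hashlib.sha256(json.dumps(code).encode()).hexdigest()

a = [{"cell_type": "code", "source": "print(1)", "outputs": []}]
b = [{"cell_type": "code", "source": "print(1)", "outputs": ["1"]}]  # only outputs differ
c = [{"cell_type": "code", "source": "print(2)", "outputs": []}]     # code changed

print(code_hash(a) == code_hash(b))  # outputs ignored -> same hash
print(code_hash(a) == code_hash(c))  # code changed -> different hash
```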

By default, the cache will be placed in the parent of your build folder.
Generally, this is in `_build/.jupyter_cache`.

You may also specify a path to the location of a jupyter cache you'd like to use:

```python
jupyter_cache = "path/to/mycache"
```

The path should point to an **empty folder**, or a folder where a **jupyter cache already exists**.

[jupyter-cache]: https://github.com/executablebooks/jupyter-cache "the Jupyter Cache Project"

## Executing in temporary folders

By default, the current working directory (cwd) that a notebook runs in will be its parent directory.
However, you can set `execution_in_temp = True` in your `conf.py` so that, for each execution, a temporary directory is created and used as the cwd.
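For example, a `conf.py` enabling this behaviour might contain (a minimal sketch):

```python
# conf.py -- minimal sketch: execute notebooks with a fresh
# temporary directory as the cwd for each run
jupyter_execute_notebooks = "auto"
execution_in_temp = True
```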

(execute/timeout)=
## Execution Timeout

The execution of notebooks is managed by {doc}`nbclient <nbclient:client>`.

The `execution_timeout` sphinx option defines the maximum time (in seconds) each notebook cell is allowed to run; if the execution takes longer, an exception will be raised.
The default is 30 seconds, so for long-running cells you may want to specify a higher value.
The timeout option can also be set to `None` or -1 to remove any restriction on execution time.
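For instance, in `conf.py` (a sketch):

```python
# conf.py -- sketch: allow long-running cells
execution_timeout = 120   # seconds per cell
# execution_timeout = -1  # or None: no time limit
```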

This global value can also be overridden per notebook by adding this to your notebook's metadata:

@@ -126,19 +122,32 @@

```json
{
  "metadata": {
    "execution": {
      "timeout": 30
    }
  }
}
```

(execute/allow_errors)=
## Dealing with code that raises errors

In some cases, you may want to intentionally show code that doesn't work (e.g., to show the error message).
You can achieve this at "three levels":

Globally, by setting `execution_allow_errors=True` in your `conf.py`.

Per notebook (overriding the global setting), by adding this to your notebook's metadata:

```json
{
  "metadata": {
    "execution": {
      "allow_errors": true
    }
  }
}

Per cell, by adding a `raises-exception` tag to your code cell.
This can be done via a Jupyter interface, or via the `{code-cell}` directive like so:

````md
```{code-cell}
:tags: [raises-exception]

print(thisvariabledoesntexist)
```
````
@@ -151,3 +160,20 @@
```{code-cell}
---
tags: [raises-exception]
---
print(thisvariabledoesntexist)
```
(execute/statistics)=
## Execution Statistics

As notebooks are executed, certain statistics are stored in a dictionary (`{docname: data}`), and saved on the [sphinx environment object](https://www.sphinx-doc.org/en/master/extdev/envapi.html#sphinx.environment.BuildEnvironment) as `env.nb_execution_data`.
You can access this in a post-transform in your own sphinx extensions, or use the built-in `nb-exec-table` directive:

````md
```{nb-exec-table}
```
````

which produces:

```{nb-exec-table}
```
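For extension authors, the `{docname: data}` shape described in the commit message can also be consumed directly. The `report` helper below is a hypothetical illustration of formatting those statistics; it is not part of MyST-NB:

```python
from datetime import datetime, timezone

def report(nb_execution_data):
    """Format one summary line per executed document.

    Assumes the {docname: data} shape described above; this helper
    is illustrative only, not part of MyST-NB.
    """
    lines = []
    for docname, data in sorted(nb_execution_data.items()):
        status = "succeeded" if data["succeeded"] else "FAILED"
        runtime = "-" if data["runtime"] is None else f"{data['runtime']:.2f}s"
        lines.append(f"{docname}: {status} ({data['method']}, {runtime})")
    return lines

sample = {
    "use/execute": {
        "mtime": datetime.now(timezone.utc).isoformat(),
        "runtime": 1.234,
        "method": "cache",
        "succeeded": True,
    },
}
for line in report(sample):
    print(line)
```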
9 changes: 8 additions & 1 deletion docs/use/start.md
@@ -52,10 +52,17 @@ MyST-NB then adds some additional configuration, specific to notebooks:
* - `jupyter_execute_notebooks`
- "auto"
- The logic for executing notebooks, [see here](execute/config) for details.
* - `execution_in_temp`
- `False`
- If `True`, a temporary directory will be created and used as the current working directory (cwd); if `False`, the notebook's parent directory will be the cwd.
* - `execution_allow_errors`
- `False`
- If `False`, execution stops when a code cell raises an error; if `True`, all cells are always run.
This can also be overridden by metadata in a notebook, [see here](execute/allow_errors) for details.
* - `execution_timeout`
- 30
- The maximum time (in seconds) each notebook cell is allowed to run.
This can also be overridden by metadata in a notebook, [see here](execute/timeout) for details.
* - `execution_show_tb`
- `False`
- Show failed notebook tracebacks in stdout (in addition to writing to file).
10 changes: 9 additions & 1 deletion myst_nb/__init__.py
@@ -16,7 +16,7 @@
JupyterCell,
)

from .execution import update_execution_cache
from .parser import (
NotebookParser,
CellNode,
@@ -35,6 +35,7 @@
PasteInlineNode,
)
from .nb_glue.transform import PasteNodesToDocutils
from .exec_table import setup_exec_table

LOGGER = logging.getLogger(__name__)

@@ -104,6 +105,8 @@ def visit_element_html(self, node):
app.add_config_value("execution_excludepatterns", [], "env")
app.add_config_value("jupyter_execute_notebooks", "auto", "env")
app.add_config_value("execution_timeout", 30, "env")
app.add_config_value("execution_allow_errors", False, "env")
app.add_config_value("execution_in_temp", False, "env")
# show traceback in stdout (in addition to writing to file)
# this is useful in e.g. RTD where one cannot inspect a file
app.add_config_value("execution_show_tb", False, "")
@@ -130,6 +133,9 @@ def visit_element_html(self, node):
app.add_domain(NbGlueDomain)
app.add_directive("code-cell", CodeCell)

# execution statistics table
setup_exec_table(app)

# TODO need to deal with key clashes in NbGlueDomain.merge_domaindata
# before this is parallel_read_safe
return {"version": __version__, "parallel_read_safe": False}
@@ -178,6 +184,8 @@ def set_valid_execution_paths(app):
for suffix, parser_type in app.config["source_suffix"].items()
if parser_type in ("myst-nb",)
}
if not hasattr(app.env, "nb_execution_data"):
app.env.nb_execution_data = {}


def add_exclude_patterns(app, config):
94 changes: 94 additions & 0 deletions myst_nb/exec_table.py
@@ -0,0 +1,94 @@
"""A directive to create a table of executed notebooks, and related statistics."""
from datetime import datetime

from docutils import nodes
from sphinx.transforms.post_transforms import SphinxPostTransform
from sphinx.util.docutils import SphinxDirective


def setup_exec_table(app):
"""execution statistics table."""
app.add_node(ExecutionStatsNode)
app.add_directive("nb-exec-table", ExecutionStatsTable)
app.add_post_transform(ExecutionStatsPostTransform)


class ExecutionStatsNode(nodes.General, nodes.Element):
"""A placeholder node, for adding a notebook execution statistics table."""


class ExecutionStatsTable(SphinxDirective):
"""Add a notebook execution statistics table."""

has_content = True
final_argument_whitespace = True

def run(self):

return [ExecutionStatsNode()]


class ExecutionStatsPostTransform(SphinxPostTransform):
"""Replace the placeholder node with the final table."""

default_priority = 400

def run(self, **kwargs) -> None:
for node in self.document.traverse(ExecutionStatsNode):
node.replace_self(make_stat_table(self.env.nb_execution_data))


def make_stat_table(nb_execution_data):

key2header = {
"mtime": "Modified",
"method": "Method",
"runtime": "Run Time (s)",
"succeeded": "Status",
}

key2transform = {
"mtime": lambda x: datetime.fromtimestamp(x).strftime("%Y-%m-%d %H:%M")
if x
else "",
"method": str,
"runtime": lambda x: "-" if x is None else str(round(x, 2)),
"succeeded": lambda x: "βœ…" if x is True else "❌",
}

# top-level element
table = nodes.table()
table["classes"] += ["colwidths-auto"]
# self.set_source_info(table)

# column settings element
ncols = len(key2header) + 1
tgroup = nodes.tgroup(cols=ncols)
table += tgroup
colwidths = [round(100 / ncols, 2)] * ncols
for colwidth in colwidths:
colspec = nodes.colspec(colwidth=colwidth)
tgroup += colspec

# header
thead = nodes.thead()
tgroup += thead
row = nodes.row()
thead += row

for name in ["Document"] + list(key2header.values()):
row.append(nodes.entry("", nodes.paragraph(text=name)))

# body
tbody = nodes.tbody()
tgroup += tbody

for doc, data in nb_execution_data.items():
row = nodes.row()
tbody += row
row.append(nodes.entry("", nodes.paragraph(text=doc)))
for name in key2header.keys():
text = key2transform[name](data[name])
row.append(nodes.entry("", nodes.paragraph(text=text)))

return table
