Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

👌 IMPROVE: Notebook Execution #236

Merged
merged 7 commits into from
Aug 20, 2020
Merged

👌 IMPROVE: Notebook Execution #236

merged 7 commits into from
Aug 20, 2020

Conversation

chrisjsewell
Copy link
Member

@chrisjsewell chrisjsewell commented Aug 20, 2020

  1. Standardise auto/cache execution

Both now call the same underlying function (from jupyter-cache) and act the same.
This improves auto, by making it output error reports and not raising an exception on an error.

Additional config has also been added: execution_allow_errors and execution_in_temp.

Like for timeout, allow_errors can also be set in the notebook metadata.execution.allow_errors

This presents one breaking change, in that auto will now by default execute in a temporary folder as the cwd. (we could set temp to False by default, but I think this is safer?)

  1. For both methods, executions data is captured into:
env.nb_execution_data[env.docname] = {
        "mtime": datetime.datetime.utcnow().isoformat(),
        "runtime": runtime,
        "method": execution_method,
        "succeeded": succeeded,
    }

This (almost) closes executablebooks/jupyter-cache#56

Both now call the same underlying function (from jupyter-cache) and act the same.
This improves auto, by making it output error reports and not raising an exception on an error.
Additional config has also been added: `execution_allow_errors` and `execution_in_temp`.
Like for timeout, `allow_errors` can also be set in the notebook metadata.execution.allow_errors
This presents one breaking change, in that `auto` will now by default execute in a temporary folder as the cwd.
@chrisjsewell chrisjsewell changed the title 👌 IMPROVE: Standardise auto/cache execution 👌 IMPROVE: Notebook Execution Aug 20, 2020
@codecov
Copy link

codecov bot commented Aug 20, 2020

Codecov Report

Merging #236 into master will increase coverage by 0.61%.
The diff coverage is 93.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #236      +/-   ##
==========================================
+ Coverage   85.37%   85.99%   +0.61%     
==========================================
  Files           9       10       +1     
  Lines         841      921      +80     
==========================================
+ Hits          718      792      +74     
- Misses        123      129       +6     
Flag Coverage Δ
#pytests 85.99% <93.00%> (+0.61%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
myst_nb/execution.py 79.25% <84.44%> (ø)
myst_nb/__init__.py 90.21% <100.00%> (+0.68%) ⬆️
myst_nb/exec_table.py 100.00% <100.00%> (ø)
myst_nb/parser.py 95.97% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f98fa54...d186389. Read the comment docs.

This import is to mitigate errors on CI VMs, where you can get the message: "Matplotlib is building the font cache"
@chrisjsewell chrisjsewell marked this pull request as ready for review August 20, 2020 03:43
@chrisjsewell
Copy link
Member Author

Just got to document it also...

Copy link
Member

@mmcky mmcky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisjsewell some initial general point for discussion.

Option Names:

With option names do you think we should adopt a convention such as {ext}_{option} where {ext} is an indication of which extension the option belongs to. My only concern is option names could get long with this approach but it may help to avoid name conflicts in the option namespace in the future?

Report:

Looks like execution stats include mtime, runtime, method, and succeeded added to env.nb_execution_data[env.docname]. If succeeded is False could we also save a path/reference to the generated error log file? This might be useful for opening traceback information based on context.

@chrisjsewell
Copy link
Member Author

With option names do you think we should adopt a convention such as {ext}_{option}

Yeh in myst-parser all the config start with myst_: https://myst-parser.readthedocs.io/en/latest/using/intro.html#myst-configuration-options, which is literally programmed in as you suggest: https://github.com/executablebooks/MyST-Parser/blob/8206f2a6eb891737b0b7916b761d6a5311a08de4/myst_parser/__init__.py#L41-L43, and all configs are actually available in the one MdParserConfig object, which is stored on app.env.myst_config

Could do something similar here, but I was not sure if it was a good idea to break back compatibility.

If we did want to change, probably we should keep both name versions for a time, and somehow log warnings about one being deprecated

My only concern is option names could get long

I'd say better to be long and specific, than have conflicts.

If succeeded is False could we also save a path/reference to the generated error log file?

Sounds good to me 👍

@chrisjsewell
Copy link
Member Author

chrisjsewell commented Aug 20, 2020

If succeeded is False could we also save a path/reference to the generated error log file?

Added in 8534c2c

@AakashGfude
Copy link
Member

Awesome @chrisjsewell . A few pointers from my side.

  • Some of the statistics mentioned it seems are not captured like num_of_errors, language, extension ? Also, it seems like a particular notebooks execution stops after it encounters the first cell with error. Then it jumps to the next notebook. We will need to have all the errors in a notebook captured for statistics?

  • The functionality of auto which differs it from cache now is generating coverage reports? should we have a different name for it then specifying its purpose?

@chrisjsewell
Copy link
Member Author

chrisjsewell commented Aug 20, 2020

Ok @mmcky @AakashGfude check this out! https://myst-nb--236.org.readthedocs.build/en/236/use/execute.html#execution-statistics

Some of the statistics mentioned it seems are not captured like num_of_errors, language, extension

I think maybe it would be better to have a customizable function, for extracting additional data from the notebook and adding it to nb_execution_data, to cater for different use cases

We will need to have all the errors in a notebook captured for statistics?

and that is what the execution_allow_errors configuration is for 😄

The functionality of auto which differs it from cache now is generating coverage reports?

Not sure what you mean by this? coverage reporting how?

should we have a different name for it then specifying its purpose?

as with changing the names of the current configuration values (see above), I'm hesitant to change them, since it means having to implement some kind of deprecation process, updating the documentation both here and in jupyter-book, and dealing with lots of people raising issues about why their books no longer build 😬

@chrisjsewell
Copy link
Member Author

The difference between auto and cache, is that cache does all its execution "at once" before any files have been parsed, whereas auto executes each notebook during the parsing phase.
Also auto will re-execute on any changes to the document (not just code changes)

Personally I was never really on-board with having two methods. But I guess it was done since jupyter-cache was/is less developed. This PR though converges them a bit more, and maybe eventually we can just have the one execution method.

@chrisjsewell
Copy link
Member Author

Note I'm now going to set execution_in_temp to False by default, as I think thats more intuitive.

@chrisjsewell
Copy link
Member Author

Ok, I've finished updating the documentation.
I think I'm happy with the basic functionality so am going to merge.
Then I have moved @mmcky's issue here to #237 and we can discuss/iterate further there, about some of the points you have made here

@chrisjsewell chrisjsewell merged commit 2bc0c11 into master Aug 20, 2020
@chrisjsewell chrisjsewell deleted the execution branch August 20, 2020 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] Output execution statistics as a file for website support
3 participants