Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ NEW: nb_merge_streams configuration #364

Merged
merged 1 commit into from
Oct 3, 2021
Merged

Conversation

chrisjsewell
Copy link
Member

If True, ensure all stdout / stderr output streams are merged into single outputs.
This ensures deterministic outputs.

If `True`, ensure all stdout / stderr output streams are merged into single outputs.
This ensures deterministic outputs.
@codecov
Copy link

codecov bot commented Oct 3, 2021

Codecov Report

Merging #364 (a097602) into master (c54becd) will decrease coverage by 0.09%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #364      +/-   ##
==========================================
- Coverage   87.43%   87.34%   -0.10%     
==========================================
  Files          12       12              
  Lines        1337     1367      +30     
==========================================
+ Hits         1169     1194      +25     
- Misses        168      173       +5     
Flag Coverage Δ
pytests 87.34% <83.33%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
myst_nb/render_outputs.py 86.74% <82.75%> (-0.50%) ⬇️
myst_nb/__init__.py 87.09% <100.00%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c54becd...a097602. Read the comment docs.

@chrisjsewell chrisjsewell merged commit 66d26b2 into master Oct 3, 2021
@chrisjsewell chrisjsewell deleted the merge-streams branch October 3, 2021 23:12
@@ -112,4 +112,7 @@ Then for parsing and output rendering:
* - `nb_output_stderr`
- `show`
- One of 'show', 'remove', 'warn', 'error' or 'severe', [see here](use/format/stderr) for details.
* - `nb_merge_streams`
- `False`
- If `True`, ensure all stdout / stderr output streams are merged into single outputs. This ensures deterministic outputs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explanation is perhaps worth to clarify further. Which aspect of the outputs becomes deterministic?

Copy link
Member Author

@chrisjsewell chrisjsewell Oct 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to make a PR with the change, and I'll merge

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, perhaps 5 minutes left to review of a new config option is too short. Maybe something to consider for the future.

Also I can't make this PR exactly because I don't understand what result does merging of the streams achieve. Given that I'm somewhat familiar with nbformat and the involved stack, this is possibly true for many others.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result is "documented" by the test and it output (see PR files changed).
It is a common pattern, already used in places like: https://github.com/computationalmodelling/nbval/blob/master/nbval/plugin.py, and was already planned in jupyter-book/jupyter-book#973

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I understand the purpose of the feature now. I'll trust your judgement call that looking at the source and the test is good enough as far as documentation is concerned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh, I'm certainly open to improved documentation, and we will probably iterate on that more in jupyter-book/jupyter-book#1448 (comment)

The results, e.g. here https://sqla-tutorials-nb.readthedocs.io/en/latest/metadata.html#emitting-ddl-to-the-database, speak for themselves though: before all the output lines were in separate div boxes 😄

Copy link
Member

@mmcky mmcky Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result is "documented" by the test and it output (see PR files changed).
It is a common pattern, already used in places like: https://github.com/computationalmodelling/nbval/blob/master/nbval/plugin.py, and was already planned in executablebooks/jupyter-book#973

Thanks @chrisjsewell -- this is a great improvement 👍

Thanks @akhmerov for your comments -- I am also not sure what This ensures deterministic outputs. means here.

Just my 2cents-- I find documentation in tests a sub-optimal form of documentation as it requires reading tests, which typically means you need to understand the test infrastructure with is non-trivial in many cases and (for me) is cognitive overhead :-).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its deterministic because, instead of having a random (non-deterministic) number of stdout/stderr outputs, you have only a maximum of one for each. As explained in jupyter-book/jupyter-book#973

Copy link
Member

@choldgraf choldgraf Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this could be documented more clearly for end-users - if we think that this is a useful feature, and if it is turned off by default, then we should have a section that is easily discoverable for a user that would benefit from this feature.

For example, we could create a parent category of ## `stdout`/`stderr` around here, then create a nested section like:

### Group `stdout` and `stderr` outputs into a single stream

Notebooks may print multiple things to `stdout` and `stderr`. For example, if a cell prints status updates throughout its execution, each of these is often printed to `stdout`. By default, these outputs may be split across multiple items, and will be rendered as separate "chunks" in your built documentation.

If you'd like each of the outputs in `stderr` and `stdout` to be merged into a single stream for each, use the following configuration:

```
nb_merge_streams = True
```

This will ensure that all `stderr` and `stdout` outputs are merged into a single group.
This also makes the cell outputs more deterministic because slight differences in timing may result in different orders of `stderr` and `stdout` in the cell output, while this will sort them properly.

Copy link
Member

@choldgraf choldgraf Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I'll just open a PR to propose these changes since this one is already merged.

"print('stderr3', file=sys.stderr)\n",
"1"
],
"outputs": [
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here all the stdout/stderr are all mixed up

print('stderr3', file=sys.stderr)
1
<CellOutputNode classes="cell_output">
<literal_block classes="output stream" language="myst-ansi" linenos="False" xml:space="preserve">
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here they are all in single blocks

Copy link
Member

@mmcky mmcky Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah - thanks @chrisjsewell that's helpful. So the order will always be the same (so long as the code-cell contents doesn't change).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes the order of stdout / stderr, individually, are guaranteed, so they can be group merged. the order of stdout vs stderr is not guaranteed (they are independent streams), so stderr is placed always after stdout

Copy link
Member

@choldgraf choldgraf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really nice improvement 👍 . A few specific suggestions for how we can document this better for users.

Also, I agree with others that we should wait for a review before merging this kind of PR, there's no reason to self-merge quickly unless we are fixing a problematic bug or test, or if the PR change is trivial. Getting more 👀 on the PR is always a good idea.

and new_outputs[i + 1]["name"] == "stdout"
):
stdout = new_outputs.pop(i + 1)
new_outputs.insert(i, stdout)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I understand correctly, is this correct:

new_outputs may contain a bunch of outputs, some stdout and some stderr, all interwoven. This section sorts them so that the stdout outputs always come after the stderr outputs?

and if that is correct, maybe I'm missing something but could we simplify with something like:

sorted(new_outputs, lambda output: output.get("name"))

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stdout come before stderr. Again, I would emphasise this is copied directly from https://github.com/computationalmodelling/nbval/blob/master/nbval/plugin.py. If you want to change this, I would suggest making a PR there first

Copy link
Contributor

@akhmerov akhmerov Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@choldgraf I think it's not what you describe (although I might have misread it).

  • non-stream outputs aren't reshuffled. So [stdout, image, stdout] stays untouched.
  • multiple arguments of the same stream type are merged and not just reordered.

Again, I would emphasise this is copied directly from https://github.com/computationalmodelling/nbval/blob/master/nbval/plugin.py. If you want to change this, I would suggest making a PR there first

Looking at the description of nbval, I think its purpose is different from publishing notebooks, and therefore the logic it applies might not be optimal for myst-nb or vice versa.

Speaking of publication purposes, reordering streams may be undesirable: imagine demonstrating code that produces a warning halfway through.

On the other hand, coalescing same stream types may be a good idea without even giving the users an option to configure this. Or is there a use case for having two independent text outputs in a row after a single input?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imagine demonstrating code that produces a warning halfway through.

except, if you write to sderr, then there is no guarantee that it will end up halfway stdout

anyhow, I have no capacity left to devote to this, so I'll leave you guys to propose new PRs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @akhmerov - I feel like we should see how folks use this feature and see if there are any natural points to improve the ux over time 👍

@@ -112,4 +112,7 @@ Then for parsing and output rendering:
* - `nb_output_stderr`
- `show`
- One of 'show', 'remove', 'warn', 'error' or 'severe', [see here](use/format/stderr) for details.
* - `nb_merge_streams`
- `False`
- If `True`, ensure all stdout / stderr output streams are merged into single outputs. This ensures deterministic outputs.
Copy link
Member

@choldgraf choldgraf Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this could be documented more clearly for end-users - if we think that this is a useful feature, and if it is turned off by default, then we should have a section that is easily discoverable for a user that would benefit from this feature.

For example, we could create a parent category of ## `stdout`/`stderr` around here, then create a nested section like:

### Group `stdout` and `stderr` outputs into a single stream

Notebooks may print multiple things to `stdout` and `stderr`. For example, if a cell prints status updates throughout its execution, each of these is often printed to `stdout`. By default, these outputs may be split across multiple items, and will be rendered as separate "chunks" in your built documentation.

If you'd like each of the outputs in `stderr` and `stdout` to be merged into a single stream for each, use the following configuration:

```
nb_merge_streams = True
```

This will ensure that all `stderr` and `stdout` outputs are merged into a single group.
This also makes the cell outputs more deterministic because slight differences in timing may result in different orders of `stderr` and `stdout` in the cell output, while this will sort them properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants