-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ NEW: nb_merge_streams
configuration
#364
Conversation
If `True`, ensure all stdout / stderr output streams are merged into single outputs. This ensures deterministic outputs.
Codecov Report
@@ Coverage Diff @@
## master #364 +/- ##
==========================================
- Coverage 87.43% 87.34% -0.10%
==========================================
Files 12 12
Lines 1337 1367 +30
==========================================
+ Hits 1169 1194 +25
- Misses 168 173 +5
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
@@ -112,4 +112,7 @@ Then for parsing and output rendering: | |||
* - `nb_output_stderr` | |||
- `show` | |||
- One of 'show', 'remove', 'warn', 'error' or 'severe', [see here](use/format/stderr) for details. | |||
* - `nb_merge_streams` | |||
- `False` | |||
- If `True`, ensure all stdout / stderr output streams are merged into single outputs. This ensures deterministic outputs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This explanation is perhaps worth to clarify further. Which aspect of the outputs becomes deterministic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to make a PR with the change, and I'll merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, perhaps 5 minutes left to review of a new config option is too short. Maybe something to consider for the future.
Also I can't make this PR exactly because I don't understand what result does merging of the streams achieve. Given that I'm somewhat familiar with nbformat and the involved stack, this is possibly true for many others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result is "documented" by the test and it output (see PR files changed).
It is a common pattern, already used in places like: https://github.com/computationalmodelling/nbval/blob/master/nbval/plugin.py, and was already planned in jupyter-book/jupyter-book#973
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I understand the purpose of the feature now. I'll trust your judgement call that looking at the source and the test is good enough as far as documentation is concerned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeh, I'm certainly open to improved documentation, and we will probably iterate on that more in jupyter-book/jupyter-book#1448 (comment)
The results, e.g. here https://sqla-tutorials-nb.readthedocs.io/en/latest/metadata.html#emitting-ddl-to-the-database, speak for themselves though: before all the output lines were in separate div
boxes 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result is "documented" by the test and it output (see PR files changed).
It is a common pattern, already used in places like: https://github.com/computationalmodelling/nbval/blob/master/nbval/plugin.py, and was already planned in executablebooks/jupyter-book#973
Thanks @chrisjsewell -- this is a great improvement 👍
Thanks @akhmerov for your comments -- I am also not sure what This ensures deterministic outputs.
means here.
Just my 2cents
-- I find documentation in tests a sub-optimal form of documentation as it requires reading tests, which typically means you need to understand the test infrastructure with is non-trivial in many cases and (for me) is cognitive overhead :-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its deterministic because, instead of having a random (non-deterministic) number of stdout/stderr outputs, you have only a maximum of one for each. As explained in jupyter-book/jupyter-book#973
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this could be documented more clearly for end-users - if we think that this is a useful feature, and if it is turned off by default, then we should have a section that is easily discoverable for a user that would benefit from this feature.
For example, we could create a parent category of ## `stdout`/`stderr`
around here, then create a nested section like:
### Group `stdout` and `stderr` outputs into a single stream
Notebooks may print multiple things to `stdout` and `stderr`. For example, if a cell prints status updates throughout its execution, each of these is often printed to `stdout`. By default, these outputs may be split across multiple items, and will be rendered as separate "chunks" in your built documentation.
If you'd like each of the outputs in `stderr` and `stdout` to be merged into a single stream for each, use the following configuration:
```
nb_merge_streams = True
```
This will ensure that all `stderr` and `stdout` outputs are merged into a single group.
This also makes the cell outputs more deterministic because slight differences in timing may result in different orders of `stderr` and `stdout` in the cell output, while this will sort them properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I'll just open a PR to propose these changes since this one is already merged.
"print('stderr3', file=sys.stderr)\n", | ||
"1" | ||
], | ||
"outputs": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here all the stdout/stderr are all mixed up
print('stderr3', file=sys.stderr) | ||
1 | ||
<CellOutputNode classes="cell_output"> | ||
<literal_block classes="output stream" language="myst-ansi" linenos="False" xml:space="preserve"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here they are all in single blocks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah - thanks @chrisjsewell that's helpful. So the order will always be the same (so long as the code-cell
contents doesn't change).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes the order of stdout / stderr, individually, are guaranteed, so they can be group merged. the order of stdout vs stderr is not guaranteed (they are independent streams), so stderr is placed always after stdout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really nice improvement 👍 . A few specific suggestions for how we can document this better for users.
Also, I agree with others that we should wait for a review before merging this kind of PR, there's no reason to self-merge quickly unless we are fixing a problematic bug or test, or if the PR change is trivial. Getting more 👀 on the PR is always a good idea.
and new_outputs[i + 1]["name"] == "stdout" | ||
): | ||
stdout = new_outputs.pop(i + 1) | ||
new_outputs.insert(i, stdout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I understand correctly, is this correct:
new_outputs
may contain a bunch of outputs, some stdout
and some stderr
, all interwoven. This section sorts them so that the stdout
outputs always come after the stderr
outputs?
and if that is correct, maybe I'm missing something but could we simplify with something like:
sorted(new_outputs, lambda output: output.get("name"))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stdout
come before stderr
. Again, I would emphasise this is copied directly from https://github.com/computationalmodelling/nbval/blob/master/nbval/plugin.py. If you want to change this, I would suggest making a PR there first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@choldgraf I think it's not what you describe (although I might have misread it).
- non-stream outputs aren't reshuffled. So
[stdout, image, stdout]
stays untouched. - multiple arguments of the same stream type are merged and not just reordered.
Again, I would emphasise this is copied directly from https://github.com/computationalmodelling/nbval/blob/master/nbval/plugin.py. If you want to change this, I would suggest making a PR there first
Looking at the description of nbval, I think its purpose is different from publishing notebooks, and therefore the logic it applies might not be optimal for myst-nb or vice versa.
Speaking of publication purposes, reordering streams may be undesirable: imagine demonstrating code that produces a warning halfway through.
On the other hand, coalescing same stream types may be a good idea without even giving the users an option to configure this. Or is there a use case for having two independent text outputs in a row after a single input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imagine demonstrating code that produces a warning halfway through.
except, if you write to sderr
, then there is no guarantee that it will end up halfway stdout
anyhow, I have no capacity left to devote to this, so I'll leave you guys to propose new PRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point @akhmerov - I feel like we should see how folks use this feature and see if there are any natural points to improve the ux over time 👍
@@ -112,4 +112,7 @@ Then for parsing and output rendering: | |||
* - `nb_output_stderr` | |||
- `show` | |||
- One of 'show', 'remove', 'warn', 'error' or 'severe', [see here](use/format/stderr) for details. | |||
* - `nb_merge_streams` | |||
- `False` | |||
- If `True`, ensure all stdout / stderr output streams are merged into single outputs. This ensures deterministic outputs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this could be documented more clearly for end-users - if we think that this is a useful feature, and if it is turned off by default, then we should have a section that is easily discoverable for a user that would benefit from this feature.
For example, we could create a parent category of ## `stdout`/`stderr`
around here, then create a nested section like:
### Group `stdout` and `stderr` outputs into a single stream
Notebooks may print multiple things to `stdout` and `stderr`. For example, if a cell prints status updates throughout its execution, each of these is often printed to `stdout`. By default, these outputs may be split across multiple items, and will be rendered as separate "chunks" in your built documentation.
If you'd like each of the outputs in `stderr` and `stdout` to be merged into a single stream for each, use the following configuration:
```
nb_merge_streams = True
```
This will ensure that all `stderr` and `stdout` outputs are merged into a single group.
This also makes the cell outputs more deterministic because slight differences in timing may result in different orders of `stderr` and `stdout` in the cell output, while this will sort them properly.
If
True
, ensure all stdout / stderr output streams are merged into single outputs.This ensures deterministic outputs.