Strip kernel specs from example notebooks #1617

jlstevens · 2017-06-27T01:23:17Z

I would like to throw together a quick script to do this before we release 1.8.

jlstevens · 2017-06-27T12:07:41Z

As described in a comment in #1537, I found a quick solution for now:

find . -name '*.ipynb' -exec /tmp/holoviews/examples/strip_kernel.sh {} \;

Where strip_kernel.sh is:

#!/bin/bash
jq --indent 1 \
    '
    (.cells[] | select(has("outputs")) | .outputs) = []
    | (.cells[] | select(has("execution_count")) | .execution_count) = null
    | .metadata = {"language_info": {"name":"python", "pygments_lexer": "ipython3"}}
    | .cells[].metadata = {}
    ' $1 > /tmp/stripped.ipynb

I'll reassigning to the next milestone as we want a more robust thing in future but this will do for 1.8.

Edit: I found out about the appropriate jq command here

ea42gh · 2017-07-29T19:55:13Z

One minor change to strip_kernel.sh is to replace the last line with
' $1 > /tmp/$(basename $1)

ceball · 2017-07-30T10:06:03Z

Also, the issue/script name's a little misleading, because as far as I can see (unless I've misread it), the script is removing more than just the kernel spec - it removes lots of other annoying stuff too :)

I think this script would be very useful on any project with jupyter notebooks, e.g. asking people to run notebooks through it before submitting PRs. Obviously it's not the job of holoviews developers to solve this problem everywhere, but is there a way we could easily share it across at least our own projects?

One more thing is that this solution has dependencies other than python (jq, bash, find), whereas there are various solutions out there using python only. Is there a reason you didn't use one of those python solutions? The linked article argues for jq because speed is required, since the author's adding stripping to git filter/smudge (good luck with that...risky and unreliable in my experience!). However, you're using it as a standalone script, so I don't see what would be wrong with python. (Even if you have many notebooks to strip, a python version would only need to start up python once and then strip in bulk, so shouldn't be much slower...)

jlstevens · 2017-07-30T23:48:04Z

the script is removing more than just the kernel spec

True. clean_notebook.sh would be a better name.

is there a way we could easily share it across at least our own projects?

Put a Python version of this into param.ipython?

Is there a reason you didn't use one of those python solutions?

Yes. Laziness! I found this jq command online and just used it. There is no reason it can't be done with just Python with a little more effort.

If we agree to put it into param.ipython, then I'm happy to write the Python equivalent (with the expected notebook related dependencies).

ea42gh · 2017-07-31T00:23:40Z

Came across this: https://github.com/michaelaye/nbstripout
@michaelaye does appear on holoviews people list,
maybe asking some questions might be good?

ceball · 2017-07-31T08:46:41Z

is there a way we could easily share it across at least our own projects?

Put a Python version of this into param.ipython?

[...]

If we agree to put it into param.ipython, then I'm happy to write the Python equivalent (with the expected notebook related dependencies).

Much as I hate to add non-param stuff (or indeed any stuff...) to param, junk in notebooks is such an annoying issue on every project that I would do it!

It's a shame there isn't a --clean option for jupyter nbconvert --to notebook, or even individual --strip-xyz options. I wonder if a contribution like that would be accepted by nbconvert? I don't have a feel for it.

Came across this: https://github.com/michaelaye/nbstripout

Thanks, yes, nbstripout (https://github.com/kynan/nbstripout, from https://gist.github.com/minrk/6176788) is one of the "various python solutions out there" I was vaguely thinking about.

jlstevens · 2017-07-31T11:57:03Z

It's a shame there isn't a --clean option for jupyter nbconvert --to notebook, or even individual --strip-xyz options.

That would be very useful which is why I don't hold out much hope of it ever happening. Even if nbconvert did accept a PR to support this (which seems unlikely to be quick given how much code seems to be in nbstripout), it might take a while before it would be released and available to us.

I'm open to just using nbstripout although I'm not sure whether it cleans out ipywidget metadata just yet. Partial stripping isn't very useful as you would still need to edit notebooks manually to clear out the remaining gunk.

michaelaye · 2017-07-31T16:33:31Z

Problem with nbstripout is that it's slow. Have you guys seen this nice blog post?
http://timstaley.co.uk/posts/making-git-and-jupyter-notebooks-play-nice/

ceball · 2017-07-31T16:43:31Z

Problem with nbstripout is that it's slow.

I know what you mean if you're talking about for integration with git via smudge/clean filters, but I'm not proposing anything to integrate with git - just something people can run to clean up before submitting a notebook (change), so I don't think it needs to be fast. Even for multiple notebooks, it wouldn't be too bad because python would only need to start up once. (Unless somehow nbstripout is much much slower even not counting the time it takes to start python etc.)

Have you guys seen this nice blog post?

Jean-Luc linked to it in his original script, which is based on that blog post ;)

jbednar · 2017-07-31T23:09:46Z

I'm ok with hiding this functionality in param/ipython.py, not because it belongs there but because I don't think I've ever even loaded that file into an editor, so I'm clearly not going to notice there being cruft there. I do think we need to have a suitable command that we can customize to fight new bogus metadata as it crops up in the wild, so I think we need to put the specification for it somewhere we can all agree to use in our projects.

I'd like to have a simple way to invoke it, i.e. something like nbclean *.ipynb.

ceball · 2017-08-02T10:02:37Z

something like nbclean *.ipynb

I agree - I'd like something really easy. My only concern with that is how annoying it would be to remove precious output from a notebook in place, by accident. Maybe it would need at least an --in-place option so the default is to create a new file, or something? I'm not sure.

Also, maybe it wouldn't be too bad to tell people to install something via pip (if python only) or conda (if not just python), so long as what came did the things we wanted by default without commandline options? Since people will already have installed jupyter notebook, they will likely have pip or conda. (E.g. it could be nbstripout, which can be pip installed - but unfortunately nbstripout doesn't remove all the metadata, I think.)

But if it has to go into param, that's ok with me.

And meanwhile, I've filed jupyter/nbconvert#637.

basnijholt · 2018-01-02T16:01:09Z

You can also set a git filter, this one works for me:

git config filter.nbclearoutput.clean "jupyter nbconvert --to notebook --ClearOutputPreprocessor.enabled=True --ClearOutputPreprocessor.remove_metadata_fields='[\"deletable\", \"editable\", \"collapsed\", \"scrolled\"]' --stdin --stdout"

and add a .gitattributes:

*.ipynb filter=nbclearoutput

jlstevens · 2018-04-03T10:46:25Z

PR #2507 referenced above will strip the metadata from examples for 1.10. You can see the jq command I am now using in that PR.

Given all the discussion above, it seems that there is no standard way to generate 'clean' notebooks and as a consequence, supposedly 'cleared' notebooks will contain inconsistent cruft (not written by the notebook author) unless something is done to clear it. In this case, this involves an ad hoc jq command...

jlstevens · 2018-04-03T16:47:26Z

Addressed in the PR merged above.

There is still some general discussion going on but as this particular issue is assigned to 1.10 and the metadata has now been cleared, I will close this issue.

I don't mind if we continue the general discussion here, or if we file a new issue either in this repo or some other (more appropriate) repo.

philippjfr · 2018-04-03T16:50:11Z

IMO we should have a release checklist in a wiki or something with this being one of the items. Then we can try our best to check in only cleared notebooks, but at release time we can strip any that made it through somehow.

ceball · 2018-04-06T06:47:07Z

In case it helps, the doit/pyct library of tasks will support reporting and clearing extraneous metadata. But right now there's just an issue about choosing the underlying tool to use (https://github.com/pyviz/pyct/issues/3).

Further into the future is the idea of wrapping some of the tasks as web services that interact with github (https://github.com/pyviz/pyct/issues/5).

we should have a release checklist in a wiki or something

One aim of the doit/pyct project is to reduce manual release checklist content wherever possible (or otherwise, at least to share it across projects we maintain), but if you were to write a release checklist for holoviews it would help inform what tasks doit should supply :)

It's also an aim of that project that the tasks just use "standard" underlying tools, so tasks can be run independently by those without doit/pyct (e.g. the test related tasks just use standard tools such as pytest or nose).

(Note: the documentation on the pyct page is highly draft and not yet ready to read, but will be readable soon...)

github-actions · 2024-10-24T16:30:00Z

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

jlstevens added this to the v1.8 milestone Jun 27, 2017

jlstevens self-assigned this Jun 27, 2017

jlstevens modified the milestones: v1.9, v1.8, v1.8.1 Jun 27, 2017

jlstevens added the type: docs Related to the documentation and examples label Jul 7, 2017

jlstevens changed the title ~~Strip kernel specs from example notebook~~ Strip kernel specs from example notebooks Jul 7, 2017

jlstevens modified the milestones: v1.8.1, 1.8.2 Jul 7, 2017

jbednar mentioned this issue Jul 29, 2017

Docs and minor code fixes #1765

Merged

philippjfr modified the milestones: 1.8.4, v1.9 Sep 14, 2017

philippjfr modified the milestones: v1.9, v1.10 Oct 27, 2017

jlstevens mentioned this issue Apr 3, 2018

Stripped metadata from bokeh element notebooks #2507

Merged

jlstevens closed this as completed Apr 3, 2018

basnijholt mentioned this issue Apr 3, 2018

Add a gitfilter to clean the output and metadata from an ipynb #2514

Closed

philippjfr mentioned this issue Aug 4, 2018

Update Customizing Plots Docs with more description about .options #2923

Merged

ceball mentioned this issue Nov 17, 2019

Support for dealing with notebooks in git holoviz-dev/pyctdev#42

Open

github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strip kernel specs from example notebooks #1617

Strip kernel specs from example notebooks #1617

jlstevens commented Jun 27, 2017

jlstevens commented Jun 27, 2017 •

edited

Loading

ea42gh commented Jul 29, 2017

ceball commented Jul 30, 2017

jlstevens commented Jul 30, 2017 •

edited

Loading

ea42gh commented Jul 31, 2017

ceball commented Jul 31, 2017

jlstevens commented Jul 31, 2017

michaelaye commented Jul 31, 2017

ceball commented Jul 31, 2017

jbednar commented Jul 31, 2017 •

edited

Loading

ceball commented Aug 2, 2017

basnijholt commented Jan 2, 2018 •

edited

Loading

jlstevens commented Apr 3, 2018

jlstevens commented Apr 3, 2018

philippjfr commented Apr 3, 2018

ceball commented Apr 6, 2018

github-actions bot commented Oct 24, 2024

Strip kernel specs from example notebooks #1617

Strip kernel specs from example notebooks #1617

Comments

jlstevens commented Jun 27, 2017

jlstevens commented Jun 27, 2017 • edited Loading

ea42gh commented Jul 29, 2017

ceball commented Jul 30, 2017

jlstevens commented Jul 30, 2017 • edited Loading

ea42gh commented Jul 31, 2017

ceball commented Jul 31, 2017

jlstevens commented Jul 31, 2017

michaelaye commented Jul 31, 2017

ceball commented Jul 31, 2017

jbednar commented Jul 31, 2017 • edited Loading

ceball commented Aug 2, 2017

basnijholt commented Jan 2, 2018 • edited Loading

jlstevens commented Apr 3, 2018

jlstevens commented Apr 3, 2018

philippjfr commented Apr 3, 2018

ceball commented Apr 6, 2018

github-actions bot commented Oct 24, 2024

jlstevens commented Jun 27, 2017 •

edited

Loading

jlstevens commented Jul 30, 2017 •

edited

Loading

jbednar commented Jul 31, 2017 •

edited

Loading

basnijholt commented Jan 2, 2018 •

edited

Loading