Strip kernel specs from example notebooks #1617
As described in a comment in #1537, I found a quick solution for now:

find . -name '*.ipynb' -exec /tmp/holoviews/examples/strip_kernel.sh {} \;

Where strip_kernel.sh is:

#!/bin/bash
jq --indent 1 \
'
(.cells[] | select(has("outputs")) | .outputs) = []
| (.cells[] | select(has("execution_count")) | .execution_count) = null
| .metadata = {"language_info": {"name":"python", "pygments_lexer": "ipython3"}}
| .cells[].metadata = {}
' "$1" > /tmp/stripped.ipynb

I'll reassign this to the next milestone, as we want something more robust in future, but this will do for 1.8. Edit: I found out about the appropriate jq command here.
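For comparison, here is a rough Python translation of the jq filter in strip_kernel.sh, using only the standard-library json module. It is a sketch rather than a drop-in replacement: the names clean_notebook and clean_text are made up for illustration, and json.dumps(indent=1) only approximates the formatting that jq --indent 1 produces.

```python
import json


def clean_notebook(nb: dict) -> dict:
    """Apply the same edits as the jq filter (mutates nb and returns it)."""
    for cell in nb.get("cells", []):
        # (.cells[] | select(has("outputs")) | .outputs) = []
        if "outputs" in cell:
            cell["outputs"] = []
        # (.cells[] | select(has("execution_count")) | .execution_count) = null
        if "execution_count" in cell:
            cell["execution_count"] = None
        # .cells[].metadata = {}
        cell["metadata"] = {}
    # .metadata = {...}: replaces all notebook-level metadata, including the kernelspec
    nb["metadata"] = {"language_info": {"name": "python", "pygments_lexer": "ipython3"}}
    return nb


def clean_text(text: str) -> str:
    """Parse notebook JSON, clean it, and re-serialize with a one-space indent."""
    return json.dumps(clean_notebook(json.loads(text)), indent=1, ensure_ascii=False) + "\n"
```

Each jq clause maps to one assignment; the main behavioral difference is that this version tolerates a notebook without a "cells" key, where jq would raise an error.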
One minor change to strip_kernel.sh is to replace the last line with
Also, the issue/script name's a little misleading: as far as I can see (unless I've misread it), the script removes more than just the kernel spec. It removes lots of other annoying stuff too :)

I think this script would be very useful on any project with Jupyter notebooks, e.g. asking people to run notebooks through it before submitting PRs. Obviously it's not the job of HoloViews developers to solve this problem everywhere, but is there a way we could easily share it across at least our own projects?

One more thing: this solution has dependencies other than Python (jq, bash, find), whereas there are various solutions out there using Python only. Is there a reason you didn't use one of those Python solutions? The linked article argues for jq because speed is required, since the author is adding stripping to git clean/smudge filters (good luck with that... risky and unreliable in my experience!). However, you're using it as a standalone script, so I don't see what would be wrong with Python. (Even if you have many notebooks to strip, a Python version would only need to start up Python once and then strip in bulk, so it shouldn't be much slower.)
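To illustrate the "start Python once, strip in bulk" point, here is a minimal sketch that walks a directory and cleans every notebook in a single process. The names strip_tree and clear_outputs are hypothetical, and clear_outputs deliberately does less than the jq script (it only clears outputs and execution counts); any other cleaning function can be passed in as `clean`.

```python
import json
from pathlib import Path
from typing import Callable, List


def clear_outputs(nb: dict) -> dict:
    """Hypothetical default cleaner: clears outputs and execution counts only."""
    for cell in nb.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = []
        if "execution_count" in cell:
            cell["execution_count"] = None
    return nb


def strip_tree(root, clean: Callable[[dict], dict] = clear_outputs) -> List[Path]:
    """Clean every notebook under `root` in one process; return the files rewritten."""
    changed = []
    for path in sorted(Path(root).rglob("*.ipynb")):
        if ".ipynb_checkpoints" in path.parts:
            continue  # skip Jupyter's autosave copies
        original = path.read_text(encoding="utf-8")
        cleaned = json.dumps(clean(json.loads(original)), indent=1, ensure_ascii=False) + "\n"
        if cleaned != original:  # only touch files whose serialized form changes
            path.write_text(cleaned, encoding="utf-8")
            changed.append(path)
    return changed
```

Usage would be something like `strip_tree("examples")`. Note that a file can count as "changed" purely because its original JSON formatting differed, even if it had no outputs.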
True. clean_notebook.sh would be a better name.
Put a Python version of this into param.ipython?
Yes. Laziness! I found this jq command online and just used it. There is no reason it can't be done with just Python, with a little more effort. If we agree to put it into param.ipython, then I'm happy to write the Python equivalent (with the expected notebook-related dependencies).
Came across this: https://github.com/michaelaye/nbstripout
Much as I hate to add non-param stuff (or indeed any stuff...) to param, junk in notebooks is such an annoying issue on every project that I would do it! It's a shame there isn't a
Thanks, yes, nbstripout (https://github.com/kynan/nbstripout, from https://gist.github.com/minrk/6176788) is one of the "various python solutions out there" I was vaguely thinking about.
That would be very useful, which is why I don't hold out much hope of it ever happening. Even if nbconvert did accept a PR to support this (which seems unlikely to be quick, given how much code seems to be in nbstripout), it might take a while before it was released and available to us. I'm open to just using
Problem with
I know what you mean if you're talking about integration with git via smudge/clean filters, but I'm not proposing anything that integrates with git, just something people can run to clean up before submitting a notebook (change), so I don't think it needs to be fast. Even for multiple notebooks, it wouldn't be too bad, because Python would only need to start up once (unless nbstripout is somehow much, much slower even without counting the time it takes to start Python etc.).
Jean-Luc linked to it in his original script, which is based on that blog post ;)
I'm ok with hiding this functionality in param/ipython.py, not because it belongs there but because I don't think I've ever even loaded that file into an editor, so I'm clearly not going to notice there being cruft there. I do think we need to have a suitable command that we can customize to fight new bogus metadata as it crops up in the wild, so I think we need to put the specification for it somewhere we can all agree to use in our projects. I'd like to have a simple way to invoke it, i.e. something like
I agree, I'd like something really easy. My only concern with that is how annoying it would be to remove precious output from a notebook in place, by accident. Maybe it would need at least an

Also, maybe it wouldn't be too bad to tell people to install something via pip (if Python only) or conda (if not just Python), so long as what it installed did the things we wanted by default, without command-line options? Since people will already have installed Jupyter Notebook, they will likely have pip or conda. (E.g. it could be nbstripout, which can be pip installed, but unfortunately I think nbstripout doesn't remove all the metadata.) But if it has to go into param, that's ok with me. Meanwhile, I've filed jupyter/nbconvert#637.
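One way to guard against accidentally wiping precious output, sketched under the assumption that the tool is a Python function: write the cleaned copy next to the original by default and overwrite the original only on explicit request. The clean_file name, the in_place parameter, and the .clean.ipynb suffix are hypothetical choices for illustration, not options of nbstripout or any other tool mentioned here.

```python
import json
from pathlib import Path


def clean_file(path, in_place: bool = False) -> Path:
    """Clear outputs and execution counts from one notebook.

    By default the original is left untouched and the result goes to a sibling
    '<stem>.clean.ipynb'; only the hypothetical in_place=True opt-in overwrites it.
    """
    path = Path(path)
    nb = json.loads(path.read_text(encoding="utf-8"))
    for cell in nb.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = []
        if "execution_count" in cell:
            cell["execution_count"] = None
    target = path if in_place else path.with_name(path.stem + ".clean.ipynb")
    target.write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
    return target
```

The design choice here is that the destructive behavior requires an explicit argument, so the default can never lose output.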
You can also set up a git filter; this one works for me:
and add a
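The filter itself is not preserved here. As a hedged sketch of the general mechanism: git runs a clean filter with the file contents on standard input and stores whatever the filter writes to standard output, so a Python filter only needs to read notebook JSON from one stream and write cleaned JSON to another. The function name clean_stream, a script name like strip_filter.py, and a driver name like nbclean are all hypothetical. Wiring it up could look like `git config filter.nbclean.clean "python strip_filter.py"` plus a `*.ipynb filter=nbclean` line in .gitattributes, with the script calling `clean_stream(sys.stdin, sys.stdout)`.

```python
import json
from typing import TextIO


def clean_stream(src: TextIO, dst: TextIO) -> None:
    """Read notebook JSON from src, clear outputs/execution counts/kernelspec, write to dst."""
    nb = json.load(src)
    for cell in nb.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = []
        if "execution_count" in cell:
            cell["execution_count"] = None
    # Drop the kernel spec this issue is about (no-op if there is no metadata)
    nb.get("metadata", {}).pop("kernelspec", None)
    json.dump(nb, dst, indent=1, ensure_ascii=False)
    dst.write("\n")
```

As noted earlier in the thread, filters run on every add, so speed and robustness matter more here than for a standalone script.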
PR #2507 referenced above will strip the metadata from examples for 1.10. You can see the

Given all the discussion above, it seems that there is no standard way to generate 'clean' notebooks; as a consequence, supposedly 'cleared' notebooks will contain inconsistent cruft (not written by the notebook author) unless something is done to clear it. In this case, this involves an ad hoc
Addressed in the PR merged above. There is still some general discussion going on but as this particular issue is assigned to 1.10 and the metadata has now been cleared, I will close this issue. I don't mind if we continue the general discussion here, or if we file a new issue either in this repo or some other (more appropriate) repo.
IMO we should have a release checklist in a wiki or something, with this being one of the items. Then we can try our best to check in only cleared notebooks, and at release time we can strip any that somehow made it through.
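As a sketch of what such a checklist item might automate, the functions below report notebooks that still contain outputs, execution counts, or a kernelspec, without modifying anything. The names notebook_problems and report_uncleared are hypothetical, and the particular set of checks is an assumption; a release script could, for example, exit with a non-zero status when the report is non-empty.

```python
import json
from pathlib import Path
from typing import Dict, List


def notebook_problems(nb: dict) -> List[str]:
    """List leftover cruft in one parsed notebook (empty list means clean)."""
    problems = []
    if "kernelspec" in nb.get("metadata", {}):
        problems.append("kernelspec in notebook metadata")
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("outputs"):
            problems.append(f"cell {i}: has outputs")
        if cell.get("execution_count") is not None:
            problems.append(f"cell {i}: execution_count set")
    return problems


def report_uncleared(root) -> Dict[str, List[str]]:
    """Map each uncleared notebook under root to its problems; read-only."""
    report = {}
    for path in sorted(Path(root).rglob("*.ipynb")):
        if ".ipynb_checkpoints" in path.parts:
            continue  # skip Jupyter's autosave copies
        problems = notebook_problems(json.loads(path.read_text(encoding="utf-8")))
        if problems:
            report[str(path)] = problems
    return report
```

A release step might then run something like `sys.exit(1 if report_uncleared("examples") else 0)`.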
In case it helps, the doit/pyct library of tasks will support reporting and clearing extraneous metadata. But right now there's just an issue about choosing the underlying tool to use (https://github.com/pyviz/pyct/issues/3). Further into the future is the idea of wrapping some of the tasks as web services that interact with github (https://github.com/pyviz/pyct/issues/5).
One aim of the doit/pyct project is to reduce manual release checklist content wherever possible (or otherwise, at least to share it across projects we maintain), but if you were to write a release checklist for holoviews it would help inform what tasks doit should supply :) It's also an aim of that project that the tasks just use "standard" underlying tools, so tasks can be run independently by those without doit/pyct (e.g. the test-related tasks just use standard tools such as pytest or nose). (Note: the documentation on the pyct page is still a very rough draft and not yet ready to read, but will be readable soon...)
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I would like to throw together a quick script to do this before we release 1.8.