Creating a branch with notebook outputs stripped #71

Closed
psychemedia opened this issue Jan 9, 2018 · 6 comments

psychemedia commented Jan 9, 2018

Is there a way of using nbstripout to create a branch of cleaned notebooks from a branch that contains notebooks with populated output cells (e.g. notebooks whose outputs are used for testing with nbval)?

I'm thinking of a private GitHub repo workflow where a testing-master branch containing executed notebooks with populated test output cells begets a release branch containing notebooks that can be zipped and distributed to students.

Presumably, a variant of nbstripout could also be used to add a git filter that would automatically run a notebook when committing it to a repository, to ensure that all its output cells are populated?

kynan (Owner) commented Jul 8, 2018

By its nature, a git filter can only act on the files being staged for the commit about to be made. It's not designed to "spawn" another commit in a different branch (I suppose it could be made to do that with some trickery, but I would rather not implement such a "hack" in nbstripout).

You can, however, emulate the behaviour you want by enabling the nbstripout filter through a .gitattributes file that exists only in your release branch. Assuming your "dirty" notebooks are on master, you can create a clean branch as follows:

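Create an orphan release branch, remove the files it inherits from master (a new orphan branch starts with master's files still staged), and install nbstripout in that branch only:

git checkout --orphan release
git rm -rf .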
nbstripout --install --attributes=.gitattributes
git add .gitattributes
git commit -m 'Install nbstripout'

You can then cherry-pick notebook commits as follows (you will probably have to pick them in chronological order to avoid merge conflicts, unless each commit is entirely self-contained):

git cherry-pick --no-commit <commit>
git commit -a --no-edit

From my quick test, both steps are necessary for the filter to kick in, i.e. if you just do a plain cherry-pick, the filter is not applied.
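If there are several notebook commits to move over, the two cherry-pick steps above can be wrapped in a loop. This is a minimal sketch, not part of nbstripout: `replay_with_filters` is a hypothetical helper that picks the given commits oldest-first onto the current branch. It assumes the release branch was set up as described, with the nbstripout filter listed in its .gitattributes, and Git 2.16 or later. Instead of `git commit -a` it uses `git add --renormalize .`, which re-runs the clean filter on every tracked file, so the stripping does not depend on git noticing that the files changed; `git commit -C` reuses each original commit's message and author.

```shell
# Hypothetical helper (not part of nbstripout or git): replay commits,
# oldest first, onto the current branch so that the clean filters set up
# in .gitattributes (e.g. nbstripout) are applied to every picked commit.
# Assumes Git >= 2.16 and a clean working tree; stops on the first failure.
replay_with_filters() {
    for c in "$@"; do
        # Apply the commit's changes to the index and working tree only.
        git cherry-pick --no-commit "$c" || return 1
        # Re-run the clean filter on all tracked files and re-stage them,
        # even if git's stat cache considers them unchanged.
        git add --renormalize . || return 1
        # Commit with the original message and author, unless stripping
        # left nothing to commit.
        git diff --cached --quiet || git commit --quiet -C "$c" || return 1
    done
}
```

For example, on the release branch: `replay_with_filters $(git rev-list --reverse --no-merges master)`. If a pick conflicts, the helper returns at that point so you can resolve it by hand before continuing with the remaining commits.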

Hope this helps!

kynan (Owner) commented Jul 8, 2018

Presumably, a variant of nbstripout could also be used to add a git filter that would automatically run a notebook when committing it to a repository to ensure that all its output cells are populated?

Not easily: as mentioned, nbstripout uses a git clean/smudge filter and operates purely at the file level. No cells are ever executed.

You would need a pre-commit hook instead, but I expect that's not easy to set up: you'd need to start a notebook server, run the notebook and deal with failures. It would also be very slow.

If you only want to verify the output is populated, that's easier to do (and you could potentially reuse some of nbstripout's code for that).
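A check along those lines only needs to read the notebook JSON; no kernel is started. The sketch below is a hypothetical shell function (the name `check_executed` is illustrative) wrapping a short Python script. It assumes that a non-empty code cell counts as populated when its `execution_count` is set, since a cell can legitimately run without producing any output; it lists the offending cells and exits non-zero otherwise.

```shell
# Hypothetical helper: fail if any non-empty code cell in the given notebooks
# has no execution_count, i.e. was never run (or had its outputs stripped).
# Needs only python3; no kernel or notebook server is started.
check_executed() {
    python3 - "$@" <<'PY'
import json
import sys

failed = False
for path in sys.argv[1:]:
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    for i, cell in enumerate(nb.get("cells", [])):
        # nbformat 4 stores "source" as a string or a list of strings.
        source = "".join(cell.get("source", []))
        if (cell.get("cell_type") == "code" and source.strip()
                and cell.get("execution_count") is None):
            print(f"{path}: code cell {i} has not been executed")
            failed = True
sys.exit(1 if failed else 0)
PY
}
```

To use it as a pre-commit check, you could call it from `.git/hooks/pre-commit` on the staged notebooks, e.g. `check_executed $(git diff --cached --name-only --diff-filter=ACM -- '*.ipynb')`. Note that this inspects the working-tree copies rather than the staged content, and file names containing spaces would need more careful handling.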

psychemedia (Author) commented

@kynan Thanks for that - will give it a try. git is still a bit voodoo to me; I need to clear some time and try to get a proper understanding of how it works and also clarify in my own mind exactly what sort of process I want to implement.

For generating newly run notebooks, could that be done elsewhere in a GitHub-managed repository, e.g. using CI hooks to run something that creates the new notebooks? (Apols - this is going off-topic for nbstripout, I'm thinking aloud through my fingers...)

kynan (Owner) commented Jul 15, 2018

There's another option I didn't think of earlier: you can use the git filter-branch approach described in the README.

By creating "new" notebooks, do you mean creating stripped versions from "full" versions? Or the other way round?

You presumably could use CI hooks to automate either variant, but I don't have anything to suggest since I haven't tried anything of that sort.

If you haven't come across https://mybinder.org before, I wonder if that could be a starting point.

kynan (Owner) commented Jun 28, 2020

@psychemedia have you found a suitable workflow for your needs?

psychemedia (Author) commented Jun 29, 2020

@kynan I've actually moved to a workflow around jupytext now that uses a text based representation for notebooks (no cell outputs).

Reflecting back, I think a git filter-branch --tree-filter approach would probably work okay for a release: create a new branch, run the tree filter to clean all the notebooks in it, and commit.
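A minimal sketch of that release step, assuming nbstripout is on `PATH` and that it runs from the top level of a clean working tree: `make_stripped_branch` is a hypothetical helper that copies a branch and then uses `git filter-branch --tree-filter` to run nbstripout over every notebook in every commit of the copy, leaving the source branch untouched. It uses `find` rather than a shell glob so that commits without notebooks, or with notebooks at any depth, do not make the filter fail. The `STRIP_CMD` variable is an illustrative way to swap in a different stripping command.

```shell
# Hypothetical helper: copy branch $1 (default: master) to a new branch $2
# (default: release) and strip notebook outputs from every commit of the copy.
# Assumes nbstripout is on PATH (override with STRIP_CMD), that $2 does not
# exist yet, and that it runs from the top level of a clean working tree.
make_stripped_branch() {
    src=${1:-master}
    dst=${2:-release}
    strip_cmd=${STRIP_CMD:-nbstripout}

    # The new branch starts at the tip of the source branch.
    git branch "$dst" "$src" || return 1

    # Rewrite only $dst. The tree filter runs in a temporary checkout of each
    # commit; find succeeds even when a commit contains no notebooks.
    # FILTER_BRANCH_SQUELCH_WARNING skips the warning and pause that recent
    # git versions show before running filter-branch.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
        --tree-filter "find . -name '*.ipynb' -exec $strip_cmd {} +" \
        -- "$dst"
}
```

For example, `make_stripped_branch master release` leaves master as it is and gives release the same sequence of commits with notebook outputs removed.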

Here's another example of that approach: rewriting the contents of a branch as text files using jupytext; in a branch, run:

git filter-branch --tree-filter 'jupytext --to md */*.ipynb && rm -f */*.ipynb' HEAD
