Make git pre-commit hook to cleanup notebook metadata #194

Open
dniku opened this issue Apr 24, 2019 · 4 comments

@dniku
Collaborator

dniku commented Apr 24, 2019

Currently there is a lot of garbage in the notebook metadata. Some notebooks refer to nonexistent kernels (like rl); others were created with Python 2 and show an error message if you try to open them on a machine without a py2 kernel.

jupyter/nbconvert#637 provides a command that strips all redundant metadata from notebooks. It should probably run as a git pre-commit hook.

@dniku
Collaborator Author

dniku commented Aug 21, 2019

Here is a fairly dirty script that cleans up metadata in one notebook (depends on jq):

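#!/usr/bin/env python3
"""Clean up the metadata of one Jupyter notebook in place (requires jq)."""

import argparse
import subprocess
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument('path', type=Path)
parser.add_argument('--clear-outputs', action='store_true')
args = parser.parse_args()

# Each entry is one jq filter; they are chained with '|' below.
jq_cmd = [
    # Reset execution counters on cells that have them (code cells).
    '(.cells[] | select(has("execution_count")) | .execution_count) = null',
    # Replace notebook-level metadata with a minimal, kernel-independent stub.
    '.metadata = {"language_info": {"name":"python", "pygments_lexer": "ipython3"}}',
    # Drop all per-cell metadata.
    '.cells[].metadata = {}',
]

if args.clear_outputs:
    jq_cmd.append(
        '(.cells[] | select(has("outputs")) | .outputs) = []'
    )

cmd = [
    'jq',
    '--indent', '1',
    ' | '.join(jq_cmd),
    str(args.path),
]

# jq finishes reading the file before we reopen it for writing;
# check_output raises if jq fails, so the notebook is left untouched on error.
formatted = subprocess.check_output(cmd, encoding='utf8')

with args.path.open('w', encoding='utf8') as fp:
    fp.write(formatted)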
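
For machines without jq, the same three steps can be sketched with only the standard library. This is not the script above, just an approximation of it: the names `clean_notebook` and `dump_notebook` are hypothetical, and `indent=1` with `ensure_ascii=False` is an assumption meant to stay close to what `jq --indent 1` emits.

```python
import json


def clean_notebook(nb, clear_outputs=False):
    """Strip volatile metadata from a parsed notebook dict (modified in place)."""
    # Replace notebook-level metadata with a minimal, kernel-independent stub.
    nb["metadata"] = {
        "language_info": {"name": "python", "pygments_lexer": "ipython3"}
    }
    for cell in nb.get("cells", []):
        # Drop all per-cell metadata.
        cell["metadata"] = {}
        # Only cells that already have these keys (code cells) are touched.
        if "execution_count" in cell:
            cell["execution_count"] = None
        if clear_outputs and "outputs" in cell:
            cell["outputs"] = []
    return nb


def dump_notebook(nb):
    """Serialize with 1-space indentation and a trailing newline."""
    return json.dumps(nb, indent=1, ensure_ascii=False) + "\n"
```

Usage would be roughly `dump_notebook(clean_notebook(json.loads(path.read_text(encoding="utf8"))))`, written back to the same path.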

@dniku
Collaborator Author

dniku commented Aug 29, 2019

@dniku
Collaborator Author

dniku commented Aug 29, 2019

@dniku
Collaborator Author

dniku commented Sep 5, 2019

tl;dr:

pip install --user pre-commit

Put the following into .pre-commit-config.yaml (draft version):

# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.0.0
    hooks:
    -   id: trailing-whitespace
    -   id: end-of-file-fixer
    -   id: check-added-large-files
    -   id: check-vcs-permalinks
    -   id: check-builtin-literals
    -   id: check-case-conflict
    -   id: check-docstring-first
    -   id: check-executables-have-shebangs
    -   id: check-merge-conflict
-   repo: https://github.com/jumanjihouse/pre-commit-hooks
    rev: 1.11.0
    hooks:
    -   id: shellcheck
-   repo: https://github.com/psf/black
    rev: 19.3b0
    hooks:
    -   id: black

Run:

pre-commit install

Now, each time you commit something, a bunch of checks will run on your changes.

I will use this for a while to see whether it is convenient, and once I feel comfortable with it I will make a PR.

Also, I will have to manually integrate my pretty-ipynb script and perhaps also the above projects for running black on Jupyter notebooks.
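
As a rough idea of what that integration might look like: pre-commit passes the staged filenames to a hook as positional arguments and treats a non-zero exit status as failure. The sketch below follows that convention. It is not the pretty-ipynb script itself; `normalize` and `main` are hypothetical names, and the cleanup rules are simply copied from the jq filters earlier in this thread.

```python
import json
from pathlib import Path


def normalize(text):
    """Return the notebook JSON text with volatile metadata stripped."""
    nb = json.loads(text)
    nb["metadata"] = {
        "language_info": {"name": "python", "pygments_lexer": "ipython3"}
    }
    for cell in nb.get("cells", []):
        cell["metadata"] = {}
        if "execution_count" in cell:
            cell["execution_count"] = None
    return json.dumps(nb, indent=1, ensure_ascii=False) + "\n"


def main(filenames):
    """Clean every given notebook; return 1 if any file was rewritten."""
    changed = False
    for name in filenames:
        path = Path(name)
        original = path.read_text(encoding="utf8")
        cleaned = normalize(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf8")
            print(f"cleaned {name}")
            changed = True
    # Non-zero tells pre-commit the commit should be retried with the fixes.
    return 1 if changed else 0
```

To use it as a script, call `sys.exit(main(sys.argv[1:]))` under the usual `if __name__ == "__main__":` guard. Returning 1 after a rewrite mirrors how fixer hooks such as end-of-file-fixer behave: the first commit attempt fails, and the second one passes once the cleaned files are staged.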
