GitHub backup script

Backs up GitHub data for multiple users by using GitHub's migration API. It probably won't work for backing up organisations.


This script requires Python 3.

If you want to write your config in YAML, you will need the PyYAML library. You can install it like this:

pip install PyYAML

If you are using the PyYAML library and want to use the !ENV constructor with a .env file, you will also need to install the python-dotenv module:

pip install python-dotenv

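To run the script, pass the path to your config file as its argument (substitute the path to the script and the path to your config file):

python /path/to/script.py /path/to/config.yml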


Config can be in either YAML or JSON format. Alternatively, use the CONFIG variable near the top of the script to embed the config in the script itself. If you do so, all other sources of config will be ignored.
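As a rough illustration of how a config file in either format could be read, here is a minimal sketch that picks a parser from the file extension. The function name load_config is hypothetical and not taken from the script, and the script's own loading logic may differ. Note that yaml.safe_load does not understand the !ENV tag on its own, so this sketch only covers plain YAML.

```python
import json
from pathlib import Path


def load_config(path):
    """Parse a config file as JSON or YAML, chosen by file extension.

    Hypothetical helper for illustration; not the script's actual loader.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        return json.loads(path.read_text(encoding="utf-8"))
    if suffix in (".yml", ".yaml"):
        # PyYAML is imported lazily so JSON-only setups do not need it.
        import yaml
        return yaml.safe_load(path.read_text(encoding="utf-8"))
    raise ValueError(f"Unsupported config format: {suffix!r}")
```

Calling load_config("/path/to/config.json") returns a plain dict with the logging, global and users keys, if present in the file.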


Example YAML config:

logging:
  version: 1
  formatters:
    brief:
      format: '%(levelname)s:%(message)s'
    precise:
      format: '%(levelname)s:%(asctime)s:%(name)s:%(threadName)s:%(message)s'
  handlers:
    console:
      class: logging.StreamHandler
      level: INFO
      formatter: brief
      stream: ext://sys.stdout
    file:
      class: logging.FileHandler
      level: DEBUG
      formatter: precise
      filename: backup.log
      encoding: utf-8
  loggers:
    github_backup:
      level: DEBUG
      handlers: [console, file]
global:
  outfile: "./backups/{username}/{datetime:%d%m%y_%H%M%S}.tar.gz"
users:
  - token: !ENV GH_TOKEN
    check_time: 10
  - token: ghp_*************************************
    outfile: "/other/location/{username}.tar.gz"
    exclude_git_data: True

Example JSON config:

{
  "logging": {
    "version": 1,
    "formatters": {
      "brief": {
        "format": "%(levelname)s:%(message)s"
      },
      "precise": {
        "format": "%(levelname)s:%(asctime)s:%(name)s:%(threadName)s:%(message)s"
      }
    },
    "handlers": {
      "console": {
        "class": "logging.StreamHandler",
        "level": "INFO",
        "formatter": "brief",
        "stream": "ext://sys.stdout"
      },
      "file": {
        "class": "logging.FileHandler",
        "level": "DEBUG",
        "formatter": "precise",
        "filename": "backup.log"
      }
    },
    "loggers": {
      "github_backup": {
        "level": "DEBUG",
        "handlers": [
          "console",
          "file"
        ]
      }
    }
  },
  "global": {
    "outfile": "./backups/{username}/{datetime:%Y-%m-%dT%H%M%S}.tar.gz"
  },
  "users": [
    {
      "token": "ghp_*************************************"
    },
    {
      "token": "ghp_*************************************",
      "outfile": "/other/location/{username}.tar.gz",
      "exclude_git_data": true
    }
  ]
}

The global section will apply to every user, but you can override the values for individual users. You can use the !ENV constructor in the YAML config to reference environment variables.
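The override behaviour described above can be pictured as a simple dictionary merge, where per-user values win over global ones. This is only a sketch under that assumption: the function name effective_user_config is hypothetical, and the script may resolve settings differently (for example, it is not shown here how a token placed in global would be rejected).

```python
def effective_user_config(global_config, user_config):
    """Return the settings for one user: global values, overridden per user.

    Hypothetical helper for illustration; not taken from the script.
    """
    merged = dict(global_config)  # copy so the shared global section is untouched
    merged.update(user_config)    # per-user keys take precedence
    return merged


config = {
    "global": {"outfile": "./backups/{username}.tar.gz", "check_time": 30},
    "users": [
        {"token": "token-a"},                    # placeholder token
        {"token": "token-b", "check_time": 10},  # overrides check_time
    ],
}

for user in config["users"]:
    settings = effective_user_config(config.get("global", {}), user)
    print(settings["check_time"])
```

With the sample data, the first user inherits check_time from global and the second uses its own value.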


The config for each user is in a list under the users key. For each user, the following options can be used:

  • token - Your GitHub personal access token. Cannot go in global. Required
  • outfile - The file name to save the backup as, including the .tar.gz. Can use formatting syntax including: {username}, {datetime} (of backup finish; see format codes) and {id} (of backup). If it is set to None, then the script will not download the backup. It will still check its state.
  • check_time - How long to wait, in seconds, between checks of the backup's state. If it is set to None, the script won't query the state of the backup and so won't automatically download it. Default: 30
  • delete - Whether to delete the archive from GitHub's storage after downloading. Default: True
  • affiliation - Comma separated values. Options: owner, collaborator and organization_member. Default: owner
  • visibility - Limits repositories to back up to ones with a certain visibility. Options: all, public, private. Default: all
  • exclude_repos - A string or list of strings that are the full names (username/repo_name) of repositories that you don't want to back up.
  • lock_repositories - Lock the repositories being backed up. Default: False
  • exclude_metadata - Whether metadata, such as __, should be excluded. Default: False
  • exclude_git_data - Whether the repositories' git data should be excluded. Default: False
  • exclude_attachments - Do not include attachments. Default: False
  • exclude_releases - Do not include releases. Default: False
  • exclude_owner_projects - Whether projects owned by the user should be excluded. Default: False
  • org_metadata_only - Whether the backup should only include metadata (the exclude flags will be ignored). Default: False
  • forks - Whether to include forks or not. Options: include, exclude, only. Default: exclude
  • disabled_repos - Whether to include disabled repos. Options: include, exclude, only. Default: include
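To show how the outfile placeholders expand, here is a small sketch using Python's str.format, which passes the part after the colon in {datetime:...} to the datetime object as a strftime pattern. The function name render_outfile, the username octocat and the id 12345 are made up for the example, and the script's own implementation may differ.

```python
from datetime import datetime


def render_outfile(template, username, finished_at, backup_id):
    """Fill in {username}, {datetime} and {id} in an outfile template.

    Hypothetical helper for illustration; returns None when no download
    is wanted (outfile set to None).
    """
    if template is None:
        return None
    return template.format(username=username, datetime=finished_at, id=backup_id)


path = render_outfile(
    "./backups/{username}/{datetime:%d%m%y_%H%M%S}.tar.gz",
    "octocat",                         # example username
    datetime(2024, 3, 9, 14, 5, 30),   # example backup finish time
    12345,                             # example backup id
)
print(path)  # ./backups/octocat/090324_140530.tar.gz
```

Placeholders that a template does not mention, such as {id} here, are simply unused.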


This script uses Python's logging module to log output. You can add logging config under the logging key using the dictConfig() schema. The LOGGER_NAME variable at the top of the script is used as the name of the logger (default 'github_backup'), so make sure the loggers part of your config uses the same name.
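As a sketch of how such a logging section is applied, the snippet below passes a trimmed-down version of the example config to logging.config.dictConfig and then fetches the logger by name. It assumes LOGGER_NAME keeps its default value and uses only a console handler; how the script wires this up internally is not shown in this README.

```python
import logging
import logging.config

LOGGER_NAME = "github_backup"  # assumed to match the default in the script

logging_config = {
    "version": 1,
    "formatters": {"brief": {"format": "%(levelname)s:%(message)s"}},
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "brief",
            "stream": "ext://sys.stdout",
        }
    },
    # The key here must equal LOGGER_NAME, or the handler is never attached.
    "loggers": {LOGGER_NAME: {"level": "DEBUG", "handlers": ["console"]}},
}

logging.config.dictConfig(logging_config)
logger = logging.getLogger(LOGGER_NAME)

logger.info("Backup started")         # emitted: passes the INFO handler level
logger.debug("Polling backup state")  # dropped by the console handler
```

If the name under loggers does not match LOGGER_NAME, the configured handlers are attached to a different logger and the script's messages fall through to the root logger's defaults.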