atlas-pr/pganonymize

This branch is 208 commits behind rheinwerk-verlag/pganonymize:development.

pganonymize

A command-line tool to anonymize PostgreSQL databases for DSGVO/GDPR purposes.

It uses a YAML file to define which tables and fields should be anonymized and provides various methods of anonymization. The tool requires a direct PostgreSQL connection to perform the anonymization.


  • Intentionally compatible with Python 2.7 (for old, productive platforms)
  • Anonymize the data in PostgreSQL tables at the row level using various providers (some examples are in the table below)
  • Exclude data for anonymization depending on regular expressions or SQL WHERE clauses
  • Truncate entire tables for unwanted data

Field       Value                 Provider             Output
----------  --------------------  -------------------  --------------------------------
first_name  John                  choice               (Bob|Larry|Lisa)
title       Dr.                   clear
street      Irving St             faker.street_name    Miller Station
password    dsf82hFxcM            mask                 XXXXXXXXXX
email       jane.doe@example.com  md5                  0cba00ca3da1b283a57287bcceb17e35
email       jane.doe@example.com  faker.unique.email   alex7@sample.com
phone_num   65923473              md5 as_number: True  3948293448
ip          157.50.1.20           set                  127.0.0.1
uuid_col    00010203-0405-......  uuid4                f7c1bd87-4d....

  • Note: faker.unique.[provider] is only supported on Python 3.6+ (the minimum Python version supported by the Faker library)
  • Note: uuid4 can only be used on native uuid columns
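
To make the provider table more concrete, here is a minimal Python sketch of what a few of the listed providers do to a single value. These functions are hypothetical stand-ins written for illustration, not pganonymize's actual implementation; in particular, the way `as_number` reduces the digest to a number (and the `digits` parameter) is an assumption.

```python
import hashlib
import random


def clear(value):
    """Drop the value entirely (None stands in for SQL NULL)."""
    return None


def mask(value, sign="X"):
    """Replace every character of the value with the mask sign."""
    return sign * len(value)


def md5(value, as_number=False, digits=10):
    """Hash the value with MD5, optionally reducing it to a decimal number."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    if as_number:
        # Assumption: read the hex digest as an integer and keep the
        # leading decimal digits so the result fits a numeric column.
        return int(str(int(digest, 16))[:digits])
    return digest


def choice(value, values):
    """Ignore the original value and pick a random replacement."""
    return random.choice(values)


def set_value(value, new_value):
    """Ignore the original value and return a fixed replacement."""
    return new_value


masked = mask("dsf82hFxcM")
hashed = md5("jane.doe@example.com")
picked = choice("John", ["Bob", "Larry", "Lisa"])
fixed = set_value("157.50.1.20", "127.0.0.1")
```

Note that `md5` is deterministic, so equal inputs produce equal outputs, while `choice` is random and loses that relationship between rows.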

See the documentation for a more detailed description of the provided anonymization methods.
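
The regular-expression exclusion mentioned in the feature list can be pictured as a filter applied before anonymization. The sketch below is an assumption about the general idea only: the `is_excluded` helper, the shape of the `excludes` mapping, and the use of a case-insensitive `re.search` are all hypothetical and may differ from what the tool does.

```python
import re


def is_excluded(row, excludes):
    """Return True if any column value matches one of its exclude patterns."""
    for column, patterns in excludes.items():
        value = row.get(column)
        if value is None:
            continue
        if any(re.search(p, str(value), re.IGNORECASE) for p in patterns):
            return True
    return False


rows = [
    {"id": 1, "email": "jane.doe@example.com"},
    {"id": 2, "email": "admin@internal.test"},
]

# Hypothetical rule: leave rows with an internal address untouched.
excludes = {"email": [r"@internal\.test$"]}

to_anonymize = [row for row in rows if not is_excluded(row, excludes)]
```

Only the rows left in `to_anonymize` would then be passed on to the providers.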

The default installation method is to use pip:

$ pip install pganonymize

usage: pganonymize [-h] [-v] [-l] [--schema SCHEMA] [--dbname DBNAME]
               [--user USER] [--password PASSWORD] [--host HOST]
               [--port PORT] [--dry-run] [--dump-file DUMP_FILE]
               [--init-sql INIT_SQL]

Anonymize data of a PostgreSQL database

optional arguments:
-h, --help            show this help message and exit
-v, --verbose         Increase verbosity
-l, --list-providers  Show a list of all available providers
--schema SCHEMA       A YAML schema file that contains the anonymization
                        rules
--dbname DBNAME       Name of the database
--user USER           Name of the database user
--password PASSWORD   Password for the database user
--host HOST           Database hostname
--port PORT           Port of the database
--dry-run             Don't commit changes made on the database
--dump-file DUMP_FILE
                        Create a database dump file with the given name
--init-sql INIT_SQL   SQL to run before starting anonymization

Besides the database connection values, you have to define a YAML schema file that contains all anonymization rules for that database. Take a look at the schema documentation or the YAML sample schema.

Example calls:

$ pganonymize --schema=myschema.yml \
    --dbname=test_database \
    --user=username \
    --password=mysecret \
    --host=db.host.example.com \
    -v

$ pganonymize --schema=myschema.yml \
    --dbname=test_database \
    --user=username \
    --password=mysecret \
    --host=db.host.example.com \
    --init-sql "set search_path to non_public_search_path; set work_mem to '1GB';" \
    -v
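
When the tool is called from a script rather than a shell, the same calls can be assembled as an argument list. This is a minimal sketch that uses only the flags shown in the help output above; the `build_command` helper itself is hypothetical and not part of pganonymize.

```python
def build_command(schema, dbname, user, password, host, port=None,
                  init_sql=None, dump_file=None, dry_run=False,
                  verbose=False):
    """Assemble a pganonymize argument list from the documented flags."""
    cmd = [
        "pganonymize",
        "--schema={}".format(schema),
        "--dbname={}".format(dbname),
        "--user={}".format(user),
        "--password={}".format(password),
        "--host={}".format(host),
    ]
    if port is not None:
        cmd.append("--port={}".format(port))
    if init_sql is not None:
        # Passed as a separate argument, so no shell quoting is needed.
        cmd.extend(["--init-sql", init_sql])
    if dump_file is not None:
        cmd.append("--dump-file={}".format(dump_file))
    if dry_run:
        cmd.append("--dry-run")
    if verbose:
        cmd.append("-v")
    return cmd


cmd = build_command(
    "myschema.yml", "test_database", "username", "mysecret",
    "db.host.example.com",
    init_sql="set search_path to non_public_search_path;",
    verbose=True,
)
# To execute it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell interpolation, which matters for the `--init-sql` value because it contains spaces, quotes, and semicolons.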

With the --dump-file argument, you can create a dump file after anonymizing the database. Note that this requires the pg_dump command, which is provided by the postgresql-client-common package, e.g. under Linux:

$ sudo apt-get install postgresql-client-common

Example call:

$ pganonymize --schema=myschema.yml \
    --dbname=test_database \
    --user=username \
    --password=mysecret \
    --host=db.host.example.com \
    --dump-file=/tmp/dump.gz \
    -v
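
Because --dump-file depends on pg_dump being installed, a wrapper script may want to check for it before starting a long anonymization run. The following is a small sketch using Python's standard `shutil.which`; the `require_tool` helper is hypothetical and not part of pganonymize.

```python
import shutil


def require_tool(executable="pg_dump"):
    """Return the full path of an executable or raise if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        raise RuntimeError(
            "{} not found on PATH; install the PostgreSQL client "
            "tools before using --dump-file".format(executable)
        )
    return path
```

Calling `require_tool()` before invoking pganonymize with --dump-file surfaces a missing pg_dump early instead of after the anonymization has already run.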

If you want to run the anonymizer within a Docker container, you first have to build the image:

$ docker build -t pganonymize .

After that you can pass a schema file to the container, using Docker volumes, and call the anonymizer:

$ docker run \
    -v <path to your schema>:/schema.yml \
    -it pganonymize \
    /usr/local/bin/pganonymize \
    --schema=/schema.yml \
    --dbname=<database> \
    --user=<user> \
    --password=<password> \
    --host=<host> \
    -v
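
The Docker call can also be assembled programmatically. This sketch mirrors the command above and resolves the schema path to an absolute path first, since some Docker versions treat a non-absolute host path in `-v` as a named volume rather than a bind mount. The `build_docker_command` helper and its `image` default are assumptions for illustration.

```python
import os


def build_docker_command(schema_path, dbname, user, password, host,
                         image="pganonymize"):
    """Assemble the docker run argument list shown above."""
    # Bind mounts are safest with an absolute host path.
    host_schema = os.path.abspath(schema_path)
    return [
        "docker", "run",
        "-v", "{}:/schema.yml".format(host_schema),
        "-it", image,
        "/usr/local/bin/pganonymize",
        "--schema=/schema.yml",
        "--dbname={}".format(dbname),
        "--user={}".format(user),
        "--password={}".format(password),
        "--host={}".format(host),
        "-v",
    ]


docker_cmd = build_docker_command(
    "myschema.yml", "test_database", "username", "mysecret",
    "db.host.example.com",
)
```

The first `-v` belongs to `docker run` (volume), while the trailing `-v` is passed through to pganonymize (verbose), so their positions in the list matter.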
