Welcome to the ht-archive wiki!

Getting Started

Some prerequisites:

  • Working PostgreSQL installation

You can install PostgreSQL manually or using Docker. If you decide to use Docker (install it by selecting your OS on this page), start PostgreSQL with the following command on the command line, after replacing /some/folder with a folder on your computer that is easily accessible:

sudo docker run -d -e POSTGRES_PASSWORD=1234 -e POSTGRES_USER=dbadmin -e POSTGRES_DB=sandbox -v /some/folder:/var/lib/postgresql/data -p 5432:5432 --name postgres postgres
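
The container takes a few seconds to initialize before it accepts connections, so it can help to wait for it before moving on. The following is a minimal sketch, not part of ht-archive: wait_until_ready is a hypothetical helper that retries any command once per second until it succeeds or the attempts run out.

```shell
# Hypothetical helper (not part of ht-archive): retry a readiness check.
# Usage: wait_until_ready <max_attempts> <command> [args...]
wait_until_ready() {
  wur_attempts="$1"
  shift
  wur_i=0
  while [ "$wur_i" -lt "$wur_attempts" ]; do
    # Run the check command quietly; stop as soon as it succeeds.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    wur_i=$((wur_i + 1))
    sleep 1
  done
  # Every attempt failed.
  return 1
}
```

Assuming the container name, user, and database from the command above, a check could look like this, using the pg_isready utility that ships with PostgreSQL:

    wait_until_ready 30 sudo docker exec postgres pg_isready --username dbadmin --dbname sandbox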

Setting up the database

  1. Download the PostgreSQL backup from one of the following places:

  2. Extract the SQL file from the downloaded crawler_er.tar.gz, using the following command on the command line or an archive tool:

    tar xzf /the_folder_with_backup/crawler_er.tar.gz
  3. Load the SQL into PostgreSQL

    a. If you are using a local installation of PostgreSQL, log into your postgres server as root and create a new superuser named dbadmin with login permissions. Make sure to keep track of the password you create for the user.

    CREATE ROLE dbadmin WITH SUPERUSER LOGIN PASSWORD '<your password here>';

    Create a database called crawler, exit psql, and run the following command. Enter your password when prompted, and the command will load the data into the crawler database. Be patient: the load can take 15 minutes or so to run.

    $ psql --host=localhost --dbname=crawler --username=dbadmin -f <path/to/.sql/file>

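    b. If you are using a Docker installation of PostgreSQL, run the following commands. The second one expects the extracted SQL file to be named crawler.sql and to sit in the folder you mounted at /var/lib/postgresql/data: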

    sudo docker exec postgres psql --username dbadmin -c "CREATE DATABASE crawler" sandbox
    sudo docker exec postgres psql --username dbadmin -f /var/lib/postgresql/data/crawler.sql crawler
  4. Start the web application

    a. If you're already using Docker and have not made changes to the code, start the application with:

    sudo docker run -d -p 8080:8080 --link postgres:postgres --name ht-archive bmenn/ht-archive --db crawler --usr dbadmin --pwd 1234 --host postgres

    b. If you have made some changes and want to see the updated application, run:

    node app.js --db crawler --usr dbadmin --pwd 1234 --host REPLACE_ME_WITH_DOCKER_OR_POSTGRES_IP
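
For the Docker path in step 3b, the SQL file has to be visible inside the container. One way to arrange that is to extract the backup straight into the folder mounted at /var/lib/postgresql/data. This is a minimal sketch under that assumption: extract_backup is a hypothetical helper, and it assumes the archive holds the SQL file at its top level.

```shell
# Hypothetical helper (not part of ht-archive): extract a .tar.gz backup
# into a destination folder, creating the folder if needed.
# Usage: extract_backup <archive.tar.gz> <destination_folder>
extract_backup() {
  eb_archive="$1"
  eb_dest="$2"
  mkdir -p "$eb_dest" || return 1
  # -C makes tar write the extracted files into the destination folder
  # instead of the current directory.
  tar xzf "$eb_archive" -C "$eb_dest"
}
```

With the placeholder paths used on this page, the call would be:

    extract_backup /the_folder_with_backup/crawler_er.tar.gz /some/folder

If the container has already initialized /some/folder, the folder may belong to a different user, in which case the extraction may need sudo.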
