Jonathan Cox edited this page Mar 5, 2017 · 11 revisions

Welcome to the ht-archive wiki!

Getting Started

Some prerequisites:

  • Working PostgreSQL installation

You can install PostgreSQL manually or with Docker. If you use Docker (install it by selecting your OS at https://www.docker.com/products/overview), start PostgreSQL with the following command on the command line, after replacing /some/folder with an easily accessible folder on your computer:

sudo docker run -d -e POSTGRES_PASSWORD=1234 -e POSTGRES_USER=dbadmin -e POSTGRES_DB=sandbox -v /some/folder:/var/lib/postgresql/data -p 5432:5432 --name postgres postgres
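
The container needs a few seconds to initialize before it accepts connections. As a rough readiness check, the sketch below polls it with pg_isready, a utility that ships with PostgreSQL. The helper name wait_for_postgres and the DOCKER variable are illustrative assumptions and not part of ht-archive; the container name, user, and database are taken from the command above.

```shell
# Command used to reach Docker. The DOCKER variable is an assumption made for
# this sketch; set DOCKER=docker if you do not need sudo.
DOCKER="${DOCKER:-sudo docker}"

# wait_for_postgres [tries]: hypothetical helper that polls the "postgres"
# container once per second until pg_isready reports success.
wait_for_postgres() {
  tries="${1:-30}"
  while [ "$tries" -gt 0 ]; do
    if $DOCKER exec postgres pg_isready --username dbadmin --dbname sandbox >/dev/null 2>&1; then
      echo "PostgreSQL is ready"
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "PostgreSQL did not become ready in time" >&2
  return 1
}
```

Call wait_for_postgres after the docker run command, for example wait_for_postgres 60 to allow roughly a minute before giving up.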

Setting up the database

  1. Download the PostgreSQL backup from one of the following places:

  2. Extract the SQL file from the downloaded crawler_er.tar.gz, using an archive tool or the following command on the command line:

    tar xzf /the_folder_with_backup/crawler_er.tar.gz
    
  3. Load the SQL into PostgreSQL

    a. If you are using a local installation of PostgreSQL, log into your postgres server as root and create a new superuser named dbadmin with login permissions. Make sure to keep track of the password you create for the user.

    CREATE ROLE dbadmin WITH SUPERUSER LOGIN PASSWORD '<your password here>';
    

    Create a database called crawler (for example with CREATE DATABASE crawler;), exit psql, and run the following command. Enter your password when prompted, and the command will load the data into the crawler database. Be patient: the load can take 15 minutes or so.

    $ psql --host=localhost --dbname=crawler --username=dbadmin -f <path/to/.sql/file>
    

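    b. If you are using a Docker installation of PostgreSQL, run the commands below. The second one expects the extracted crawler.sql in the folder you mounted in place of /some/folder, which the container sees as /var/lib/postgresql/data: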

    sudo docker exec postgres psql --username dbadmin -c "CREATE DATABASE crawler" sandbox
    sudo docker exec postgres psql --username dbadmin -f /var/lib/postgresql/data/crawler.sql crawler
    
  4. Start the web application

    a. If you're already using Docker and have not made changes to the code, start the application with:

    sudo docker run -d -p 8080:8080 --link postgres:postgres --name ht-archive bmenn/ht-archive --db crawler --usr dbadmin --pwd 1234 --host postgres
    

    b. If you have made changes and want to see the updated application, run:

    node app.js --db crawler --usr dbadmin --pwd 1234 --host REPLACE_ME_WITH_DOCKER_OR_POSTGRES_IP
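
If the web application starts but shows no data, it helps to confirm that the load in step 3 actually populated the crawler database. The sketch below targets the Docker setup and counts user tables through information_schema, so it assumes nothing about the table names in the backup. The helper names count_tables and report_load and the DOCKER variable are illustrative assumptions, not part of ht-archive.

```shell
# Command used to reach Docker. The DOCKER variable is an assumption made for
# this sketch; set DOCKER=docker if you do not need sudo.
DOCKER="${DOCKER:-sudo docker}"

# count_tables: hypothetical helper that prints the number of tables in the
# crawler database outside PostgreSQL's own system schemas.
count_tables() {
  $DOCKER exec postgres psql --username dbadmin --dbname crawler \
    --tuples-only --no-align \
    -c "SELECT count(*) FROM information_schema.tables WHERE table_schema NOT IN ('pg_catalog', 'information_schema');"
}

# report_load: hypothetical helper that fails when no tables were found.
report_load() {
  n="$(count_tables)" || return 1
  if [ "${n:-0}" -gt 0 ]; then
    echo "crawler table count: $n"
  else
    echo "crawler is empty; the load may have failed" >&2
    return 1
  fi
}
```

Run report_load after step 3b. For a local installation, the same query can be passed to psql --host=localhost --dbname=crawler --username=dbadmin with the -c option instead of going through docker exec.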
    