# Knesset data pipelines


Knesset data scrapers and data sync

Uses the datapackage pipelines framework to scrape Knesset data and aggregate it into different data stores (PostgreSQL, Elasticsearch, files).
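The datapackage pipelines framework discovers pipelines from `pipeline-spec.yaml` files. A minimal, illustrative sketch of what such a spec looks like (the pipeline id, URL, and table name below are generic examples, not taken from this repo):

```yaml
# pipeline-spec.yaml - a generic datapackage-pipelines example
committees:
  pipeline:
    - run: add_resource             # register a remote resource
      parameters:
        name: committees
        url: https://example.com/committees.csv
    - run: stream_remote_resources  # fetch and stream the rows
    - run: dump.to_sql              # write the result to PostgreSQL
      parameters:
        engine: env://DPP_DB_ENGINE
        tables:
          committees:
            resource-name: committees
```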

## Available Endpoints

## Contributing

Looking to contribute? Check out the Help Wanted Issues or the Noob Friendly Issues for some ideas.

## Running the full pipelines environment using Docker

**A note for Windows users:**

Using Windows with Docker is not currently recommended or supported; the build process fails on numerous issues. If you wish to use Windows, do so at your own risk, and please update this README file with instructions if you succeed.

Instructions for running on Ubuntu (other distros and macOS should follow a similar process):
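A minimal sketch of the steps, assuming the `hasadna/knesset-data-pipelines` repository and the `bin/build.sh` / `bin/start.sh` scripts referenced below:

```bash
# clone the repository and enter it (repo URL assumed)
git clone https://github.com/hasadna/knesset-data-pipelines.git
cd knesset-data-pipelines

# build the Docker images and start the environment (sudo is required, as noted below)
sudo bin/build.sh && sudo bin/start.sh
```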

This will provide:

* Pipelines dashboard: http://localhost:5000/
* PostgreSQL server: `postgresql://postgres:123456@localhost:15432/postgres`
* Data files under `.data-docker/`

After every change in the code you should run `sudo bin/build.sh && sudo bin/start.sh`.
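Once the environment is up, you can connect to the exposed PostgreSQL server directly, for example with `psql` (assuming the PostgreSQL client is installed on your host):

```bash
# connect to the pipelines database exposed on host port 15432
psql postgresql://postgres:123456@localhost:15432/postgres
```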

## Using Adminer to view the data

Adminer is a simple web UI which allows you to run queries against the DB.

To start the Adminer service as part of the local Docker Compose environment:

* (If you haven't done so already) copy `docker-compose.override.example.yml` to `docker-compose.override.yml`
* Edit `docker-compose.override.yml`
  * Uncomment the adminer section
* Start the services:
  * `bin/start.sh`
* Adminer is available at http://localhost:18080
  * Database Type = PostgreSQL
  * Host = db
  * Port = 5431
  * Database = postgres
  * User = postgres
  * Password = 123456
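For reference, a minimal sketch of what the uncommented adminer section might look like (the actual contents of the example override file may differ):

```yaml
# docker-compose.override.yml - adminer web UI, published on host port 18080
adminer:
  image: adminer
  ports:
    - "18080:8080"
```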

## Installing the project locally and running tests

You should have an activated Python 3.6 virtualenv; the following procedure will work on Ubuntu 17.04:

```bash
curl -kL https://raw.github.com/saghul/pythonz/master/pythonz-install | bash
echo '[[ -s $HOME/.pythonz/etc/bashrc ]] && source $HOME/.pythonz/etc/bashrc' >> ~/.bashrc
source ~/.bashrc
sudo apt-get install build-essential zlib1g-dev libbz2-dev libssl-dev libreadline-dev libncurses5-dev libsqlite3-dev libgdbm-dev libdb-dev libexpat-dev libpcap-dev liblzma-dev libpcre3-dev
pythonz install 3.6.2
sudo pip install virtualenvwrapper
echo 'export WORKON_HOME=$HOME/.virtualenvs; export PROJECT_HOME=$HOME/Devel; source /usr/local/bin/virtualenvwrapper.sh' >> ~/.bashrc
source ~/.bashrc
cd knesset-data-pipelines
mkvirtualenv -a `pwd` -p $HOME/.pythonz/pythons/CPython-3.6.2/bin/python3.6 knesset-data-pipelines
```

Before running any knesset-data-pipelines script, be sure to activate the virtualenv. You can do that by running `workon knesset-data-pipelines`.

Once you are inside a Python 3.6 virtualenv, you can run the following:

* `bin/install.sh`
* `bin/test.sh`

You can set some environment variables to modify behavior; see a reference at `.env.example`.
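For example, assuming a variable like `DPP_DB_ENGINE` (a common datapackage-pipelines convention; check `.env.example` for the actual variable names used here):

```bash
# point the pipelines at the Docker PostgreSQL instance for this shell session
# (variable name assumed from datapackage-pipelines conventions; see .env.example)
export DPP_DB_ENGINE=postgresql://postgres:123456@localhost:15432/postgres
bin/test.sh
```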

## Running the dpp cli

* using Docker: `bin/dpp.sh`
* locally (from an activated virtualenv): `dpp`
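A quick sketch of common invocations, per standard datapackage-pipelines CLI behavior (the pipeline id below is illustrative, not from this repo):

```bash
# list all available pipelines and their status
dpp

# run a specific pipeline by its id (id is illustrative)
dpp run ./committees/committee-meetings
```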