This project is a work in progress.
At the moment only Twitter is supported, with a single tag: #Spark
The idea of this project is to provide a tool that aggregates recently published articles (tutorials, technical news, updates, case studies) and other learning resources (videos, webinars...) about one or several given subjects (Scala, Spark...). The resources are retrieved from various sources (Twitter, LinkedIn, blogs...), and a score is calculated for each of them. That score represents the potential interest or value of the resource, based on its popularity and other factors (to be determined).
This system has 5 main components:
- Aggregator: aggregates tweets, posts and all other items streamed from the possible article sources (Twitter, Instagram...). Usage and details...
- Score updater (Twitter-specific): periodically (every 12h) loads all pending tweets and updates their score based on their current popularity. Usage and details...
- Processor: processes each article as soon as it is available, applies some analytics, and finally persists it into the main DB. Usage and details...
- Data server: a simple app serving/manipulating the data stored in the main DB. Usage and details...
- Webapp: used to browse the articles stored in the main DB and to manipulate their current state. Usage and details...
One of the options to run this system is via Docker containers. Here are a few directions to help you install it that way.
The provided docker-compose.yml file will start the aggregator, Kafka and Zookeeper all at once, each in a separate container.
$ docker-compose up -d
To see the logs
$ docker-compose logs -f
Stop all at once
$ docker-compose stop -t 60
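For orientation, the provided compose file roughly has the shape sketched below. The service names and wiring here are assumptions; the file shipped with the repository is authoritative.

version: '2'
services:
  zookeeper:
    image: jplock/zookeeper:3.4.6
    ports:
      - "2181:2181"
  kafka:
    image: ches/kafka:0.10.1.0
    ports:
      - "9092:9092"
    depends_on:
      - zookeeper
    # Zookeeper connection settings omitted here, refer to the provided file
  aggregator:
    image: "[dockerhub-username]/[project]"   # the aggregator image built as described below
    env_file:
      - docker-env.list
    depends_on:
      - kafka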
This project is designed to work with Kafka 0.10 (v0.10.1.0). You can either use your own installation or Docker images such as jplock/zookeeper for Zookeeper and ches/kafka for Kafka.
$ docker run -d -p 2181:2181 --name zookeeper jplock/zookeeper:3.4.6
$ docker run -d -p 9092:9092 --name kafka --link zookeeper:zookeeper ches/kafka:0.10.1.0
To build the Docker images you first need to fill in the environment-variable file docker-env.list, then run the commands below from the project root.
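The variables expected in docker-env.list are defined by the project; purely as an illustration (these key names are hypothetical), it would typically hold the Twitter API credentials and the Kafka broker address:

# docker-env.list (hypothetical keys, for illustration only)
TWITTER_CONSUMER_KEY=xxxx
TWITTER_CONSUMER_SECRET=xxxx
TWITTER_ACCESS_TOKEN=xxxx
TWITTER_ACCESS_TOKEN_SECRET=xxxx
KAFKA_BROKER=kafka:9092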
// Build a Fatjar in ./docker
$ sbt "project aggregator" coverageOff assembly
// Build the image
$ docker build -t [dockerhub-username]/[project] ./aggregator/docker/.
// Build a Fatjar in ./docker
$ sbt "project twitterScoreUpdater" coverageOff assembly
// Build the image
$ docker build -t [dockerhub-username]/[project] ./twitter-score-updater/docker/.
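Once built, an image can also be run on its own, outside docker-compose. A minimal example, assuming the container reads all of its configuration from docker-env.list:

// Run a built image manually, passing the environment file
$ docker run -d --env-file docker-env.list [dockerhub-username]/[project]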
To run the tests and generate the coverage reports (./[PROJECT]/target/scala-2.11/scoverage-report/index.html)
Aggregator
$ sbt test-agg
Twitter score updater
$ sbt test-score
Processor
$ sbt test-proc
Shared entities
$ sbt test-shared
All at once
$ sbt test-all
Allows you to visualize the pending/accepted/rejected articles. The app is composed of the modules data-server and webapp.
Note: DATA_SERVER_HOST
Used as a data server to access the database in a REST fashion. Built with Hapi and Sequelize.
The configuration for the app host/port and database access is in config/default.json; you can override it by adding a local.json file in the same directory.
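For illustration only (the actual keys are those defined in config/default.json; the names below are assumptions), a local.json override could look like:

{
  "server": {
    "host": "localhost",
    "port": 3000
  },
  "database": {
    "host": "localhost",
    "name": "articles",
    "username": "user",
    "password": "secret"
  }
}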
$ cd data-server
$ npm install -g nodemon
$ npm install
$ npm start
If all goes well you should see this output, indicating that the server started and Sequelize managed to connect:
Executing (default): SELECT 1+1 AS result
Every change will hot reload all resources automatically
A simple single-page app with a server delivering static content. It only contains the UI components and relies on the Data Server for the content. Based on Vue.js and Webpack.
$ cd webapp
$ npm install
$ npm run dev
You should see an output similar to:
Listening at http://localhost:8080
Every change will hot reload all resources automatically