- Seyyed Mohammad Sadegh Keshavarzi
- Seyyed Alireza Hosseini
- Ali Aliabadi
- Ali Shirmohammadi
- commons (common modules)
- crawler
- es_page_processor (processes pages for Elasticsearch)
- page_processor (processes pages for HBase)
- search api
- Spark - Used to run MapReduce jobs
- Kafka - A distributed queue with 3 main topics (links, pages for HBase, pages for Elasticsearch)
- Elasticsearch - Used to store data and run search queries
- Redis - Used to enforce per-domain politeness and to avoid redundant page updates in the page_processors
- HBase - Used to store a page's links and their anchors
- Dropwizard - Used to monitor the Java programs
- JSoup - Used to parse the pages
- Jackson - Used to serialize and deserialize the page class
- Maven - Dependency Management
- ZooKeeper - Used to manage HBase and Kafka
- Hadoop - Used to provide a suitable distributed file system
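
The politeness check mentioned for Redis above can be sketched as follows. This is a minimal, in-memory illustration, not the project's actual code: the class name `PolitenessChecker`, the method `tryAcquire`, and the delay value are all hypothetical, and a `ConcurrentHashMap` stands in for Redis. With Redis, a similar effect is commonly obtained with a single `SET <domain> 1 NX PX <delay>` command, which only succeeds if the key does not exist and lets it expire after the delay.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * In-memory stand-in for a Redis-backed politeness check.
 * All names here are hypothetical and chosen for illustration only.
 */
final class PolitenessChecker {
    // Minimum time that must pass between two fetches of the same domain.
    private final long delayMillis;
    // Domain -> timestamp (ms) of the last allowed fetch.
    private final ConcurrentMap<String, Long> lastFetch = new ConcurrentHashMap<>();

    PolitenessChecker(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /**
     * Returns true if the domain may be fetched at nowMillis and, if so,
     * records that fetch. Returns false if the domain was fetched too recently.
     */
    boolean tryAcquire(String domain, long nowMillis) {
        boolean[] allowed = {false};
        // compute() runs atomically per key, so two crawler threads cannot
        // both be granted the same domain within one delay window.
        lastFetch.compute(domain, (d, last) -> {
            if (last == null || nowMillis - last >= delayMillis) {
                allowed[0] = true;
                return nowMillis;
            }
            return last;
        });
        return allowed[0];
    }
}
```

A crawler thread would call `tryAcquire(domain, System.currentTimeMillis())` before fetching a link, and put the link back on the queue when the call returns `false`. The timestamp is passed in explicitly so the behaviour is easy to test.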
See the wikis for instructions on installing these technologies.