Skip to content

nimbo3/Jimbo

Repository files navigation

Joojle search engine Build Status codecov

Jimbo
Collaborators:

  • Seyyed Mohammad Sadegh Keshavarzi
  • Seyyed Alireza Hosseini
  • Ali Aliabadi
  • Ali Shirmohammadi

Project modules:

  • commons (common modules)
  • crawler
  • es_page_processor (process pages for elasticSearch)
  • page_processor (process pages for hbase)
  • search api

Build with :

  • Spark - Used to run mapReduce
  • Kafka - A distrbuted queue that contains 3 main topic (links, page for hbase, page for elasticsearch)
  • ElasticSearch - Used to store data and run search queries
  • Redis - Used to check politness for domains and check to reduce updating pages for page_processors
  • HBase - Used to store data about links of a page and anchor
  • DropWizard - Used to monitoring java programs
  • JSoup - Used to parse the pages
  • Jackson - Used to serialize and deserializing page class
  • Maven - Dependency Management
  • Zookeeper - Used for managing hbase and kafka
  • Hadoop - Used for using proper file system

Check wikis for installation of technologies.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published