Fernando-Melo edited this page Jan 24, 2019 · 23 revisions

This code was developed and tested in a Linux environment (Red Hat Enterprise Linux 5).

Requirements

Maven 3.x with Java 8 (to compile most of the repositories)

Maven 2.x with Java 6 (to compile the PwaLucene project)
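
Because the builds need two different Java and Maven combinations, it can help to switch toolchains per shell session. The following is a minimal sketch, not part of the original instructions: the function name `use_toolchain` and all four install paths are hypothetical and need to be adjusted to your machine.

```shell
# Hypothetical install locations; override them in the environment or edit here.
JAVA6_HOME="${JAVA6_HOME:-/usr/lib/jvm/java-1.6.0}"
JAVA8_HOME="${JAVA8_HOME:-/usr/lib/jvm/java-1.8.0}"
MAVEN2_HOME="${MAVEN2_HOME:-/opt/apache-maven-2}"
MAVEN3_HOME="${MAVEN3_HOME:-/opt/apache-maven-3}"

# use_toolchain <java_home> <maven_home>
# Sets JAVA_HOME and puts the chosen java and mvn first on PATH.
use_toolchain() {
    JAVA_HOME="$1"
    PATH="$1/bin:$2/bin:$PATH"
    export JAVA_HOME PATH
}

# Example (run manually):
#   use_toolchain "$JAVA6_HOME" "$MAVEN2_HOME"   # before building PwaLucene
#   use_toolchain "$JAVA8_HOME" "$MAVEN3_HOME"   # before the other builds
#   mvn -version                                 # shows the active Maven and Java
```

Running `mvn -version` after each switch is a quick way to confirm that the intended Maven and Java versions are the ones being picked up.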

Step-by-step

Check out Hadoop (branch-0.14):

  • git clone -b branch-0.14 https://github.com/arquivo/hadoop-common

Install Hadoop (compile with Java 8 and Maven 3):

  • cd hadoop-common
  • mvn install

This version of Hadoop (http://hadoop.apache.org/) must be used for all MapReduce processing.

Check out PwaLucene + PwaArchiveAccess:

  • git clone https://github.com/arquivo/pwa-technologies.git

Install PwaLucene (compile with Java 6 and Maven 2):

  • cd pwa-technologies/PwaLucene
  • mvn install

Install PwaArchiveAccess (compile with Java 8 and Maven 3):

  • cd pwa-technologies/PwaArchive-access
  • mvn install
  • configure (only if you need to change the default configuration)
  • mvn install (to rebuild after changing the configuration)

The JAR and WAR files are available in:

  • pwa-technologies/PwaArchive-access/projects/nutchwax/nutchwax-job/target/nutchwax-job-0.11.0-SNAPSHOT.jar
  • pwa-technologies/PwaArchive-access/projects/nutchwax/nutchwax-webapp/target/nutchwax-webapp-0.11.0-SNAPSHOT.war
  • pwa-technologies/PwaArchive-access/projects/wayback/wayback-webapp/target/wayback-1.2.1.war
  • pwa-technologies/PwaLucene/target/pwalucene-1.0.0-SNAPSHOT.jar
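
To confirm that both builds produced the files listed above, a small check such as the following can be run from the directory that contains the `pwa-technologies` clone. This is only a sketch: the names `ARTIFACTS` and `check_artifacts` are hypothetical, and the paths are copied from the list above.

```shell
# Expected build outputs, relative to the directory containing pwa-technologies.
ARTIFACTS="
pwa-technologies/PwaArchive-access/projects/nutchwax/nutchwax-job/target/nutchwax-job-0.11.0-SNAPSHOT.jar
pwa-technologies/PwaArchive-access/projects/nutchwax/nutchwax-webapp/target/nutchwax-webapp-0.11.0-SNAPSHOT.war
pwa-technologies/PwaArchive-access/projects/wayback/wayback-webapp/target/wayback-1.2.1.war
pwa-technologies/PwaLucene/target/pwalucene-1.0.0-SNAPSHOT.jar
"

# check_artifacts [base_dir]
# Prints one "missing: <path>" line per absent file and
# returns non-zero if at least one file is absent.
check_artifacts() {
    base="${1:-.}"
    missing=0
    for f in $ARTIFACTS; do
        if [ ! -f "$base/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example (run manually): check_artifacts . && echo "all artifacts present"
```

The version numbers are hard-coded on purpose so that a changed artifact version shows up as a missing file instead of passing silently.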

Others

Symbolic link of nutch for nutch-trec:

  • cd pwa-technologies/PwaArchive-access
  • ln -s ../../projects/nutchwax/nutchwax-thirdparty/nutch/ projects/nutch-trec/

This is only necessary if you will use the TREC datasets for tests.
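
If the link step may be run more than once (for example from a setup script), a guarded version avoids errors on repeat runs. The sketch below makes two assumptions: the function name `link_nutch_trec` is hypothetical, and the link is meant to end up at `projects/nutch-trec/nutch`, which is spelled out explicitly here instead of relying on how `ln` derives the link name.

```shell
# link_nutch_trec <path to PwaArchive-access>
# Creates projects/nutch-trec/nutch as a relative symlink to the bundled
# nutch directory, unless something already exists at that path.
link_nutch_trec() {
    root="$1"
    link="$root/projects/nutch-trec/nutch"
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "already present: $link"
        return 0
    fi
    # The target is relative to the directory that holds the link.
    (cd "$root" && ln -s ../../projects/nutchwax/nutchwax-thirdparty/nutch projects/nutch-trec/nutch)
}

# Example (run manually): link_nutch_trec pwa-technologies/PwaArchive-access
```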