forked from Findwise/Hydra
-
Notifications
You must be signed in to change notification settings - Fork 0
Distributed processing framework for search solutions
License
Shekharv/Hydra
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Hydra Processing Framework Getting started 1. Install MongoDB, and start it 2. Check out and build hydra-core 3. Start the built jar with java -jar hydra-core-jar-with-dependencies.jar or just run the Main class from your IDE You now have a running pipeline system. Try adding some documents using the API classes or the input-stages and play around with it. For a basic pipeline you will need the following jars: hydra-core-jar-with-dependencies - the framework itself basic-stages-jar-with-dependencies - a library of reusable stages solr-out-jar-with-dependencies - a library containing the SolrOutputStage which sends documents to Solr Configuring your pipeline Let's say we want to change the field source in all documents that have been touched by the stage extractPDFMetadata to pdf. It should modify documents in the database intranet. Since our stage makes static changes to fields, we can use the stage SetStaticFieldStage (included in basic-stages) instead of building our own. Since all configuration is done using Json, you'll need a Json representation of your configuration for the stage. It could look like this: { stageClass: "com.findwise.hydra.stage.SetStaticFieldStage", queryOptions: ["touched(extractTitles,true)"], fieldNames: ["source"], fieldValues: ["web"] } The easiest way to do this without coding is to use the CmdlineInserter-class found in the examples library: 1. Add the library with the stage you want to add, in this case the basic-stages jar: Program arguments: -a -p pipeline -l -i myLibaryId basic-stages-jar-with-dependencies.jar You should see the following output: > Added stage library with id: myLibaryId 2. Now, add your stage, by referencing the id to your library of stages, again using CmdlineInserter: Program arguments: -a -p pipeline -s -i myLibraryId -n myStage myStage.properties In this example, myStage.properties contains the configuration of the stage. It is also possible to combine these two operations into one: Program arguments: -a -p pipeline -l -i myLibraryId basic-stages-jar-with-dependencies.jar -s -n myStage myStage.properties For more information on the CmdlineInserter class, try invoking it and it should be fairly self-explanatory. For further information, refer to the Wiki on http://github.com/Findwise/Hydra/wiki
About
Distributed processing framework for search solutions
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published