forked from Findwise/Hydra
Distributed processing framework for search solutions
License
cosh/Hydra
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Hydra Processing Framework Getting started 1. Install MongoDB, and start it 2. Check out and build hydra-core 3. Start the built jar with java -jar hydra-core-jar-with-dependencies.jar or just run the Main class from your IDE You now have a running pipeline system. Try adding some documents using the API classes or the input-stages and play around with it. For a basic pipeline you will need the following jars: hydra-core-jar-with-dependencies - the framework itself basic-stages-jar-with-dependencies - a library of reusable stages solr-out-jar-with-dependencies - a library containing the SolrOutputStage which sends documents to Solr Configuring your pipeline Let's say we want to change the field source in all documents that have been touched by the stage extractPDFMetadata to pdf. It should modify documents in the database intranet. Since our stage makes static changes to fields, we can use the stage SetStaticFieldStage (included in basic-stages) instead of building our own. Since all configuration is done using Json, you'll need a Json representation of your configuration for the stage. It could look like this: { stageClass: "com.findwise.hydra.stage.SetStaticFieldStage", queryOptions: ["touched(extractTitles,true)"], fieldNames: ["source"], fieldValues: ["web"] } The easiest way to do this without coding is to use the CmdlineInserter-class found in the examples library: 1. Add the library with the stage you want to add, in this case the basic-stages jar: Program arguments: add library pipeline basic-stages-jar-with-dependencies.jar You should see the following output: > Attempting to work with pipeline 'pipeline' > Added stage library with id: 4fa39d8dda06916a91fb9665 2. Now, add your stage, by referencing the id to your library of stages, again using CmdlineInserter: Program arguments: add stage pipeline webSourceStage 4fa39d8dda06916a91fb9665 {stageClass: 'com.findwise.hydra.stage.SetStaticFieldStage', queryOptions:['touched(extractTitles,true)'], fieldNames: ['source'],fieldValues: ['web']} The format is: add stage <pipeline> <name-of-stage> <library-id> <your-json-configuration>. Alternatively, you can keep the Json in a separate file called <stagename>.properties, which will be read if you fail to provide any json on the command line. For further information, refer to the Wiki on http://github.com/Findwise/Hydra/wiki
About
Distributed processing framework for search solutions
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published