Skip to content

cosh/Hydra

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hydra Processing Framework

Getting started
1. Install MongoDB, and start it
2. Check out and build hydra-core
3. Start the built jar with 
	java -jar hydra-core-jar-with-dependencies.jar
	or just run the Main class from your IDE
	
You now have a running pipeline system. Try adding some documents using the API classes or the input-stages and play around with it.

For a basic pipeline you will need the following jars:
	hydra-core-jar-with-dependencies - the framework itself
	basic-stages-jar-with-dependencies - a library of reusable stages
	solr-out-jar-with-dependencies - a library containing the SolrOutputStage which sends documents to Solr

Configuring your pipeline								
Let's say we want to change the field source in all documents that have been touched by the stage extractPDFMetadata to pdf. It should modify documents in the database intranet. Since our stage makes static changes to fields, we can use the stage SetStaticFieldStage (included in basic-stages) instead of building our own. Since all configuration is done using Json, you'll need a Json representation of your configuration for the stage. It could look like this:
	
{
	stageClass: "com.findwise.hydra.stage.SetStaticFieldStage",
	queryOptions: ["touched(extractTitles,true)"],
	fieldNames: ["source"],
	fieldValues: ["web"]
}
	
The easiest way to do this without coding is to use the CmdlineInserter-class found in the examples library:
	
1. Add the library with the stage you want to add, in this case the basic-stages jar:
	Program arguments: add library pipeline basic-stages-jar-with-dependencies.jar
	You should see the following output:
	> Attempting to work with pipeline 'pipeline'
	> Added stage library with id: 4fa39d8dda06916a91fb9665
2. Now, add your stage, by referencing the id to your library of stages, again using CmdlineInserter:
	Program arguments: add stage pipeline webSourceStage 4fa39d8dda06916a91fb9665 {stageClass: 'com.findwise.hydra.stage.SetStaticFieldStage', queryOptions:['touched(extractTitles,true)'], fieldNames: ['source'],fieldValues: ['web']}

The format is: add stage <pipeline> <name-of-stage> <library-id> <your-json-configuration>. Alternatively, you can keep the Json in a separate file called <stagename>.properties, which will be read if you fail to provide any json on the command line.

For further information, refer to the Wiki on http://github.com/Findwise/Hydra/wiki

About

Distributed processing framework for search solutions

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published