This repository has been archived by the owner on Mar 1, 2021. It is now read-only.

kb-dk/newspaper-hadoop-jpylyzer

This repository has been archived and exists for historical purposes. No updates or further development will go into this repository. The content can be used as is, but no support will be given.


Readme

This is an autonomous component that starts Hadoop jobs for all batches that have been ingested into the bitrepository. The Hadoop job runs jpylyzer.py on the files and saves the result in the corresponding object in DOMS.

It expects to find a config file named config.properties in the conf folder of the install directory.

The config file must contain these values:

#Doms
doms.username=
doms.password=
doms.url=http://:7880/fedora

#Batch iterator
iterator.useFileSystem=false

#Autonomous component framework
autonomous.lockserver.url=
autonomous.sboi.url=
autonomous.pastSuccessfulEvents=Data_Archived
autonomous.futureEvents=JPylyzed
autonomous.maxThreads=1
autonomous.maxRuntimeForWorkers=360000000

#hadoop
job.folder=
file.storage.path=
hadoop.user=newspapr
ninestars.jpylyzer.executable=/usr/lib/python2.7/site-packages/jpylyzer/jpylyzer.py
hadoop.files.per.map.tasks=5
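Before deploying, it can help to confirm that config.properties actually defines every key listed above. The following is a minimal sketch, not part of this component: the function names `parse_properties` and `missing_keys` are hypothetical, and the parser is a deliberate simplification of the Java properties format (it only handles `key=value` lines and `#` or `!` comments, not `:` separators or line continuations).

```python
# Keys taken from the list of required values in this README.
REQUIRED_KEYS = [
    "doms.username",
    "doms.password",
    "doms.url",
    "iterator.useFileSystem",
    "autonomous.lockserver.url",
    "autonomous.sboi.url",
    "autonomous.pastSuccessfulEvents",
    "autonomous.futureEvents",
    "autonomous.maxThreads",
    "autonomous.maxRuntimeForWorkers",
    "job.folder",
    "file.storage.path",
    "hadoop.user",
    "ninestars.jpylyzer.executable",
    "hadoop.files.per.map.tasks",
]


def parse_properties(text):
    """Parse simple key=value lines; a simplified reading of the properties format."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(props, required=REQUIRED_KEYS):
    """Return the required keys that are absent from the parsed properties."""
    return [key for key in required if key not in props]
```

A possible use is to read `conf/config.properties`, pass its text to `parse_properties`, and refuse to start if `missing_keys` returns a non-empty list. Note that a key with an empty value (such as `doms.username=`) counts as present in this sketch.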

Furthermore, the conf folder must contain the two files core-site.xml and yarn-site.xml. These should be identical to the ones deployed on the Hadoop cluster.
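One way to check that the two site files match the cluster's copies is to compare checksums. This is only an illustrative sketch: it assumes the cluster's files have already been copied to a local directory (the `cluster_dir` argument), and the names `sha256_of` and `mismatched_site_files` are hypothetical, not part of this component.

```python
import hashlib
from pathlib import Path

# The two Hadoop configuration files this README requires in the conf folder.
SITE_FILES = ("core-site.xml", "yarn-site.xml")


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def mismatched_site_files(conf_dir, cluster_dir):
    """List site files that are missing or differ between the two directories.

    cluster_dir is assumed to hold copies of the files deployed on the cluster.
    """
    problems = []
    for name in SITE_FILES:
        local = Path(conf_dir) / name
        reference = Path(cluster_dir) / name
        if not local.is_file() or not reference.is_file():
            # A file missing on either side is reported as a mismatch.
            problems.append(name)
        elif sha256_of(local) != sha256_of(reference):
            problems.append(name)
    return problems
```

An empty list means both files are present in each directory and byte-for-byte identical.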
