Use Spring Batch to implement count word in PDF files follow Master/Slave model
Create schema for storing word-count and word-count-reposiory. Execute create_schema.sql
Note: If an error "language 'plpgsql' does not exist" happened when executing create_schema.sql
- Access to postgres directory by command
su -postgres
- Use this command
createlang -d dbname plpgsql
- Run
create_schema.sql
again
Download PDF files from here
Open application.properties
from master and config.
- Config values for the datasource:
batch.jdbc.driver
: Postgresql driver for datasourcebatch.jdbc.url
: URL of word_count data schemabatch.jdbc.url_repo
: URL of job repository schemabatch.jdbc.username
&batch.jdbc.password
: User name & password to access datasource
- Config values for database
batch.schema.script
: Create schemabatch.drop.script
: Drop schemabatch.job_repo.databaseType
: Specify database type
- Config ActiveMQ
broker.url
: ActiveMQ URL
Open application.properties
from slave and config.
- Config pdf diretory path:
processed.pdf.dir
: pdf directory path
- Config delimeter
delimiter
: Cut the word when matching delimeter
Note: Some properties in slave and master are same so don't need repeat
To build, execute from the top level directory:
$ mvn clean install
-
Start ActiveMQ
-
Start the slave:
$ java -Djava.awt.headless=true -jar word-count-partitioning-slave/target/remote-partitioning-slave-0.1.jar
- Start the master:
$ java -jar word-count-partitioning-master/target/remote-partitioning-master-0.1.jar