GitHub - mkaufmann1/JumpQA-1

# JumpQA

Automating ground truth generation.

JumpQA is a project for generating questions about a corpus. More specifically, it takes a directory of TRECs (in .xml form) and outputs a .csv file.

## Requirements

To install this project, you will need:

### Maven

Download the Maven binary for your platform.
Install Maven to the directory of your choice.
Edit your `~/.bash_profile` to include the lines:

    export M2_HOME=/path/to/maven
    export PATH=$PATH:$M2_HOME/bin

### Java 8

Download and install the Java 8 binary for your platform.
Edit your `~/.bash_profile` to include the line:

    export JAVA_HOME=/path/to/java8

## Installation

Clone the repository with git. You will be prompted for your IBM ID and password.

    git clone https://github.com/cognitive-catalyst/JumpQA

Once the clone finishes, run the install script from the repository's root directory:

    cd JumpQA/
    chmod 755 ./install.sh
    ./install.sh

## Basic Usage

This section shows sample usage of the subprojects within the JumpQA repository. Each subproject contains more thorough documentation and usage instructions.

Before JumpQA can run on a corpus, the corpus must be converted to a .json file, which greatly speeds up corpus processing. The conversion currently drops many of the fields in the TRECs; if you want to keep fields beyond the defaults, you will need to edit corpus2json. A future version is planned to let you specify which fields to keep in the .properties file.

Once the corpus is in JSON form, JumpQA can process it and generate ground truth.

### Converting Corpus to JSON

1. `cd` into `JumpQA/`.
2. Edit `corpus2json.properties`. The file should look something like:

       input=sample/
       output=sample/output.json

   `input` is the directory holding the `.xml` TRECs; change it to the directory holding your corpus XMLs. `output` is the `.json` file to write.
3. Run `java -jar target/corpus2json-0.1.0.jar`. If you have a different version, replace `0.1.0` with the one you are using.

### Using JumpQA

1. `cd` into `JumpQA/`.
2. Edit `jumpqa.properties`. The file should look something like:

       corpus=health-corpus.json
       templates=templates.csv
       output=health.csv

   `corpus` is the input JSON file, `templates` is the templates file used to process the TRECs, and `output` is the output CSV file.
3. Run `java -jar target/jumpqa-0.1.0.jar`.
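
Both `corpus2json.properties` and `jumpqa.properties` appear to be plain key/value files in Java `.properties` format. The sketch below is a hypothetical helper (`RequiredProperties` is our name, not a JumpQA class, and it is not necessarily how the subprojects read their configuration) that loads such a file with `java.util.Properties` and fails early when a required key is missing, so a misspelled key is caught before a long run starts.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical helper (not part of JumpQA): loads a .properties file and
 * insists that every required key is present and non-empty.
 */
final class RequiredProperties {
    private RequiredProperties() {}

    /** Returns the required keys' trimmed values, in the order the keys were given. */
    static Map<String, String> load(Reader reader, String... requiredKeys) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            // Wrap so callers are not forced to handle a checked exception.
            throw new UncheckedIOException(e);
        }
        Map<String, String> values = new LinkedHashMap<>();
        for (String key : requiredKeys) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalArgumentException("missing required property: " + key);
            }
            values.put(key, value.trim());
        }
        return values;
    }
}
```

For example, `RequiredProperties.load(new FileReader("jumpqa.properties"), "corpus", "templates", "output")` returns the three values in order, or throws `IllegalArgumentException` naming the first missing key.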

## Projects

Click on a subproject's name to enter its directory.

### JumpQA

This is the main project. JumpQA takes a corpus in JSON form and generates a set of ground truth. The other subprojects are related projects and dependencies.

### Corpus

Converts a directory of TRECs to a JSON file that JumpQA can use.

### [Corpus TF-IDF](Corpus%20TF-IDF/README.md)

Calculates the TF-IDF of the terms in a corpus for each document.
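
As a rough illustration of what this subproject computes, here is a minimal TF-IDF sketch. It assumes documents are already tokenized into lists of terms and uses one common weighting, term count divided by document length times ln(N / df); the actual subproject may tokenize and weight terms differently, and the class name `TfIdf` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Minimal TF-IDF sketch (hypothetical; not the Corpus TF-IDF subproject's code).
 *   tf(t, d) = count of t in d / number of tokens in d
 *   idf(t)   = ln(N / df(t)), where df(t) = number of documents containing t
 */
final class TfIdf {
    private TfIdf() {}

    /** Returns one term -> score map per input document, in input order. */
    static List<Map<String, Double>> compute(List<List<String>> docs) {
        int n = docs.size();

        // Document frequency: each term is counted at most once per document.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            Set<String> seen = new HashSet<>(doc);
            for (String term : seen) {
                df.merge(term, 1, Integer::sum);
            }
        }

        List<Map<String, Double>> result = new ArrayList<>();
        for (List<String> doc : docs) {
            // Raw term counts within this document.
            Map<String, Integer> counts = new HashMap<>();
            for (String term : doc) {
                counts.merge(term, 1, Integer::sum);
            }
            Map<String, Double> scores = new HashMap<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                double tf = (double) e.getValue() / doc.size();
                double idf = Math.log((double) n / df.get(e.getKey()));
                scores.put(e.getKey(), tf * idf);
            }
            result.add(scores);
        }
        return result;
    }
}
```

A term that appears in every document gets an IDF of ln(1) = 0, so it scores 0 everywhere; this is why very common words drop out of TF-IDF rankings.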

### NeuralNet

A library for neural networks, intended for use in JumpQA heuristics. It will eventually be moved into its own project.

### Random

A library that includes an ArrayList that iterates in random order.
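
The idea of a list that iterates randomly can be sketched by overriding `iterator()` on an `ArrayList` so that each traversal visits the elements in a freshly shuffled order. This is a hypothetical illustration (`RandomOrderList` is not the Random library's actual class). Only `iterator()` is overridden, so enhanced `for` loops are randomized while `get`, `forEach`, and `stream()` keep insertion order.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch: an ArrayList whose iterator() visits elements in random
 * order. Each call shuffles a fresh list of indices, so every element is
 * visited exactly once per traversal and the backing order never changes.
 * Not fail-fast: shrinking the list mid-iteration may throw
 * IndexOutOfBoundsException, and elements added mid-iteration are not visited.
 */
final class RandomOrderList<E> extends ArrayList<E> {
    private final Random rng;

    RandomOrderList(Random rng) {
        this.rng = rng;
    }

    RandomOrderList(Collection<? extends E> items, Random rng) {
        super(items);
        this.rng = rng;
    }

    @Override
    public Iterator<E> iterator() {
        // Build [0, 1, ..., size-1] and shuffle it with the supplied Random.
        List<Integer> order = new ArrayList<>(size());
        for (int i = 0; i < size(); i++) {
            order.add(i);
        }
        Collections.shuffle(order, rng);
        Iterator<Integer> indices = order.iterator();
        return new Iterator<E>() {
            @Override
            public boolean hasNext() {
                return indices.hasNext();
            }

            @Override
            public E next() {
                // Resolves to the enclosing list's get(int).
                return get(indices.next());
            }
        };
    }
}
```

Passing a seeded `java.util.Random` makes the order reproducible, which helps when a randomized sampling step needs to be rerun.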

## Maintainer

Will Beason, wabeason@us.ibm.com
