Implementation of different sentence completion architectures using confabulation, module hierarchies and multiconfabulation, built on a symbol-level model of modules and knowledge links.
This implementation was initially the product of a master's thesis by Bernard Paulus and Cédric Snauwaert, released under the GPL - see LICENSE.txt.
Each of the directories here corresponds to an Eclipse project. See Setup, build and run with Eclipse for an overview of how to run them.
Here are the programs that might be interesting:
.
|-- java_corpus_preprocessor
|   `-- src
|       `-- Main.java
`-- java_sentence_completion
    `-- src
        |-- colt
        |   `-- SparseMatricesBenchmarks.java
        `-- confabulation
            |-- Main.java
            `-- tests
                `-- BatchCompletionTest.java
In java_corpus_preprocessor, Main.java pre-processes a UTF-8 text corpus file into a form suitable for the sentence completion program. It opens a GUI to request the location of the file to pre-process.
In java_sentence_completion,
- Main.java is the main sentence completion program. It is a REPL that completes the sentences entered on the command line.
- SparseMatricesBenchmarks.java is the file where we carried out our tests to check whether it was appropriate to move away from parallelcolt. The files in that package are the only ones left that need parallelcolt.
- BatchCompletionTest.java is the program that runs multiple completions at once. It was the one used to generate the example in chapter 5 of our master's thesis.
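To illustrate what "REPL" means for Main.java, here is a minimal sketch of a read-complete-print loop. It is not the project's actual code: the class name `ReplSketch`, the `run` method and the placeholder completer are hypothetical, and the real program delegates to the confabulation architectures instead. The sketch assumes Java 8 or later for the lambda syntax.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.util.function.UnaryOperator;

public class ReplSketch {

    /**
     * Reads one sentence start per line, prints the completed sentence,
     * and stops at end of input or at the first empty line.
     * The completer is a hypothetical stand-in for the real completion code.
     */
    static void run(BufferedReader in, PrintStream out,
                    UnaryOperator<String> completer) throws IOException {
        String line;
        out.print("> ");
        while ((line = in.readLine()) != null && !line.trim().isEmpty()) {
            out.println(completer.apply(line.trim()));
            out.print("> ");
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder completer (assumption): appends an ellipsis instead of
        // performing an actual confabulation-based completion.
        UnaryOperator<String> placeholder = s -> s + " ...";
        run(new BufferedReader(new InputStreamReader(System.in)),
            System.out, placeholder);
    }
}
```

Passing the reader, the output stream and the completer as parameters keeps the loop independent of the console, so it can be exercised from a batch test as well as interactively.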
Here are the instructions to set up and run the sentence completion project with Eclipse.
You need:
- At least 1.5 GB of RAM for the intermediate-sized corpus (mille et une nuits). Machines with less RAM can still run the program, but only with smaller corpora.
- An installation of the Eclipse IDE, version Indigo or later, with JUnit4, downloadable as a single program here (download the "classic" version)
- Some preprocessed corpus files. We uploaded some here.
- Launch Eclipse and start a new project
- Create the project and give java_sentence_completion as the project
location
- Click next and open the libraries tab
- Add java_sentence_completion/src/parallelcolt-0.9.4.jar to the set of
external libraries. This is required to compile and run the matrix
benchmarks.
- Add JUnit4 to the project libraries
- Click finish.
Setup: done!
This assumes you have preprocessed your corpus, or unzipped one from the archive above.
- Open the Main.java file and click on the build and run button.
  The program will open a dialog to choose the preprocessed corpus file.
- Select your preprocessed corpus file. Beware: if you plan to use an intermediate-sized corpus, like the full text of "les contes des mille et une nuits", apply the next step first.
- If you run the project with a corpus that needs more memory than the Java heap allows, it crashes and prints the following message:
  Exception in thread "main" java.lang.OutOfMemoryError: Java heap space ...
  To solve this problem, raise the limit on the memory usage of the Java virtual machine.
- You are done!
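For the memory step above, one common way to raise the heap limit is the standard JVM option `-Xmx`, for example `-Xmx1536m`, which in Eclipse can be entered under Run Configurations, Arguments tab, "VM arguments". The value 1536 is an assumption derived from the 1.5 GB requirement stated earlier, not a setting taken from the project. The small class below (hypothetical name `HeapCheck`) is a sketch for checking which limit the JVM is actually running with.

```java
public class HeapCheck {

    // Assumed threshold, taken from the 1.5 GB figure quoted in this README.
    static final long SUGGESTED_MIB = 1536;

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most memory the JVM will attempt to use,
        // which reflects the -Xmx setting when one is given.
        long maxMiB = toMiB(Runtime.getRuntime().maxMemory());
        System.out.println("Max JVM heap: " + maxMiB + " MiB");
        if (maxMiB < SUGGESTED_MIB) {
            System.out.println("Heap may be too small for the intermediate-sized corpus;"
                    + " consider a larger -Xmx value.");
        }
    }
}
```

Running it with the same VM arguments as the sentence completion program shows whether the new limit has been picked up.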