
The system can be divided into two parts: preprocessing and graph matching. The preprocessing part parses plaintext documents and outputs dependency graphs in JSON format. The graph matching part takes these dependency graphs and applies graph edit distance to measure their similarity.
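
The README does not document the JSON layout written by the preprocessing step. As an illustration only, this sketch serializes one parsed sentence as a JSON array of tokens, each pointing to its head token. The class name DependencyJson and the field names index, word, pos, head and rel are hypothetical, not the repository's schema. It uses only the standard library, so it also compiles under Java 7.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one possible JSON layout for a preprocessed sentence.
// Field names are assumptions, not the schema used by the repository.
public class DependencyJson {

    /** One token of a parsed sentence; head == 0 marks the root (assumed convention). */
    static final class Token {
        final int index;
        final String word;
        final String pos;  // part-of-speech tag, e.g. "NNP"
        final int head;    // index of the governing token
        final String rel;  // dependency relation to the head, e.g. "nsubj"

        Token(int index, String word, String pos, int head, String rel) {
            this.index = index;
            this.word = word;
            this.pos = pos;
            this.head = head;
            this.rel = rel;
        }
    }

    /** Wraps a string in quotes, escaping characters JSON forbids inside strings. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // control characters
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes a sentence as a JSON array of token objects. */
    static String toJson(List<Token> tokens) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"index\":").append(t.index)
              .append(",\"word\":").append(quote(t.word))
              .append(",\"pos\":").append(quote(t.pos))
              .append(",\"head\":").append(t.head)
              .append(",\"rel\":").append(quote(t.rel))
              .append('}');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<Token> sentence = new ArrayList<Token>();
        sentence.add(new Token(1, "John", "NNP", 2, "nsubj"));
        sentence.add(new Token(2, "sleeps", "VBZ", 0, "root"));
        System.out.println(toJson(sentence));
    }
}
```

Writing the JSON by hand keeps the sketch dependency-free; a real pipeline would more likely use a JSON library.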


haakondr/NLP-Graphs


A thesis project on the use of dependency graphs as a representation of natural language text.
Sentences are represented as graph objects whose tokens are tagged with part-of-speech tags and connected by dependency relations.
This representation is used to measure the similarity between two sentences, which is applied to plagiarism detection.
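
To make the representation concrete, here is a minimal sketch of a sentence as a graph object: nodes are tokens tagged with a part-of-speech tag, and directed edges run from a head token to its dependents, labeled with the dependency relation. The class DependencyGraph and all its methods are hypothetical and not taken from the repository.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a sentence as a dependency graph. Nodes are tokens
// tagged with a POS tag; each edge goes head -> dependent with a relation label.
public class DependencyGraph {

    static final class Node {
        final String word;
        final String pos; // part-of-speech tag, e.g. "NN"
        Node(String word, String pos) { this.word = word; this.pos = pos; }
    }

    static final class Edge {
        final int head;      // index of the governing token
        final int dependent; // index of the dependent token
        final String rel;    // dependency relation, e.g. "nsubj"
        Edge(int head, int dependent, String rel) {
            this.head = head;
            this.dependent = dependent;
            this.rel = rel;
        }
    }

    private final List<Node> nodes = new ArrayList<Node>();
    private final List<Edge> edges = new ArrayList<Edge>();

    /** Adds a token and returns its index. */
    int addNode(String word, String pos) {
        nodes.add(new Node(word, pos));
        return nodes.size() - 1;
    }

    /** Adds a labeled edge head -> dependent between two existing tokens. */
    void addEdge(int head, int dependent, String rel) {
        if (head < 0 || head >= nodes.size() || dependent < 0 || dependent >= nodes.size()) {
            throw new IndexOutOfBoundsException("edge refers to an unknown token");
        }
        edges.add(new Edge(head, dependent, rel));
    }

    Node node(int i) { return nodes.get(i); }

    /** Indices of the direct dependents of a token, in insertion order. */
    List<Integer> dependents(int head) {
        List<Integer> out = new ArrayList<Integer>();
        for (Edge e : edges) if (e.head == head) out.add(e.dependent);
        return out;
    }

    /** Relation label on the edge head -> dependent, or null if there is none. */
    String relation(int head, int dependent) {
        for (Edge e : edges) if (e.head == head && e.dependent == dependent) return e.rel;
        return null;
    }

    /** Tokens without an incoming edge; a well-formed dependency tree has exactly one. */
    List<Integer> roots() {
        boolean[] hasHead = new boolean[nodes.size()];
        for (Edge e : edges) hasHead[e.dependent] = true;
        List<Integer> out = new ArrayList<Integer>();
        for (int i = 0; i < hasHead.length; i++) if (!hasHead[i]) out.add(i);
        return out;
    }

    public static void main(String[] args) {
        // "The cat sleeps": sleeps -nsubj-> cat -det-> The
        DependencyGraph g = new DependencyGraph();
        int the = g.addNode("The", "DT");
        int cat = g.addNode("cat", "NN");
        int sleeps = g.addNode("sleeps", "VBZ");
        g.addEdge(sleeps, cat, "nsubj");
        g.addEdge(cat, the, "det");
        System.out.println(g.roots() + " " + g.dependents(sleeps));
    }
}
```

Storing edges in a flat list keeps the sketch short; a larger implementation would typically index edges by head for faster lookups.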

The most interesting part of the program is GraphEditDistance.java, which is the focus of the thesis.
--------------------------------
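
As an illustration of how graph edit distance can be made tractable, this sketch implements an assignment-based (bipartite) approximation in the style of Riesen and Bunke. It is not necessarily what GraphEditDistance.java does. Assumptions: each node is labeled with its POS tag and the relation of its incoming dependency edge (so edge labels are folded into node costs and other edge structure is ignored), insertion and deletion cost 1, each mismatched label costs 0.5, and the optimal assignment is found exactly by a subset DP that only scales to small sentences. The class ApproxGed and the similarity normalization are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the repository's GraphEditDistance.java: bipartite
// approximation of graph edit distance. Edge labels are folded into nodes via
// each node's incoming relation; all costs are illustrative assumptions.
public class ApproxGed {

    static final class Node {
        final String pos; // part-of-speech tag, e.g. "NN"
        final String rel; // relation to the head, e.g. "nsubj"; "root" for the root
        Node(String pos, String rel) { this.pos = pos; this.rel = rel; }
    }

    static final double INDEL = 1.0; // assumed cost of deleting or inserting a node

    /** Assumed substitution cost: 0.5 per mismatched label, 0 for identical nodes. */
    static double substitute(Node a, Node b) {
        double c = 0.0;
        if (!a.pos.equals(b.pos)) c += 0.5;
        if (!a.rel.equals(b.rel)) c += 0.5;
        return c;
    }

    /** Builds the (n+m) x (n+m) cost matrix and returns its optimal assignment cost. */
    static double distance(List<Node> g1, List<Node> g2) {
        int n = g1.size(), m = g2.size(), k = n + m;
        if (k > 20) throw new IllegalArgumentException("this sketch handles n + m <= 20 only");
        double[][] c = new double[k][k];
        for (double[] row : c) Arrays.fill(row, Double.POSITIVE_INFINITY);
        for (int i = 0; i < n; i++)                       // top-left: substitute u_i -> v_j
            for (int j = 0; j < m; j++) c[i][j] = substitute(g1.get(i), g2.get(j));
        for (int i = 0; i < n; i++) c[i][m + i] = INDEL;  // top-right diagonal: delete u_i
        for (int j = 0; j < m; j++) c[n + j][j] = INDEL;  // bottom-left diagonal: insert v_j
        for (int i = n; i < k; i++)                       // bottom-right: dummy-to-dummy, free
            for (int j = m; j < k; j++) c[i][j] = 0.0;
        return minAssignment(c);
    }

    /** Exact minimum-cost assignment by dynamic programming over column subsets. */
    static double minAssignment(double[][] c) {
        int k = c.length;
        double[] dp = new double[1 << k]; // dp[mask]: best cost of rows 0..|mask|-1 on columns in mask
        Arrays.fill(dp, Double.POSITIVE_INFINITY);
        dp[0] = 0.0;
        for (int mask = 0; mask < (1 << k); mask++) {
            if (dp[mask] == Double.POSITIVE_INFINITY) continue;
            int row = Integer.bitCount(mask); // rows are assigned in order
            if (row == k) continue;
            for (int col = 0; col < k; col++) {
                if ((mask & (1 << col)) != 0) continue;
                int next = mask | (1 << col);
                dp[next] = Math.min(dp[next], dp[mask] + c[row][col]);
            }
        }
        return dp[(1 << k) - 1];
    }

    /** Assumed normalization to [0, 1]; with these costs the distance never exceeds n + m. */
    static double similarity(List<Node> g1, List<Node> g2) {
        int k = g1.size() + g2.size();
        return k == 0 ? 1.0 : 1.0 - distance(g1, g2) / k;
    }

    public static void main(String[] args) {
        List<Node> a = Arrays.asList(new Node("NN", "nsubj"), new Node("VBZ", "root"));
        List<Node> b = Arrays.asList(new Node("NN", "nsubj"), new Node("VBZ", "root"),
                                     new Node("RB", "advmod"));
        System.out.println(distance(a, b) + " " + similarity(a, b));
    }
}
```

The cost matrix reduces edit-path search to a linear assignment problem; larger graphs would call for the Hungarian algorithm instead of the exponential subset DP used here for brevity.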

Dependencies: Java 7, Maven
A MongoDB database must be running at the location specified in app.properties. This is required for a full run, but not for computing the graph edit distance between two sentences with GED.java.

Usage:

Modify app.properties and select the appropriate folders for the data set.

mvn compile
mvn exec:java
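
Assuming app.properties is a standard java.util.Properties file (the README does not show its contents), its settings could be read as in this sketch. The class AppConfig and the keys mongo.host, mongo.port and data.dir are hypothetical placeholders; check the repository's app.properties for the real keys. The fallback port 27017 is MongoDB's default.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Hypothetical sketch: reading settings from a properties file. Key names are
// placeholders, not the keys actually used in the repository's app.properties.
public class AppConfig {

    final String mongoHost;
    final int mongoPort;
    final String dataDir;

    AppConfig(Properties p) {
        mongoHost = p.getProperty("mongo.host", "localhost");
        mongoPort = Integer.parseInt(p.getProperty("mongo.port", "27017")); // MongoDB default port
        dataDir = p.getProperty("data.dir");
        if (dataDir == null) {
            throw new IllegalStateException("data.dir must be set");
        }
    }

    /** Parses properties from a reader (e.g. a FileReader on app.properties). */
    static AppConfig load(Reader in) throws IOException {
        Properties p = new Properties();
        p.load(in);
        return new AppConfig(p);
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "app.properties";
        try (Reader in = new FileReader(path)) { // try-with-resources, available since Java 7
            AppConfig cfg = load(in);
            System.out.println(cfg.mongoHost + ":" + cfg.mongoPort + " data: " + cfg.dataDir);
        }
    }
}
```

Failing fast on a missing data folder surfaces configuration mistakes before any documents are parsed.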
