The system is divided into two parts: preprocessing and graph matching. The preprocessing part parses plaintext documents and outputs dependency graphs in JSON format. The graph matching part takes these dependency graphs and applies graph edit distance to measure their similarity.
haakondr/NLP-Graphs
A thesis project on using dependency graphs to represent natural language text. Sentences are represented as graph objects whose tokens carry part-of-speech tags and are linked by the relations between them. This representation is used to measure the similarity between two sentences, which in turn is used for plagiarism detection. The most interesting part of the program is GraphEditDistance.java, which is the focus of the thesis.

--------------------------------
Dependencies:
Java 7, Maven
A MongoDB database must be running at the location specified in app.properties (needed for a full run, but not for calculating the graph edit distance between two sentences with GED.java).

Usage:
Modify app.properties and select the appropriate folders for the data set.
mvn compile
mvn exec:java
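To give a rough feel for what a graph edit distance between two sentence graphs measures, here is a minimal sketch. It is not the code in GraphEditDistance.java: the class name `GedSketch`, its `Graph` helper, the `token/POS` labels and the unit costs are all assumptions made for illustration. It uses a deliberately simplified variant that matches nodes only when their labels are identical, then counts every unmatched node and edge as one insertion or deletion. A full graph edit distance also allows substitutions and searches for the cheapest node mapping.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not part of the NLP-Graphs code base.
final class GedSketch {

    /** A tiny sentence graph: node labels plus directed, labelled edges. */
    static final class Graph {
        // Assumption: each token appears once per sentence, so a label identifies a node.
        final Set<String> nodes = new HashSet<>();
        final Set<String> edges = new HashSet<>();

        /** Adds a dependency edge and, implicitly, both of its endpoints. */
        Graph edge(String head, String relation, String dependent) {
            nodes.add(head);
            nodes.add(dependent);
            edges.add(head + " -" + relation + "-> " + dependent);
            return this;
        }
    }

    /** Number of elements that occur in exactly one of the two sets. */
    static int symmetricDifferenceSize(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return union.size() - common.size();
    }

    /**
     * Simplified edit distance with unit costs: every node or edge present in
     * only one graph costs one insertion or deletion. Substitutions are not modelled.
     */
    static int distance(Graph g1, Graph g2) {
        return symmetricDifferenceSize(g1.nodes, g2.nodes)
             + symmetricDifferenceSize(g1.edges, g2.edges);
    }

    public static void main(String[] args) {
        // "The dog barks" versus "The cat barks", with made-up POS tags and relations.
        Graph dog = new Graph()
                .edge("barks/VBZ", "nsubj", "dog/NN")
                .edge("dog/NN", "det", "The/DT");
        Graph cat = new Graph()
                .edge("barks/VBZ", "nsubj", "cat/NN")
                .edge("cat/NN", "det", "The/DT");

        System.out.println(distance(dog, dog)); // identical graphs
        System.out.println(distance(dog, cat)); // one noun swapped
    }
}
```

In this sketch, swapping one noun costs 6: two node edits (delete `dog/NN`, insert `cat/NN`) plus four edge edits, because both edges touching the noun change. A distance of 0 means the two graphs are identical under this encoding, and larger values mean less similar sentences.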