GitHub - prnicolas/24x7Content: Open source version of Semantic/Taxonomy search project - 24x7 Content

Repository contents:
java/com/c24x7
web
.gitattributes
.gitignore
Git 24x7Content.lnk
README

24x7 Content library

This is a draft implementation of a semantic analyzer that extracts the topic topography from a document or a set of documents.
The implementation runs on JDK 1.6 and relies on:
- Information retrieval (modified tf-idf)
- Semantic analysis (classification of Wikipedia short and long descriptions, WordNet hypernyms and categories)
- Machine learning (Conditional Random Fields, Naive Bayes, ...)
- Natural Language Processing (tagging, chunking, ...)
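
As an illustration of the information retrieval step, here is a minimal sketch of plain tf-idf scoring. It is not the modified tf-idf used in this code base, whose details are not described in this README. The class name TfIdfSketch, its method names, the smoothed idf formula log((1 + N) / (1 + df)) + 1 and the sample corpus are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; it is not part of the 24x7 Content code base.
public class TfIdfSketch {

    /** Relative frequency of each term within one tokenized document. */
    public static Map<String, Double> termFrequencies(List<String> tokens) {
        Map<String, Double> tf = new HashMap<String, Double>();
        for (String token : tokens) {
            Double count = tf.get(token);
            tf.put(token, count == null ? 1.0 : count + 1.0);
        }
        // Normalize raw counts by the document length.
        for (Map.Entry<String, Double> entry : tf.entrySet()) {
            entry.setValue(entry.getValue() / tokens.size());
        }
        return tf;
    }

    /** Smoothed inverse document frequency (assumed variant): log((1 + N) / (1 + df)) + 1. */
    public static Map<String, Double> inverseDocumentFrequencies(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<String, Integer>();
        for (List<String> doc : corpus) {
            // Count each term at most once per document.
            Set<String> unique = new HashSet<String>(doc);
            for (String term : unique) {
                Integer count = df.get(term);
                df.put(term, count == null ? 1 : count + 1);
            }
        }
        Map<String, Double> idf = new HashMap<String, Double>();
        int n = corpus.size();
        for (Map.Entry<String, Integer> entry : df.entrySet()) {
            idf.put(entry.getKey(), Math.log((1.0 + n) / (1.0 + entry.getValue())) + 1.0);
        }
        return idf;
    }

    /** tf-idf score of every term in a document; terms unknown to the corpus score 0. */
    public static Map<String, Double> tfIdf(List<String> doc, Map<String, Double> idf) {
        Map<String, Double> scores = new HashMap<String, Double>();
        for (Map.Entry<String, Double> entry : termFrequencies(doc).entrySet()) {
            Double weight = idf.get(entry.getKey());
            scores.put(entry.getKey(), entry.getValue() * (weight == null ? 0.0 : weight));
        }
        return scores;
    }

    public static void main(String[] args) {
        // Made-up, pre-tokenized sample documents.
        List<List<String>> corpus = new ArrayList<List<String>>();
        corpus.add(Arrays.asList("semantic", "search", "taxonomy"));
        corpus.add(Arrays.asList("semantic", "analysis", "topic"));
        corpus.add(Arrays.asList("topic", "extraction", "taxonomy", "taxonomy"));

        Map<String, Double> idf = inverseDocumentFrequencies(corpus);
        System.out.println(tfIdf(corpus.get(2), idf));
    }
}
```

The sketch avoids lambdas and the diamond operator so that it stays compatible with JDK 1.6. In the third sample document, "taxonomy" scores highest because it occurs twice there.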

The following open source libraries must be added to the classpath in order to compile and execute the code base:
- Apache Log4j 1.2.15
- Apache Commons Codec 1.5
- JUnit 4.0
- Apache OpenNLP Tools 1.5
- OAuth Signpost 1.2.1
- Apache Commons Net 2.2
- MySQL Connector/J 5.1.1

The application has been tested by extracting topics from similar documents retrieved through search, with an accuracy of 82%.

Related patent: "Methods and systems for extracting topics from documents using taxonomy graphs and kirchoff's"
United States 61645413  - May 2012

Patrick Nicolas 
June 2012