prnicolas/24x7Content

24x7 Content library

This is a draft implementation of a semantic analyzer that extracts the topic topography from a document or a set of documents.
The implementation runs on JDK 1.6 and relies on:
- Information retrieval (modified tf-idf)
- Semantic analysis (classification of Wikipedia short and long descriptions, WordNet hypernyms and categories)
- Machine learning (Conditional Random Fields, Naive Bayes, ...)
- Natural Language Processing (tagging, chunking, ...)
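
As an illustration of the information retrieval step, here is a minimal sketch of the standard tf-idf weighting. It is not the modified variant used in this code base, whose details are not described here. The class name `TfIdfSketch`, its methods, and the sample corpus are hypothetical and chosen only for this example; the code avoids post-1.6 language features so that it should compile on JDK 1.6.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: plain tf-idf, NOT the modified variant used by the project. */
public class TfIdfSketch {

    /** Term frequency: occurrences of the term divided by the document length. */
    public static double tf(String term, List<String> doc) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        int count = 0;
        for (String token : doc) {
            if (token.equals(term)) {
                count++;
            }
        }
        return (double) count / doc.size();
    }

    /** Inverse document frequency: log(N / df), or 0 if no document contains the term. */
    public static double idf(String term, List<List<String>> corpus) {
        int df = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) {
                df++;
            }
        }
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    /** Scores every distinct term of one document against the whole corpus. */
    public static Map<String, Double> score(List<String> doc, List<List<String>> corpus) {
        Map<String, Double> scores = new HashMap<String, Double>();
        Set<String> vocabulary = new HashSet<String>(doc);
        for (String term : vocabulary) {
            scores.put(term, tf(term, doc) * idf(term, corpus));
        }
        return scores;
    }

    public static void main(String[] args) {
        // Assumed toy corpus of already tokenized documents.
        List<List<String>> corpus = Arrays.asList(
            Arrays.asList("semantic", "topic", "graph"),
            Arrays.asList("topic", "search", "search"),
            Arrays.asList("topic", "taxonomy"));
        System.out.println(score(corpus.get(1), corpus));
    }
}
```

In this toy corpus, a term that appears in every document (such as "topic") receives a weight of zero, while a term concentrated in one document (such as "search") is ranked highest for that document.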

The following open source libraries must be added to the classpath to compile and run the code base:
- Apache Log4j 1.2.15
- Apache Commons Codec 1.5
- JUnit 4.0
- Apache OpenNLP Tools 1.5
- OAuth Signpost 1.2.1
- Apache Commons Net 2.2
- MySQL Connector/J 5.1.1

The application was tested by extracting topics from similar documents retrieved through search, and reached an accuracy of 82%.

Related patent: "Methods and systems for extracting topics from documents using taxonomy graphs and kirchoff's"
United States 61645413 - May 2012

Patrick Nicolas 
June 2012

About

Open source version of Semantic/Taxonomy search project - 24x7 Content - 2011-2012
