Skip to content

midiablo/CoarseWordNet

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 

Repository files navigation

Coarsening WordNet

Abstract:

Currently used general purpose dictionaries are often too fine-grained, with narrow sense divisions that are not relevant for many Natural Language applications. WordNet, which is a widely used sense inventory for Word Sense Disambiguation, has the same problem. With different applications requiring different levels of sense granularities, producing sense clustered inventories of arbitrary sense granularity has evolved as a crucial task. We try to exploit the resources available like human-labelled sense clusterings and semi-automatically generated domain labels of synsets, to estimate the similarity between synsets. Using supervision, we learn a model which predicts the probability of any two senses of a word to be merged. To learn a more generic model, we propose a graph based approach, which allows us to use the information learnt from supervision as well. Using this complete similarity measure, we propose a simple method for clustering synsets.

============= Notes:

  1. A demo of the system is available in Demo.java in applet package.

  2. For more details refer: Merging Word Senses, M.Tech. Thesis - Sumit Bhagwani, Computer Science and Engineering, IIT Kanpur.

=============

Installation Instructions:

  1. The system requires all following additional datasets in /home/USERNAME/Data folder:
  1. The USERNAME needs to be set in the code (StaticValues.java) and in properties files of EXTJWNL and BabelNet.

=============

Code Details:

  1. Data Processing: The data preprocessing codes are available in krsystem.ontology.senseClustering package
  2. Supervised Learning Framework:
    • The SVM framework used for learning similarity metric is available in krsystem.ontology.senseClustering.svm package.
    • The FeatureGenerator.java class collects the features and passes it to learning module.
    • Evaluation.java serves as the main class for the package in which we learn and evaluate our models.
  3. Semi-supervised Learning Framework:
  • The SimRank framework is available in simRank package
  • The Sigmoid Transformation of SVM scores is available in svmPredictionNormalization package
  • connectedComponentAnalysis package uses the similarity metric learnt and performs clustering using Connected Components in the graph.
  • The connectedComponentAnalysis package also includes the code to evaluate the performance of clustering obtained on standard datasets

About

Coarsening WordNet

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Roff 65.6%
  • PostScript 12.4%
  • HTML 5.9%
  • Shell 5.8%
  • C 5.3%
  • Java 3.5%
  • Other 1.5%