ratishsp/AveragedPerceptron

AveragedPerceptron

An implementation of the averaged perceptron algorithm from the paper "Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms" by Michael Collins (2002): http://acl.ldc.upenn.edu/W/W02/W02-1001.pdf
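
To illustrate the idea of weight averaging, here is a minimal sketch of a simplified multiclass averaged perceptron over sparse binary features. It is not the code in this repository and it omits the Viterbi decoding over tag sequences that the paper uses; the class name `AveragedPerceptronSketch` and its methods are hypothetical, chosen only for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified multiclass averaged perceptron (illustrative sketch, not the repository's code). */
final class AveragedPerceptronSketch {
    private final String[] labels;
    // feature -> one weight per label
    private final Map<String, double[]> weights = new HashMap<>();
    // feature -> sum of that feature's weights over all training steps
    private final Map<String, double[]> totals = new HashMap<>();

    AveragedPerceptronSketch(String[] labels) {
        this.labels = labels.clone();
    }

    /** Index of the highest-scoring label under the given weight table (ties go to the lowest index). */
    private int argmax(List<String> features, Map<String, double[]> table) {
        double[] scores = new double[labels.length];
        for (String f : features) {
            double[] row = table.get(f);
            if (row == null) continue;
            for (int k = 0; k < labels.length; k++) scores[k] += row[k];
        }
        int best = 0;
        for (int k = 1; k < labels.length; k++) {
            if (scores[k] > scores[best]) best = k;
        }
        return best;
    }

    /** One training step: on a mistake, reward the gold label and penalise the predicted one. */
    void update(List<String> features, int gold) {
        int predicted = argmax(features, weights);
        if (predicted != gold) {
            for (String f : features) {
                double[] row = weights.computeIfAbsent(f, key -> new double[labels.length]);
                row[gold] += 1.0;
                row[predicted] -= 1.0;
            }
        }
        // Naive averaging: add the current weights to the running totals after every example.
        for (Map.Entry<String, double[]> e : weights.entrySet()) {
            double[] sum = totals.computeIfAbsent(e.getKey(), key -> new double[labels.length]);
            for (int k = 0; k < labels.length; k++) sum[k] += e.getValue()[k];
        }
    }

    /** Predicts with the summed weights; dividing by the step count would not change the argmax. */
    String predict(List<String> features) {
        return labels[argmax(features, totals)];
    }
}
```

Summing the weights after every example is the simplest way to express averaging, but it costs one pass over all features per step. Practical implementations usually track the step at which each weight last changed and update the totals lazily.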

For a simple tagger that uses suffixes of lengths 1, 2, and 3, the current word, and the previous tag as features, use the tagger file PerceptronBigramNoWordVecNoPOS.java.
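
The feature set described above could be extracted roughly as follows. This is a sketch under assumptions: the class `TaggerFeatures`, the method `extract`, and the feature name prefixes (`word=`, `prevTag=`, `suf1=` and so on) are hypothetical and may differ from what PerceptronBigramNoWordVecNoPOS.java actually does.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative feature extractor: current word, previous tag, and suffixes of length 1 to 3. */
final class TaggerFeatures {
    /** Features for the word at position i, given the tag assigned to position i - 1. */
    static List<String> extract(String[] words, int i, String previousTag) {
        String word = words[i];
        List<String> features = new ArrayList<>();
        features.add("word=" + word);
        features.add("prevTag=" + previousTag);
        // Suffixes of length 1, 2 and 3; a suffix is skipped when the word is shorter than it.
        for (int len = 1; len <= 3; len++) {
            if (word.length() >= len) {
                features.add("suf" + len + "=" + word.substring(word.length() - len));
            }
        }
        return features;
    }
}
```

Note that `String.length()` counts UTF-16 code units, so for Devanagari text a vowel sign or virama counts as its own character in this sketch.
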
You need to configure the training data file path (TRAINING_FILE), the test file path (DEV_FILE), the test output file (DEV_OUT_FILE), and the model output paths: MODEL_GEN_FILE for the perceptron model and MODEL_GEN_AVG_OPTIM for the averaged perceptron model.
Format of the input:
The input follows the CoNLL-2003 format for NER.
Each word is on a separate line, and an empty line follows each sentence. The first item on each line is the word and the second is its tag (the named entity tag for NER; the example below uses part-of-speech tags).

Example training data:
ताजा JJ
साँसें N
और CC
चमचमाते JJ
दाँत N
आपके PR
व्यक्तित्व N
को PSP
निखारते V
हैं V
। RD

दाँतों N
से PSP
आपका PR
आत्मविश्वास N
भी RP
बढ़ता V
है V
। RD
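
Data in this layout can be read with a few lines of Java. The following is a minimal sketch, assuming whitespace-separated columns and blank lines as sentence boundaries; `ConllReader`, `Sentence`, and `parse` are hypothetical names and not part of this repository.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative reader for "word tag" lines separated into sentences by blank lines. */
final class ConllReader {
    /** One sentence: words and, when present in the input, their tags. */
    static final class Sentence {
        final List<String> words = new ArrayList<>();
        final List<String> tags = new ArrayList<>();
    }

    /** Groups lines into sentences; a line with a single column contributes a word but no tag. */
    static List<Sentence> parse(List<String> lines) {
        List<Sentence> sentences = new ArrayList<>();
        Sentence current = new Sentence();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                // A blank line closes the current sentence, if it has any words.
                if (!current.words.isEmpty()) {
                    sentences.add(current);
                    current = new Sentence();
                }
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            current.words.add(parts[0]);
            if (parts.length >= 2) {
                current.tags.add(parts[1]);
            }
        }
        // The last sentence may not be followed by a blank line.
        if (!current.words.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }
}
```

The lines can be obtained with `Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8)`; reading as UTF-8 explicitly matters for Devanagari input.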

Example test data (one word per line, no tags):
साल
और
बाँस
के
इस
जंगल
के
बीच
जंगली
घास
के
कई
मैदान
हैं

जो
इस
अदम् य
दृश्य
को
एक
परिचित
भाव
देते
हैं

The NE classes need to be configured in the Java file itself. Currently the following NE classes are hardcoded:
ARTIFACT, COUNT, DATE, DAY, DISEASE, DISTANCE, ENTERTAINMENT, FACILITIES, LIVTHINGS, LOCATION, LOCOMOTIVE, MATERIALS, MONEY, MONTH, O, ORGANIZATION, PERIOD, PERSON, PLANTS, QUANTITY, SDAY, TIME, YEAR
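
As an illustration of what such a hardcoded class list might look like, here is a small sketch. The class `NeLabels`, the field `CLASSES`, and the method `indexOf` are hypothetical; the actual declaration in the Java file may be organised differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative hardcoded label set with a lookup that rejects unknown classes. */
final class NeLabels {
    static final List<String> CLASSES = Collections.unmodifiableList(Arrays.asList(
            "ARTIFACT", "COUNT", "DATE", "DAY", "DISEASE", "DISTANCE", "ENTERTAINMENT",
            "FACILITIES", "LIVTHINGS", "LOCATION", "LOCOMOTIVE", "MATERIALS", "MONEY",
            "MONTH", "O", "ORGANIZATION", "PERIOD", "PERSON", "PLANTS", "QUANTITY",
            "SDAY", "TIME", "YEAR"));

    /** Position of a class in the list; fails fast on a tag that is not configured. */
    static int indexOf(String tag) {
        int index = CLASSES.indexOf(tag);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown NE class: " + tag);
        }
        return index;
    }
}
```

Failing fast on an unknown tag makes a mismatch between the training data and the configured classes visible at load time rather than as a silent indexing error.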
