nlp-finalProject

##progress

Data collection (Twitter API).
Preprocess the collected data, breaking each tweet down using hashtags, NER, or POS tags.
Remove and filter URLs (using regular expressions), abbreviations (stemming?), prep....
What if there is no training data? How do we mine and process the sentiment data?
With a small amount of data, how accurate is the result? How does accuracy change with more data?
Features: compare accuracy across unigram, bigram, trigram, up to n-gram.
Statistical counts for each word.
Improve the algorithm; analyze using SVM, CRF, naive Bayes, and our own algorithm.
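
As a rough illustration of the URL filtering and n-gram feature steps listed above, here is a minimal sketch. The class name `TweetFeatures`, the method names, and the URL pattern are assumptions made for this example, not the project's actual code; tokenization is plain whitespace splitting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class TweetFeatures {
    // Assumed URL pattern: "http://", "https://" or "www." followed by non-space characters.
    private static final Pattern URL = Pattern.compile("(https?://|www\\.)\\S+");

    // Replace every URL with a space, then collapse repeated whitespace.
    static String stripUrls(String tweet) {
        return URL.matcher(tweet).replaceAll(" ").trim().replaceAll("\\s+", " ");
    }

    // Return all word n-grams of size n (n = 1 unigram, 2 bigram, 3 trigram, ...).
    static List<String> ngrams(String text, int n) {
        List<String> out = new ArrayList<>();
        String cleaned = text.trim().toLowerCase();
        if (cleaned.isEmpty() || n < 1) {
            return out;
        }
        String[] tokens = cleaned.split("\\s+");
        for (int i = 0; i + n <= tokens.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
        }
        return out;
    }
}
```

For example, `TweetFeatures.ngrams(TweetFeatures.stripUrls(tweet), 2)` would produce the bigram features of a tweet after its URLs are removed.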

Test how data size, word removal and filtering, and each of the techniques applied affect accuracy.
UI?

11/15

##Work_Division

###11/11

###11/15

###11/17

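Naive Bayes + entropy -> (丁錦)
Preprocessing and features (stop words, stemming) + SVM -> (文彬 + 振安)
Data mining (Twitter... or other sources) and building the data structure (fields: hashtag, post content, post polarity) + defining the accurate answer for a sentence (start by using emoticons to judge a post's polarity) + studying lexicon & polarity for polarity classification (refine into finer positive/neutral/negative levels? use a score to judge polarity tendency? what is the score based on?)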

Feature improvement
Pre-processing improvement
Add more stop words
Normalize elongated words, e.g. transform “happyyyy” to “happy”
Use both the sentiment-word HashMap and the emoji in a tweet to label the tweet
Naïve Bayes algorithm improvement
Laplace smoothing
Max Entropy improvement
Label tweets that contain several sentences with different emoji
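
Two of the items above, elongated-word normalization and smoothing for Naive Bayes, can be sketched as follows. This is only an illustration under stated assumptions: the class `NaiveBayesSketch` and its method names are hypothetical, the normalizer collapses any character repeated three or more times to a single character (which matches “happyyyy” to “happy” but would also turn “coool” into “col”), and the smoothing is standard add-one smoothing over per-class word counts.

```java
import java.util.Map;

class NaiveBayesSketch {
    // Collapse runs of 3+ identical characters to one character.
    // Assumption: legitimate double letters ("pp" in "happy") are left alone,
    // but a stretched double letter ("coool") loses one letter.
    static String collapseElongation(String word) {
        return word.replaceAll("(.)\\1{2,}", "$1");
    }

    // Add-one (Laplace) smoothed estimate of P(word | class):
    //   (count(word, class) + 1) / (totalWordsInClass + vocabularySize)
    // so a word never seen in the class still gets a non-zero probability.
    static double smoothedProb(Map<String, Integer> classCounts, String word, int vocabularySize) {
        int total = 0;
        for (int c : classCounts.values()) {
            total += c;
        }
        int count = classCounts.getOrDefault(word, 0);
        return (count + 1.0) / (total + vocabularySize);
    }
}
```

Without smoothing, a single unseen word would make the product of word probabilities zero for that class, which is the usual reason for adding it.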

Split a tweet into several sentences and label each sentence by its emoji (the paper drops tweets with contradictory emoji).
If the number of positive sentences is greater than the number of negative sentences, the tweet is positive;
otherwise the tweet is negative (the result may influence the weight of some tweets in the corpus).

For a sentence with many emoji, label the sentence by emoji counts: for example, given a sentence with x “:)” and y “:(”, if x > y the sentence is positive.
Define subject-related tweets and compare their results with those of non-subject tweets.
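
The emoji rules described above (counting “:)” against “:(” within a sentence, then voting across the sentences of a tweet) could look roughly like this. The class `EmojiPolarity`, the emoticon patterns, and the sentence-splitting rule are assumptions for the sketch. It also leaves ties and emoji-free text unlabeled (`NONE`), which is a choice made here and not something the notes specify.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmojiPolarity {
    enum Label { POSITIVE, NEGATIVE, NONE }

    // Assumed emoticon forms: ":)" / ":-)" and ":(" / ":-(".
    private static final Pattern POS = Pattern.compile(":-?\\)");
    private static final Pattern NEG = Pattern.compile(":-?\\(");

    private static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Sentence rule: x positive emoticons vs y negative ones; x > y means positive.
    static Label labelSentence(String sentence) {
        int x = count(POS, sentence);
        int y = count(NEG, sentence);
        if (x > y) return Label.POSITIVE;
        if (y > x) return Label.NEGATIVE;
        return Label.NONE; // assumption: ties and emoji-free sentences stay unlabeled
    }

    // Tweet rule: split on whitespace after '.', '!' or '?', then take a majority vote.
    static Label labelTweet(String tweet) {
        int pos = 0;
        int neg = 0;
        for (String s : tweet.split("(?<=[.!?])\\s+")) {
            Label l = labelSentence(s);
            if (l == Label.POSITIVE) pos++;
            else if (l == Label.NEGATIVE) neg++;
        }
        if (pos > neg) return Label.POSITIVE;
        if (neg > pos) return Label.NEGATIVE;
        return Label.NONE; // assumption: tied votes are not forced to negative
    }
}
```

Returning `NONE` on ties keeps emoji-free tweets out of the negative class; if the notes' "otherwise negative" rule is meant literally, the last line of `labelTweet` would return `Label.NEGATIVE` instead.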

##problem

##reference && papers
