Skip to content

jclosure/my-storm-play

Repository files navigation

My Twitter and Storm Playground

This project is a prototyping bed for real-time NLP work against the twitter firehose. It is designed to support Java, Clojure, Python, Ruby, and Javascript as components within Storm topologies. As I build this framework out, please feel free to fork it and adapt it to your needs.

Run from Java

Twitter StreamReaderService

App.main

General purpose TwitterStreaming service. Should be injected into Twitter Spouts to hook into the firehose.

Interesting Topologies

RollingTopTwitterWordsMain.main

Rolling totals of top n words occurring in scoped FilterQuery of twitter stream.

TwitterFunTopology.main

Captures the following:

Rolling filtered tweet stream, e.g. keyword tracking, location, language

Rolling sentiment analysis against language-specific dictionaries

Rolling hashtags ranking

Rolling entity feed ranking

Run from Clojure

CLI

lein run

Or

mvn clojure:run

Run The Java Code From Clojure (nRepl)

See: src/main/clojure/com/joelholder/twitter-fun.clj

(ns com.joelholder.twitter-fun
  (:import [com.joelholder.twitter Constants StreamReaderService]
           [com.joelholder.topology TwitterFunTopology])
  (:gen-class)
  )


(require 'clojure.pprint)
(use 'clojure.pprint)

;; run the StreamReaderservice
(defn run-twitter []
  (let [service (StreamReaderService.)]
    (.readTwitterFeed service)
    (Thread/sleep 60000)
    ))

;; run sentiment and rank analytics toplogy
(defn run-topology []
  (. TwitterFunTopology main (make-array String 0)))

In Emacs, C-c M-n on the above buffer to switch to the namespace. Then you can (run-toplogy), etc..

Filtering Examples

Keywords

FilterQuery qry = new FilterQuery();

//new String[] keywords = {"#wine", "#cabernet", "#chardonnay"};
//new String[] keywords = {"nfl", "football", "cowboys"};
String[] keywords = { "barbie", "MLP", "monster high" };

qry.track(keywords);
stream.filter(qry);

Output

Got tweet:I adore this concept. :D not sure about it being by barbie but what the hey.  https://t.co/nceAEPnYdR
Got tweet:Ex-Div Reminder For Front Street US MLP Income Fund Light (MLP) https://t.co/Wu7wpmEJ4q
Got tweet:1998 Dance 'til Dawn Barbie 2nd in Great Fashions of 20th Century Series NRFB ! https://t.co/JxY1w83BQF https://t.co/g8gPFj6umJ
Got tweet:@thiiiiialves se quiser te dou uma barbie
Got tweet:The Amaze Chase | Life in the Dreamhouse | Barbie - https://t.co/8YrAW3IaR3 #sport #sport_review #sport_training https://t.co/LVuorqQmVl
Got tweet:RT @Deligracy: WHERE ARE THE BUILDS AND BARBIE? - Update: https://t.co/4n8vT5TPiZ via @YouTube
Got tweet:https://t.co/Fi7LpQ9KZh
https://t.co/Fi7LpQ9KZh
https://t.co/Fi7LpQ9KZh
https://t.co/Fi7LpQ9KZh https://t.co/ZVLEXSbemh
Got tweet:RT @_jordanmo: Sympa @France2tv qui fait le jeu du #FN et préfère annuler #DPDA plutôt que de laisser la parole aux autres candidats suite …
Got tweet:RT @VoceNaoSabiaQ: Se a Barbie fosse real, ela seria magra
demais para ter filhos e a desproporcional para andar de cabeça
erguida.

Language

// filter language
String[] english = new String[]{"en"};
String[] spanish = new String[]{"es"};
String[] langs = Stream.of(english, spanish)
    .flatMap(Stream::of)
    .toArray(String[]::new);
FilterQuery langQuery = new FilterQuery();
langQuery.language(langs);
String[] keywords = { "wine", "vino" };
langQuery.track(keywords);
stream.filter(langQuery);

Output

Got tweet:@_SofiaBravo_SNM ya me parecía , seguro te metieron pastillas al vino como a vos tg y quedaste duracel
Got tweet:A.M. SMITH 247 249 HENNEPIN AVE MINN MN 1800's LIQUOR  WINE BOTTLE JUG GALLON https://t.co/QmWM3tyxyF https://t.co/Eph2ed1FLk
Got tweet:RT @Playing_Dad: [At Last Supper]
*Jesus raises bread*
This is my body
*raises wine*
& my blood
*pulls out 8 of Clubs*
& this is your card
…
Got tweet:Never been happier than when my parents turned up to my flat with 4 bottles of wine, 7L of Irn Bru and two loaves of Parkin.
Got tweet:https://t.co/3BWwtVHy8J is for sale via @DomainsMachine buy it now!! 
#wine #tobacco #alcohol #dutyfree #shopping https://t.co/ZyCd7Y8wk1
Got tweet:RT @Country_Words: She’s a bubble bath and candles, baby come and kiss me, she’s a one glass of wine and she’s feelin’ kinda tipsy. -Brad P…
Got tweet:Mulled wine on a Monday afternoon #bestflat #anniemac https://t.co/6x3XuyK9S8
Got tweet:#ibmdidyouknow there is a great wine app called wine4me. Getting it now! #ibminsight https://t.co/YkGmC48sEk
Got tweet:RT @Luuccho14: Que hermoso esto de que no vino ningún profesor
Got tweet:Whaaaaaat ?  #Watson knows wine now?  I knew I liked #Watson #ibminsight #race2insight
Got tweet:It's our birthday week!  The Wine and Cheese chat is sold out, but we still have seats for Thursday's... https://t.co/JV8VmCxchx
Got tweet:Amy Gross from VineSleuth is speaking my language.  Buying wine at a store - not optimal. #IBMInsight https://t.co/41tYxyScDx
Got tweet:RT @IBMServiceMgmt: Looking for the perfect bottle of wine? @vinesleuth provides expertise via mobile app. @AmyCGross #ibminsight https://t…
Got tweet:62 million wine drinkers in US. But how easy is it for you to pick your bottle!  Wine4me success story from @amycgross #insighteconomy
Got tweet:RT @ibminsight: Calling all wine enthusiasts and "aspiring oenophiles" -- let @vinesleuth @Wine4MeApp help pick out your perfect wine. #ibm…
Got tweet:now #IBMWatson can help me with wine choices - wow #ibminsight
Got tweet:RT @HasnaZarooriHai: Amazing Banner Outside A Wine Shop
“If You Love Someone Today, Then You’ll Surely Love Me Someday"
Got tweet:RT @joelcomm: @Wine4MeApp helps choose the wine you want #IBMInsight #NewWaytoEngage https://t.co/qhTyoMLxME
Got tweet:Chicken?? Whats that? https://t.co/NEaOje0wLQ
Got tweet:How would several flights of #Burgundy 2003 wines vary when tasted now, compared to 10 years ago? https://t.co/3pv5AeMOiq #wine #winelover
Got tweet:#Job #Nashville Full-Time Cashier Wanted (Midtown Wine & Spirits) (1610 Church Street): Midtown Wine & Spirits... https://t.co/TACIrCkj0x
Got tweet:So how have I not heard of the wine4.me app?? Cognitive wine
selection makes total sense. #IBMWatson #ibminsight

Geography

FilterQuery geoQuery = new FilterQuery();

// cities
double[][] sanFrancisco = new double[][] { { -122.75, 36.8 }, { -121.75, 37.8 } };
double[][] newYorkCity = new double[][] { { -74, 40 }, { -73, 41 } };
double[][] cities = Stream.of(sanFrancisco, newYorkCity).flatMap(Stream::of).toArray(double[][]::new);

// states
double[][] california = new double[][] { { 124.434800, 32.433047 }, { -114.015147, 42.120889 } };
double[][] texas = new double[][] { { -106.728330, 25.745428 }, { -93.438336, 36.605486 } };
double[][] newYork = new double[][] { { -79.842750, 40.495969 }, { -71.783881, 45.072033 } };
double[][] states = Stream.of(california, texas, newYork).flatMap(Stream::of).toArray(double[][]::new);

// countries
double[][] usa = new double[][] { { -125.0011, 24.9493 }, { -66.9326, 49.5904 } };

double[][] closeToNorway = new double[][]{new double[]{3.339844, 53.644638}, new double[]{18.984375,72.395706 }};

double northLatitude = 35.2;
double southLatitude = 25.2;
double westLongitude = 62.9;
double eastLongitude = 73.3;
double[][]  pakistan = {{westLongitude, southLatitude},{eastLongitude, northLatitude}};

// set your geofencing ...

// geoQuery.locations(cities);
// geoQuery.locations(texas);
// geoQuery.locations(pakistan);
// geoQuery.locations(usa);

// let's check on Scandinavia... :)
geoQuery.locations(closeToNorway);

stream.filter(geoQuery);

Output

Got tweet:@mattelacchiato yo hatte den falschen Screenshot gepostet und 15s später den Tweet gelöscht ... aber das Netz vergisst nie ;)
Got tweet:https://t.co/wzZLYio7NM
Got tweet:Her bør FFK følge med, imo. Ikke sett de to i år, men spesielt Hoel var glimrende for 08-jr/2 før Kvik.  https://t.co/VBy3Rcv81Y
Got tweet:@animizacja ja bym chodzi?a za nimi, boj?c si? ?e co? ukradn?
Got tweet:Über Zeug nachdenken. https://t.co/Z6xdoHJ2Kd
Got tweet:I just wanna live in harrys hair
Got tweet:@TheRealLiont love you
Got tweet:@emelieperssson Tack för RT!

Vert.x

Surface tweets to webpage

run SimplerServer.main()

Metrics

Run Java

compiled

vertx run com.joelholder.vertx.Dashboard -cp target/my-twitter-play-0.0.1-SNAPSHOT.jar:.

dynamic

cd src/main/java/com/joelholder/vertx/ vertx run com.joelholder.vertx.Dashboard -cp ../../../../../../target/my-twitter-play-0.0.1-SNAPSHOT.jar:.

Run Ruby

cd src/main/rb/com/joelholder/vertx/ vertx run dashboard.rb -cp ../../../../../../target/my-twitter-play-0.0.1-SNAPSHOT.jar:.

Run JavaScript

cd src/main/js/com/joelholder/vertx/ vertx run dashboard.js -cp ../../../../../../target/my-twitter-play-0.0.1-SNAPSHOT.jar:.

Run Groovy

cd src/main/groovy/com/joelholder/vertx/ vertx run dashboard.groovy -cp ../../../../../../target/my-twitter-play-0.0.1-SNAPSHOT.jar:.

Derived from:

https://github.com/vert-x3/vertx-examples

https://github.com/vert-x3/vertx-examples/tree/master/metrics-examples

TODO:

Study this project

https://github.com/johanhaleby/perfect-storm

Evaluate Esper for CEP

http://www.espertech.com/products/esper.php

Get storm running inside Vertx

Resources

Sentiment Dictionaries

http://mpqa.cs.pitt.edu/

Twitter

Twitter Streaming API

https://dev.twitter.com/streaming/overview/request-parameters#locations

Twitter4j

http://twitter4j.org/en/code-examples.html

Geo

Box locators

https://en.wikipedia.org/wiki/Category:Geobox_locator_United_States

Coordinate testing

http://www.gps-coordinates.net/

Infodoc

http://www.nrri.umn.edu/worms/downloads/team/TheGeographicCoordinateSystem.pdf

Storm

Projects

my-own-storm

https://github.com/Sofft/my-own-storm

mbo-storm

https://github.com/mbonaci/mbo-storm/wiki/Storm-setup-in-Eclipse-with-Maven,-Git-and-GitHub

https://github.com/mbonaci/mbo-storm

workshop

https://github.com/kantega/storm-twitter-workshop

https://github.com/kantega/storm-twitter-workshop/wiki/Basic-Twitter-stream-reading-using-Twitter4j

Eclipse

https://gist.github.com/mbonaci/5996278

Tutorial

https://storm.apache.org/documentation/Tutorial.html

Storm Starter

https://github.com/apache/storm/tree/master/examples/storm-starter#getting-started

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published