KijiChopsticks

KijiChopsticks provides a simple data analysis language using KijiSchema and Scalding.

Compilation

KijiChopsticks requires Apache Maven 3 to build. It may be built by running the command

mvn clean package

from the root of the KijiChopsticks repository. This will create a release in the target directory.

Running the NewsgroupWordCounts example

The following instructions assume that a functional KijiBento minicluster has been set up and is running. This example uses the 20Newsgroups dataset.

First, create and populate the 'words' table:

kiji-schema-shell --file=words.ddl
kiji jar target/kiji-chopsticks-0.1.0-SNAPSHOT.jar org.kiji.chopsticks.NewsgroupLoader \
    kiji://.env/default/words <path/to/newsgroups/root/>

Run the word count job, writing its output to HDFS:

kiji jar target/kiji-chopsticks-0.1.0-SNAPSHOT.jar \
    com.twitter.scalding.Tool org.kiji.chopsticks.NewsgroupWordCount \
    --input kiji://.env/default/words --output ./wordcounts.tsv --hdfs

Check the results of the job:

hadoop fs -cat ./wordcounts.tsv/part-00000 | grep "\<foo\>"

You should see something similar to:

"'foo'\''bar'". 1
"foo"); 1
"foo'bar",  1
"foo.txt  1
"foo.txt" 1
"foo:0",  1
<foo> 1
<foo@cs.rice.edu> 1
>foo  1
`foo' 1
bar!foo!frotz 1
foo 2
foo%bar.bitnet@mitvma.mit.edu 1
foo-boo 1
foo/file  1
foo:  1
foo@mhfoo.pc.my 1
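
To look at the most frequent words rather than grepping for a single one, the output can be sorted by count. The following is a minimal sketch, not part of KijiChopsticks: `top_words` is a hypothetical helper, and it assumes each output line has the form `<word><TAB><count>`, which the `.tsv` name suggests but which has not been verified against the job's actual output.

```shell
# Hypothetical helper: print the N highest-count lines of a word-count file.
# Assumes each line is "<word><TAB><count>".
top_words() {
    # Sort numerically on the second tab-separated field, descending,
    # then keep the first N lines (default 10).
    sort -t "$(printf '\t')" -k2,2nr "$1" | head -n "${2:-10}"
}

# Try it on a small local sample file.
printf 'foo\t2\nbar\t5\nbaz\t1\n' > sample.tsv
top_words sample.tsv 2
```

To apply it to the real results, first copy the job output to a local file, for example with `hadoop fs -cat ./wordcounts.tsv/part-00000 > wordcounts.local.tsv`, and pass that file to `top_words`.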
