Cloud9

A Hadoop toolkit for working with big data: http://cloud9lib.org/

Features added in this forked project:

XML input splitting: Although the current version of Cloud9 supports reading compressed files (bzip2 etc.) in both local and MapReduce settings, it does not split the tag blocks of a dump into individual InputSplits. Here I have integrated the great code from wikihadoop (https://github.com/whym/wikihadoop) into WikipediaPageInputFormat, so that a dump file can be processed in parallel without first being decompressed and repacked.
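
To illustrate the idea of splitting on tag blocks, here is a minimal, self-contained sketch. It is not the wikihadoop or WikipediaPageInputFormat code: the class `PageSplitSketch` and the method `pagesInSplit` are hypothetical names, the dump is modelled as an in-memory string rather than a compressed file, and the only rule shown is the common convention that a split owns every `<page>` block whose opening tag starts inside its offset range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Cloud9 or wikihadoop.
public class PageSplitSketch {
    static final String OPEN = "<page>";
    static final String CLOSE = "</page>";

    /**
     * Returns every complete <page>...</page> block whose opening tag
     * starts at an offset in [start, end). A block may run past 'end';
     * the next split skips it because its opening tag is not in range.
     */
    public static List<String> pagesInSplit(String data, int start, int end) {
        List<String> pages = new ArrayList<>();
        int pos = data.indexOf(OPEN, start);
        while (pos >= 0 && pos < end) {
            int close = data.indexOf(CLOSE, pos);
            if (close < 0) {
                break; // truncated block: nothing complete to emit
            }
            int stop = close + CLOSE.length();
            pages.add(data.substring(pos, stop));
            pos = data.indexOf(OPEN, stop);
        }
        return pages;
    }

    public static void main(String[] args) {
        String dump = "<page>A</page><page>B</page><page>C</page>";
        int mid = dump.length() / 2; // offset 21 falls inside the second block
        System.out.println(pagesInSplit(dump, 0, mid));
        System.out.println(pagesInSplit(dump, mid, dump.length()));
    }
}
```

In this sketch the first split yields pages A and B (B starts before the boundary and is read through to its closing tag), and the second split yields only C, so each page is processed exactly once. A real input format would apply the same ownership rule to byte offsets of a file, and handling compressed input adds further steps that are not shown here.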

antoine-tran/Cloud9