
antoine-tran/Cloud9

 
 


Cloud9

A Hadoop toolkit for working with big data: http://cloud9lib.org/

Features added in this forked project:

  1. XML input splitting: The current version of Cloud9 can read bzip2-compressed files (.bz2 etc.) in both local and MapReduce settings, but it does not split the tag blocks of a dump into individual InputSplits. This fork integrates the code from wikihadoop (https://github.com/whym/wikihadoop) into WikipediaPageInputFormat, so that a dump file can be processed in parallel without first being decompressed and repacked.
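
To illustrate the general idea behind splitting XML tag blocks, here is a minimal, self-contained sketch. It is not the code from wikihadoop or WikipediaPageInputFormat: the class name PageSplitSketch and its methods are hypothetical, it works on an uncompressed in-memory byte array rather than a bzip2 stream, and it assumes pages are delimited by plain <page> and </page> tags. Each split scans forward from its start offset to the first <page> tag and owns every page whose opening tag begins before the split's end offset, so a page that crosses a split boundary is read exactly once.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Cloud9 or wikihadoop.
public class PageSplitSketch {
    private static final byte[] START = "<page>".getBytes(StandardCharsets.UTF_8);
    private static final byte[] END = "</page>".getBytes(StandardCharsets.UTF_8);

    // Index of the first occurrence of pattern in data at or after from, or -1.
    static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i + pattern.length <= data.length; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    // Returns the pages whose opening tag starts in [splitStart, splitEnd).
    // A page may extend past splitEnd; it still belongs to this split.
    static List<String> readSplit(byte[] data, int splitStart, int splitEnd) {
        List<String> pages = new ArrayList<>();
        int pos = indexOf(data, START, splitStart);
        while (pos != -1 && pos < splitEnd) {
            int close = indexOf(data, END, pos);
            if (close == -1) {
                break; // unterminated block: stop rather than emit a partial page
            }
            int stop = close + END.length;
            pages.add(new String(data, pos, stop - pos, StandardCharsets.UTF_8));
            pos = indexOf(data, START, stop);
        }
        return pages;
    }

    public static void main(String[] args) {
        byte[] data = "<mediawiki><page>A</page><page>B</page><page>C</page></mediawiki>"
                .getBytes(StandardCharsets.UTF_8);
        // Two byte-range splits; the boundary at offset 30 falls inside the second page.
        System.out.println(readSplit(data, 0, 30));
        System.out.println(readSplit(data, 30, data.length));
    }
}
```

In a real Hadoop job the same ownership rule would live inside a RecordReader, and reading a compressed dump in parallel additionally requires aligning splits with the compressed block boundaries, which this sketch does not attempt.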



Languages

  • Java 98.7%
  • Other 1.3%