no2sql/CLOUDS-LAB

Laboratory material for the Clouds course: Link

This space supports the lectures on Hadoop MapReduce, Hadoop Pig and Hadoop HBase. Each lecture has an associated Lab, consisting of a directory that holds a description of the exercises, source code, solutions and output data. Input data is currently stored in a private Hadoop deployment at Eurecom; in some cases, scripts to generate input data or small input samples are provided.

For reference, the Hadoop cluster consists of 40 nodes, each with a quad-core CPU, 4 GB of RAM, and (only) a total of 3.5 TB storage space (with replication factor 3). Each machine is equipped with a single Gigabit Ethernet card.

Acknowledgements: many exercises have been profoundly influenced by the following resources:

  • Tom White, Hadoop: The Definitive Guide, O'Reilly / Yahoo! Press
  • Jimmy Lin and Chris Dyer, Data-Intensive Text Processing with MapReduce, Morgan & Claypool
  • Anand Rajaraman and Jeff Ullman, Mining of Massive Datasets, Cambridge University Press
  • Lars George, HBase: The Definitive Guide, O'Reilly
  • Hortonworks Blog: http://hortonworks.com/blog/

Pre-requisites for the exercises

This section describes the software setup required to use the laboratory material. For the laboratory sessions, we suggest that students download the whole repository, following this Link

Software setup

To work with the exercises, you need to download and install the Java SDK and Eclipse. You also need to download and install the Hadoop core jar files.

NOTE: for students attending the Lab sessions at Eurecom, the software setup has already been done for you. Ask a teaching assistant for further information. See also how to configure Bash below.

Links:

  • Java download page: Link

  • Hadoop download page (hadoop-0.20.203.0): Link

  • Hadoop Pig download page (pig-0.9.2): Link

  • HBase download page (hbase-0.92.0): Link

  • Eclipse download page: Link

Configuring Bash:

Note that this configuration works for student machines in Laboratory rooms 1 and 2, and is tailored to the private Hadoop deployment at Eurecom.


export JAVA_HOME=/home/Admin_Data/hadoop/jdk1.6.0_24
export PIG_HADOOP_VERSION=20
export PIG_CLASSPATH=/home/Admin_Data/hadoop/hadoop/conf/
export HADOOP_HOME=/home/Admin_Data/hadoop/hadoop/
export HBASE_HOME=/home/Admin_Data/hadoop/hbase/
export PATH=$HADOOP_HOME/bin:$PATH
export PATH=/home/Admin_Data/hadoop/pig-0.9.2/bin:$PATH
export PATH=$HBASE_HOME/bin:$PATH
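
Since the paths above are specific to the Eurecom machines, it can help to check that each exported variable actually points to an existing directory before starting a Lab. The following is only a minimal sketch: the function name `check_lab_env` is hypothetical and not part of this repository, and the list of variables it checks is simply the one exported above.

```shell
# Hypothetical helper (not part of the repository): report, for each
# directory-valued variable exported above, whether it is set and
# whether the directory it names exists.
# The return status is the number of problems found (0 means all good).
check_lab_env() {
    _missing=0
    for _name in JAVA_HOME HADOOP_HOME HBASE_HOME PIG_CLASSPATH; do
        # Look up the value of the variable whose name is stored in $_name.
        eval "_value=\${$_name:-}"
        if [ -z "$_value" ]; then
            echo "unset:   $_name"
            _missing=$((_missing + 1))
        elif [ ! -d "$_value" ]; then
            echo "missing: $_name=$_value"
            _missing=$((_missing + 1))
        else
            echo "ok:      $_name=$_value"
        fi
    done
    return "$_missing"
}
```

After adding the exports and this function to your Bash configuration, running `check_lab_env` in a new terminal prints one line per variable; a non-zero exit status means at least one path needs to be adjusted for your machine.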

Links to the three Laboratories

  • Laboratory on Scalable Algorithm Design in MapReduce Link

  • Laboratory on Pig and Pig Latin Link

  • Laboratory on HBase Link
