Hadoop - Environment Setup



This chapter explains the prerequisites and the installation process for setting up Hadoop in a Linux environment.

Hadoop can be set up to run either on a single computer or on a cluster consisting of more than one computer.

Hadoop Pre-requisites

The following applications must be installed on Linux before starting with the Hadoop installation:

  • Java 1.6 → Download and install Java 1.6 or a higher version from http://www.java.com

  • ssh → The ssh client must be installed and the sshd daemon must be running. Most Linux distributions ship with both installed by default.
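You can quickly check both prerequisites from a shell. A minimal sketch (the tool names are standard, but on some distributions sshd lives outside PATH, so check your init system as well):

```shell
# Report whether each prerequisite command is available on this machine.
# A MISSING result for sshd is not necessarily fatal -- it may simply
# not be on PATH even though the daemon is installed and running.
for tool in java ssh sshd; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```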

Hadoop Installation

Step - 1

To get a Hadoop distribution, download a recent stable release from one of the Apache download mirrors listed at http://hadoop.apache.org. At the time of writing this tutorial, I downloaded hadoop-2.2.0.tar.gz.

Before you start the download and installation, make sure you have root privileges on your Linux machine. Then follow this command sequence:

$ cd /usr/local
$ wget http://www.interior-dsgn.com/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
$ tar xvfz hadoop-2.2.0.tar.gz

Step - 2

Next, find out your Java installation home using the following command:

$ readlink -f /usr/bin/java | sed "s:bin/java::"

Usually it will be a sub-directory of /usr/lib/jvm. Once you know your JAVA_HOME, set up the following environment variables in your bash profile ~/.bashrc and in the common profile /etc/profile.
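The output of that command can be captured straight into JAVA_HOME instead of being copied by hand; a sketch (the path in the comment is just an example, yours may differ):

```shell
# Derive JAVA_HOME from the java binary on the PATH: readlink -f resolves
# the /usr/bin/java symlink chain, and sed strips the trailing "bin/java".
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
echo "$JAVA_HOME"   # e.g. /usr/lib/jvm/jre-1.7.0/
```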

export PATH=$PATH:/usr/local/hadoop-2.2.0/bin
export PATH=$PATH:/usr/local/hadoop-2.2.0/sbin
export HADOOP_HOME=/usr/local/hadoop-2.2.0
export JAVA_HOME=/usr/lib/jvm/jre-1.7.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME

Here I updated /etc/profile so that all of the above variables are exported for other users working on the same machine. In particular, the hadoop user, which we will create when we run Hadoop in distributed mode, will need all of them.

Step - 3

Edit the file $HADOOP_HOME/etc/hadoop/hadoop-env.sh to define JAVA_HOME as the root of your Java installation. Just put the following line at the top of the hadoop-env.sh file:

export JAVA_HOME=/usr/lib/jvm/jre-1.7.0
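If you prefer not to open an editor, the same line can be prepended with GNU sed; a sketch, assuming HADOOP_HOME is already set as in Step 2:

```shell
# Insert the JAVA_HOME export at line 1 of hadoop-env.sh in place
# (GNU sed syntax; adjust the path to match your Java installation).
sed -i '1i export JAVA_HOME=/usr/lib/jvm/jre-1.7.0' \
    "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
```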

Step - 4

Finally, source your shell profile to bring all the environment variables into effect.

$ . ~/.bashrc

Step - 5

Now your Hadoop environment setup is ready, and you can check it by issuing the following command:

$ hadoop version

If everything is fine with your setup, you should see output similar to the following:

Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4

Hadoop Operation Modes

Once your Hadoop setup is ready, you can run your Hadoop cluster in one of the three supported modes:

  • Local/Standalone Mode: This mode is very easy to set up and useful for debugging. It is also called non-distributed mode.

  • Pseudo-Distributed Mode: In this mode, Hadoop runs on a single node with each Hadoop daemon running in a separate Java process. It is a distributed simulation on a single machine, useful for development.

  • Fully Distributed Mode: This mode is fully distributed across a cluster of at least two machines.

To learn more about each operating mode, check the sub-chapters given here. By default, Hadoop is configured to run in non-distributed mode, as a single Java process, which is useful for debugging.
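Standalone mode can be exercised end to end with the example jar that ships inside the distribution; a minimal smoke test (the jar name matches the 2.2.0 tarball used above, and the input/output directory names are just choices made here):

```shell
# Run the bundled grep example over Hadoop's own config files in
# standalone mode; matches for the regex land under output/.
cd "$HADOOP_HOME"
mkdir input
cp etc/hadoop/*.xml input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \
    grep input output 'dfs[a-z.]+'
cat output/*
```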
