Hadoop 2.6.4 Fully Distributed Mode Installation on Ubuntu 14.04


Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models.

A Hadoop application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.

Prerequisites

1) Machines with the Ubuntu 14.04 LTS operating system installed (this guide uses three: HadoopMaster, HadoopSlave1 and HadoopSlave2).

2) The Apache Hadoop 2.6.4 software (the hadoop-2.6.4.tar.gz archive).

Fully Distributed Mode (Multi Node Cluster)

This post describes how to install and configure Hadoop clusters ranging from a few nodes to extremely large clusters. To play with Hadoop, you may first want to install it on a single machine (see Single Node Setup).

Hadoop Fully Distributed Mode Installation on Ubuntu 14.04

On all machines (HadoopMaster, HadoopSlave1, HadoopSlave2)

Step 1 – Update. Open a terminal (CTRL + ALT + T) and refresh the package lists with apt-get update, run under sudo. It is advisable to run this before installing any package, and it is necessary for installing the latest updates, even if you have not added or removed any Software Sources.

Step 2 – Install Java 7.

Step 3 – Install the OpenSSH server. SSH is a cryptographic network protocol for operating network services securely over an unsecured network. Its best-known application is remote login to computer systems.
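Steps 1 to 3 can be sketched as a small script. This is a minimal sketch, not the original post's commands: the OpenJDK 7 package is an assumption (the post may have used a different Java 7 distribution), and the hadoop-setup directory and script name are hypothetical. The commands are written to a file rather than executed, so they can be reviewed first.

```shell
# Stage the commands in a script instead of running them immediately.
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/01-prereqs.sh" <<'EOF'
#!/bin/sh
set -e
# Step 1: refresh the package lists.
sudo apt-get update
# Step 2: Java 7 (assumption: OpenJDK 7 from the Ubuntu 14.04 archive).
sudo apt-get install -y openjdk-7-jdk
# Step 3: OpenSSH server.
sudo apt-get install -y openssh-server
EOF
# Syntax-check the generated script without executing it.
sh -n "$stage/01-prereqs.sh" && echo "wrote $stage/01-prereqs.sh"
```

Run it on each of the three machines with: sh hadoop-setup/01-prereqs.sh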

Step 4 – Edit the /etc/hosts file. Add the IP address and hostname of every machine in the cluster, then save and close.
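As an illustration, the entries for /etc/hosts might look like the ones staged below. The three IP addresses are hypothetical placeholders, and the hadoop-setup directory is only a scratch location; the hostnames are the ones used throughout this post.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
# Hypothetical addresses: replace them with the real IPs of your machines.
cat > "$stage/hosts.append" <<'EOF'
192.168.1.10 HadoopMaster
192.168.1.11 HadoopSlave1
192.168.1.12 HadoopSlave2
EOF
cat "$stage/hosts.append"
# To apply on each machine (not run here):
#   sudo sh -c 'cat hadoop-setup/hosts.append >> /etc/hosts'
```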

Step 5 – Create a group. We will create a group, configure sudo permissions, and add the user to the group. Here ‘hadoop’ is the group name and ‘hduser’ is a user in that group.

Step 6 – Configure the sudo permissions for ‘hduser’.

Ubuntu's default text editor is nano, so the sudoers file opens in nano and can be edited directly.

Add a permissions entry for ‘hduser’ to sudoers.

Press CTRL + O to write the changes, or press CTRL + X to exit and enter Y when asked whether to save the file.
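Steps 5 and 6 could look roughly like the script staged below. It is a sketch under assumptions: the sudoers entry shown in the comment grants ‘hduser’ full sudo rights, which may be broader than what the original post configured, and the script name is hypothetical.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/02-user.sh" <<'EOF'
#!/bin/sh
set -e
# Step 5: create the group, then create the user inside it.
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser   # prompts for a password
# Step 6: open sudoers in the default editor and add a line such as
#   hduser ALL=(ALL:ALL) ALL
# (assumption: full sudo rights for hduser)
sudo visudo
EOF
# Syntax-check the generated script without executing it.
sh -n "$stage/02-user.sh" && echo "wrote $stage/02-user.sh"
```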

Step 7 – Create the Hadoop directory, /usr/local/hadoop.

Step 8 – Change the ownership and permissions of the directory /usr/local/hadoop. Here ‘hduser’ is an Ubuntu username.

Step 9 – Create the /app/hadoop/tmp directory.

Step 10 – Change the ownership and permissions of the directory /app/hadoop/tmp. Here ‘hduser’ is an Ubuntu username.
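Steps 7 to 10 repeat the same three operations for two directories, which a loop makes explicit. The 755 permission mode is an assumption, since the post does not preserve the mode it used; the script name is hypothetical.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/03-dirs.sh" <<'EOF'
#!/bin/sh
set -e
for dir in /usr/local/hadoop /app/hadoop/tmp; do
  sudo mkdir -p "$dir"                  # steps 7 and 9
  sudo chown -R hduser:hadoop "$dir"    # steps 8 and 10: ownership
  sudo chmod -R 755 "$dir"              # assumption: mode 755
done
EOF
# Syntax-check the generated script without executing it.
sh -n "$stage/03-dirs.sh" && echo "wrote $stage/03-dirs.sh"
```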

Step 11 – Switch user. The su command is used to execute commands with the privileges of another user account; here, switch to ‘hduser’.

Step 12 – Generating a new SSH public and private key pair on your local computer is the first step towards authenticating with a remote server without a password. Unless there is a good reason not to, you should always authenticate using SSH keys.

Step 13 – Now you can add the public key to the authorized_keys file.

Step 14 – Add the hostname to the list of known hosts. This ensures that script execution is not interrupted by a prompt asking whether to trust the computer's authenticity.
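A possible rendering of steps 12 to 14, to be run as ‘hduser’ after the switch in step 11. The RSA key type, the empty passphrase and the use of ssh-keyscan for step 14 are assumptions; the original post may simply have connected once with ssh and accepted the prompt.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/04-ssh-keys.sh" <<'EOF'
#!/bin/sh
set -e
# Run as hduser (step 11: su - hduser).
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
# Step 12: generate an RSA key pair with an empty passphrase.
ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa"
# Step 13: authorize the new public key for logins to this machine.
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
# Step 14 (assumption): record this host's key so ssh does not prompt.
ssh-keyscan -H "$(hostname)" >> "$HOME/.ssh/known_hosts"
EOF
# Syntax-check the generated script without executing it.
sh -n "$stage/04-ssh-keys.sh" && echo "wrote $stage/04-ssh-keys.sh"
```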

Only on HadoopMaster Machine

Step 15 – Switch user to ‘hduser’ with su, this time on the master.

Step 16 – ssh-copy-id is a small script that copies your SSH public key to a remote host, appending it to the remote authorized_keys file.

Step 17 – ssh is a program for logging into a remote machine and for executing commands on a remote machine. Check that remote login works.

Step 18 – Exit from remote login.

Repeat steps 16, 17 and 18 for the other slave machine (HadoopSlave2).
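Steps 16 to 18 for both slaves can be sketched as one loop. The key path assumes the default RSA key location, and running hostname over ssh stands in for the interactive login and exit described above; both are illustrative choices rather than the post's exact commands.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/05-ssh-copy.sh" <<'EOF'
#!/bin/sh
set -e
# Run on HadoopMaster as hduser.
for host in HadoopSlave1 HadoopSlave2; do
  # Step 16: append the master's public key to the slave's authorized_keys.
  ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "hduser@$host"
  # Steps 17 and 18: log in, print the remote hostname, and return.
  ssh "hduser@$host" hostname
done
EOF
# Syntax-check the generated script without executing it.
sh -n "$stage/05-ssh-copy.sh" && echo "wrote $stage/05-ssh-copy.sh"
```

If the second command prints the slave's hostname without asking for a password, key-based login is working.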

Step 19 – Change the directory to /home/hduser/Desktop. In my case the downloaded hadoop-2.6.4.tar.gz file is in the /home/hduser/Desktop folder. On your machine it might be elsewhere, such as the /downloads folder, so check first.

Step 20 – Untar the hadoop-2.6.4.tar.gz file.

Step 21 – Move the contents of hadoop-2.6.4 folder to /usr/local/hadoop
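Steps 19 to 21 might be carried out as follows. The sketch assumes the archive sits on the Desktop as described above and that /usr/local/hadoop is already owned by ‘hduser’, so no sudo is needed; the script name is hypothetical.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/06-unpack.sh" <<'EOF'
#!/bin/sh
set -e
# Step 19: go to the folder holding the download (adjust if needed).
cd /home/hduser/Desktop
# Step 20: extract the gzip-compressed tar archive.
tar -xzf hadoop-2.6.4.tar.gz
# Step 21: move the extracted contents into the Hadoop directory.
mv hadoop-2.6.4/* /usr/local/hadoop/
EOF
# Syntax-check the generated script without executing it.
sh -n "$stage/06-unpack.sh" && echo "wrote $stage/06-unpack.sh"
```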

Step 22 – Edit the $HOME/.bashrc file, adding export lines for the Java and Hadoop paths.

Step 23 – Reload your changed $HOME/.bashrc settings
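The lines for .bashrc could resemble the snippet staged below. The JAVA_HOME value assumes OpenJDK 7 on 64-bit Ubuntu and must be adjusted to wherever Java 7 is actually installed; HADOOP_HOME follows the directory created earlier. The snippet is sourced in a subshell only to confirm that it parses.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/bashrc.append" <<'EOF'
# Assumption: OpenJDK 7 on 64-bit Ubuntu; adjust to your Java install path.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
EOF
# Source the snippet in a subshell to confirm that it sets the variables.
( . "$stage/bashrc.append" && echo "HADOOP_HOME=$HADOOP_HOME" )
# To apply (not run here):
#   cat hadoop-setup/bashrc.append >> "$HOME/.bashrc"   # step 22
#   . "$HOME/.bashrc"                                   # step 23
```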

Step 24 – Change the directory to /usr/local/hadoop/etc/hadoop

Step 25 – Edit hadoop-env.sh file.

Step 26 – Add the below lines to hadoop-env.sh file. Save and Close.
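For hadoop-env.sh, the setting that usually matters is JAVA_HOME, because the daemons are started over ssh and do not read .bashrc. The line below is an assumption about what the post added, using the same hypothetical OpenJDK 7 path.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/hadoop-env.append" <<'EOF'
# Assumption: OpenJDK 7 on 64-bit Ubuntu; adjust to your Java install path.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
EOF
cat "$stage/hadoop-env.append"
# To apply (not run here):
#   cat hadoop-setup/hadoop-env.append >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh
```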

Step 27 – Edit core-site.xml file.

Step 28 – Add the below lines to core-site.xml file. Save and Close.
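A plausible core-site.xml for this cluster is staged below. The property names are standard Hadoop 2.x settings, but the choice of properties and port 9000 are assumptions; hadoop.tmp.dir points at the /app/hadoop/tmp directory created earlier.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
  <property>
    <!-- Port 9000 is an assumption; any free port works. -->
    <name>fs.defaultFS</name>
    <value>hdfs://HadoopMaster:9000</value>
  </property>
</configuration>
EOF
echo "wrote $stage/core-site.xml"
# To apply (not run here):
#   cp hadoop-setup/core-site.xml /usr/local/hadoop/etc/hadoop/core-site.xml
```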

Step 29 – Edit hdfs-site.xml file.

Step 30 – Add the below lines to hdfs-site.xml file. Save and Close.
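For hdfs-site.xml, a minimal sketch sets only the replication factor. The value 2 is an assumption chosen to match the two slave machines; the original post's value is not preserved.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <!-- Assumption: one replica per slave, so 2 for this cluster. -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF
echo "wrote $stage/hdfs-site.xml"
# To apply (not run here):
#   cp hadoop-setup/hdfs-site.xml /usr/local/hadoop/etc/hadoop/hdfs-site.xml
```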

Step 31 – Edit yarn-site.xml file.

Step 32 – Add the below lines to yarn-site.xml file. Save and Close.
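A yarn-site.xml along the following lines tells the NodeManagers where the ResourceManager runs and enables the MapReduce shuffle service. Both properties are standard in Hadoop 2.x, but treating them as the post's exact contents is an assumption.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/yarn-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>HadoopMaster</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
echo "wrote $stage/yarn-site.xml"
# To apply (not run here):
#   cp hadoop-setup/yarn-site.xml /usr/local/hadoop/etc/hadoop/yarn-site.xml
```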

Step 33 – Edit mapred-site.xml file.

Step 34 – Add the below lines to mapred-site.xml file. Save and Close.
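Hadoop 2.6.x ships only mapred-site.xml.template, so mapred-site.xml normally has to be created first, for example by copying the template. The sketch below assumes the post configured MapReduce to run on YARN, which is the usual setting for this kind of cluster.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/mapred-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <!-- Run MapReduce jobs on YARN rather than the local runner. -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
echo "wrote $stage/mapred-site.xml"
# To apply (not run here):
#   cp hadoop-setup/mapred-site.xml /usr/local/hadoop/etc/hadoop/mapred-site.xml
```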

Step 35 – Edit slaves file.

Step 36 – Add the below line to slaves file. Save and Close.
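The slaves file lists one worker hostname per line. Assuming the master is not meant to run a DataNode or NodeManager, it would contain only the two slave hostnames used in this post:

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
# One worker hostname per line (assumption: the master is not a worker).
printf '%s\n' HadoopSlave1 HadoopSlave2 > "$stage/slaves"
cat "$stage/slaves"
# To apply (not run here):
#   cp hadoop-setup/slaves /usr/local/hadoop/etc/hadoop/slaves
```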

Step 37 – Secure copy or SCP is a means of securely transferring computer files between a local host and a remote host or between two remote hosts. Here we are transferring configured hadoop files from master to slave nodes.

Step 38 – Here we are transferring configured .bashrc file from master to slave nodes.
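Steps 37 and 38 can be sketched as a loop over the slaves. It assumes passwordless SSH is already working and that /usr/local/hadoop exists on each slave and is owned by ‘hduser’; the script name is hypothetical.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/07-distribute.sh" <<'EOF'
#!/bin/sh
set -e
# Run on HadoopMaster as hduser.
for host in HadoopSlave1 HadoopSlave2; do
  # Step 37: copy the configured Hadoop tree to the slave.
  scp -r /usr/local/hadoop/* "hduser@$host:/usr/local/hadoop/"
  # Step 38: copy .bashrc into hduser's home directory on the slave.
  scp "$HOME/.bashrc" "hduser@$host:.bashrc"
done
EOF
# Syntax-check the generated script without executing it.
sh -n "$stage/07-distribute.sh" && echo "wrote $stage/07-distribute.sh"
```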

Step 39 – Change the directory to /usr/local/hadoop/sbin

Step 40 – Format the NameNode.

Step 41 – Start the NameNode daemon and DataNode daemons.

Step 42 – Start the YARN daemons.

OR

Instead of steps 41 and 42 you can use the single start-all.sh command, although it is now deprecated.
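Steps 39 to 42 correspond to the standard Hadoop 2.x scripts, sketched here with full paths under the /usr/local/hadoop layout used in this post. Formatting should be done only once, because it erases existing HDFS metadata. The script name is hypothetical.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/08-start.sh" <<'EOF'
#!/bin/sh
set -e
# Run on HadoopMaster as hduser.
# Step 39: work from the sbin directory.
cd /usr/local/hadoop/sbin
# Step 40: format the NameNode (first start only; erases HDFS metadata).
/usr/local/hadoop/bin/hdfs namenode -format
# Step 41: start the NameNode and the DataNodes listed in the slaves file.
./start-dfs.sh
# Step 42: start the ResourceManager and the NodeManagers.
./start-yarn.sh
# Deprecated alternative to steps 41 and 42: ./start-all.sh
EOF
# Syntax-check the generated script without executing it.
sh -n "$stage/08-start.sh" && echo "wrote $stage/08-start.sh"
```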

Step 43 – Check the running daemons with jps. The jps (Java Virtual Machine Process Status) tool reports only on JVMs for which it has access permissions, so run it as the user that started the daemons.

Only on slave machines – (HadoopSlave1 and HadoopSlave2)
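With the layout assumed in this post, jps should list DataNode and NodeManager on each slave, and NameNode, SecondaryNameNode and ResourceManager on the master. The hypothetical helper below reads jps output and reports any expected daemon that is missing; the expected sets are an assumption that holds only if the master is not also listed in the slaves file.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/check-daemons.sh" <<'EOF'
#!/bin/sh
# Usage: jps | sh check-daemons.sh master|slave
# Reads jps output on stdin and reports expected daemons that are missing.
case "$1" in
  master) expected="NameNode SecondaryNameNode ResourceManager" ;;
  slave)  expected="DataNode NodeManager" ;;
  *) echo "usage: $0 master|slave" >&2; exit 2 ;;
esac
listing="$(cat)"
status=0
for daemon in $expected; do
  # jps prints "<pid> <name>"; compare the name against the second field.
  if ! printf '%s\n' "$listing" |
       awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
    echo "missing: $daemon"
    status=1
  fi
done
exit "$status"
EOF
# Try it on sample jps-style output (the PIDs are made up).
printf '2101 DataNode\n2230 NodeManager\n2544 Jps\n' |
  sh "$stage/check-daemons.sh" slave && echo "slave daemons present"
```

On a real node, pipe the live output instead: jps | sh hadoop-setup/check-daemons.sh slave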

Only on HadoopMaster Machine

Once the Hadoop cluster is up and running, check the web UI of the components as described below:

NameNode: Browse the web interface for the NameNode; by default it is available at http://HadoopMaster:50070/

ResourceManager: Browse the web interface for the ResourceManager; by default it is available at http://HadoopMaster:8088/

Step 44 – Make the HDFS directories required to execute MapReduce jobs.

Step 45 – Copy the input files into the distributed filesystem.

Step 46 – Run some of the examples provided.


Step 47 – Examine the output files.
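Steps 44 to 47 follow the pattern of the example in the official Hadoop setup guide. The sketch below adapts it to this post's user and paths; the input source, the grep example and the output location are assumptions, and the hdfs and hadoop commands are expected to be on the PATH via .bashrc.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/09-example-job.sh" <<'EOF'
#!/bin/sh
set -e
# Run on HadoopMaster as hduser.
# Step 44: create the user's home directory in HDFS.
hdfs dfs -mkdir -p /user/hduser
# Step 45: copy the Hadoop config files into HDFS as sample input.
hdfs dfs -put /usr/local/hadoop/etc/hadoop /user/hduser/input
# Step 46: run the bundled grep example over the input.
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar \
  grep /user/hduser/input /user/hduser/output 'dfs[a-z.]+'
# Step 47: print the job output.
hdfs dfs -cat '/user/hduser/output/*'
EOF
# Syntax-check the generated script without executing it.
sh -n "$stage/09-example-job.sh" && echo "wrote $stage/09-example-job.sh"
```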

Step 48 – Stop the NameNode daemon and DataNode daemons.

Step 49 – Stop the YARN daemons.

OR

Instead of steps 48 and 49 you can use the single stop-all.sh command, although it is now deprecated.
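Shutting down mirrors the start-up steps. The sketch below uses the standard Hadoop 2.x stop scripts from the /usr/local/hadoop/sbin directory assumed throughout this post; the script name is hypothetical.

```shell
stage=./hadoop-setup            # hypothetical working directory
mkdir -p "$stage"
cat > "$stage/10-stop.sh" <<'EOF'
#!/bin/sh
set -e
# Run on HadoopMaster as hduser.
cd /usr/local/hadoop/sbin
# Step 48: stop the NameNode and DataNodes.
./stop-dfs.sh
# Step 49: stop the ResourceManager and NodeManagers.
./stop-yarn.sh
# Deprecated alternative to steps 48 and 49: ./stop-all.sh
EOF
# Syntax-check the generated script without executing it.
sh -n "$stage/10-stop.sh" && echo "wrote $stage/10-stop.sh"
```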
