The Programmers Book

Apache Hadoop 3.x installation on Ubuntu (multi-node cluster).

This document explains, step by step, how to install Apache Hadoop 3.1.1 as a cluster with one master node (namenode) and 3 worker nodes (datanodes) on Ubuntu.

Below are the 4 nodes I will be referring to here:

namenode
datanode1
datanode2
datanode3

And my login user is "ubuntu".

1. Apache Hadoop Installation

  1. Update the source list of Ubuntu
sudo apt-get update

2. Install SSH

sudo apt-get install ssh

3. Set up passwordless login between the namenode and all datanodes in the cluster.

The master node uses an SSH connection with key-pair authentication to connect to the other nodes and manage the cluster.

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

The ssh-keygen command creates the files below.

ubuntu@namenode:~$ ls -lrt .ssh/
-rw-r--r-- 1 ubuntu ubuntu  397 Dec 9 00:17 id_rsa.pub
-rw------- 1 ubuntu ubuntu 1679 Dec 9 00:17 id_rsa

Copy the public key to authorized_keys under the ~/.ssh folder.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Copy authorized_keys to all data nodes.

scp .ssh/authorized_keys datanode1:/home/ubuntu/.ssh/authorized_keys
scp .ssh/authorized_keys datanode2:/home/ubuntu/.ssh/authorized_keys
scp .ssh/authorized_keys datanode3:/home/ubuntu/.ssh/authorized_keys

4. Add all our nodes to /etc/hosts.

sudo vi /etc/hosts
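Each line maps a node's IP address to its hostname. A sketch with placeholder addresses (the 10.0.0.x values below are assumptions; substitute your nodes' real IPs):

```text
10.0.0.1  namenode
10.0.0.2  datanode1
10.0.0.3  datanode2
10.0.0.4  datanode3
```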

5. Install JDK1.8 on all 4 nodes

sudo apt-get -y install openjdk-8-jdk-headless

After the JDK install, check that it installed successfully by running "java -version".

6. Apache Hadoop installation version 3.1.1 on all 4 nodes

Download Hadoop 3.1.1 using the wget command (the URL below points to the Apache release archive):

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz


Once the download is complete, extract the archive's contents using tar, a file archiving tool, and rename the folder to hadoop:

tar -xzf hadoop-3.1.1.tar.gz
mv hadoop-3.1.1 hadoop

7. Apache Hadoop configuration – Setup environment variables.

Add the Hadoop environment variables to the .bashrc file. Open the file in the vi editor and add the variable below.

vi ~/.bashrc

export HADOOP_HOME="/home/ubuntu/hadoop"

Now load the environment variables into the current session:

source ~/.bashrc
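Beyond HADOOP_HOME, multi-node installs commonly export a few more variables; a sketch of a typical set (the variable names are standard Hadoop conventions, and the path assumes the layout above):

```shell
# Typical Hadoop environment variables for ~/.bashrc.
# Assumes the tarball was unpacked to /home/ubuntu/hadoop as above;
# adjust HADOOP_HOME if you extracted it elsewhere.
export HADOOP_HOME="/home/ubuntu/hadoop"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
# Put the hadoop/hdfs/yarn binaries and the start/stop scripts on PATH.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```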

2. Configuring hadoop master node and all worker nodes

Make the below configurations on the namenode and on all 3 datanodes.

  1. Update hadoop-env.sh

edit the ~/hadoop/etc/hadoop/hadoop-env.sh file and add the JAVA_HOME

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

2. Update core-site.xml

edit ~/hadoop/etc/hadoop/core-site.xml
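A minimal core-site.xml sketch. The port (9000) is a common convention rather than a requirement; adjust the hostname and port to your setup:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>
```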


3. Update hdfs-site.xml

edit ~/hadoop/etc/hadoop/hdfs-site.xml
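A minimal hdfs-site.xml sketch, pointing the name and data directories at the /usr/local/hadoop/hdfs/data folder created in step 6 below, with a replication factor of 3 (one copy per datanode). Treat the values as a starting point:

```xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```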


4. Update yarn-site.xml

edit ~/hadoop/etc/hadoop/yarn-site.xml
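A minimal yarn-site.xml sketch, assuming the ResourceManager runs on the namenode host:

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>namenode</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```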


5. Update mapred-site.xml 

edit ~/hadoop/etc/hadoop/mapred-site.xml

[Note: This configuration is required only on the namenode; however, it does no harm to configure it on the datanodes as well.]
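A minimal mapred-site.xml sketch that makes MapReduce jobs run on YARN:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```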


6. Create data folder

Create the data folder and change its ownership to the login user. I've logged in as the ubuntu user, so you see ubuntu below.

sudo mkdir -p /usr/local/hadoop/hdfs/data
sudo chown ubuntu:ubuntu -R /usr/local/hadoop/hdfs/data
chmod 700 /usr/local/hadoop/hdfs/data

3. Create master and workers files

  1. Create master file

The masters file is used by the startup scripts to identify the namenode, so edit ~/hadoop/etc/hadoop/masters and add your namenode IP.
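For example, if the namenode hostname resolves via /etc/hosts, the masters file is a single line:

```text
namenode
```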

2. Create workers file

The workers file is used by the startup scripts to identify the datanodes. Edit ~/hadoop/etc/hadoop/workers and add all your datanode IPs.
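For example, using the datanode hostnames from /etc/hosts, one per line:

```text
datanode1
datanode2
datanode3
```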

This completes Apache Hadoop installation and Hadoop configuration.

4. Format HDFS and start cluster

  1. Format HDFS

HDFS needs to be formatted like any classical file system. On the namenode, run the following command:

hdfs namenode -format

Your Hadoop installation is now configured and ready to run.

2. Start cluster

Start HDFS by running the following script from the namenode:

start-dfs.sh

You should see the following lines

Starting namenodes on [namenode]
Starting datanodes
Starting secondary namenodes [namenode]

jps on namenode should list the following

ubuntu@namenode:~$ jps
18978 SecondaryNameNode
19092 Jps
18686 NameNode

jps on datanodes should list the following

ubuntu@datanode1:~$ jps
14012 Jps
11242 DataNode

And by accessing the namenode web UI at http://namenode:9870 (the default HDFS web UI port in Hadoop 3), you should see the following:


3. Test by uploading a file to hdfs

Writing to and reading from HDFS is done with the hdfs dfs command. First, manually create your home directory; all other commands will use a path relative to this default home directory. (Note that ubuntu is my logged-in user. If you log on with a different user, use your user ID instead of ubuntu.)

hdfs dfs -mkdir -p /user/ubuntu/

Get a book file from the Gutenberg project (substitute the plain-text URL of the book you want):

wget -O alice.txt <book-url>

Upload the downloaded file to hdfs using -put:

hdfs dfs -mkdir books
hdfs dfs -put alice.txt books

List a file on hdfs

hdfs dfs -ls

There are many commands to manage your HDFS. For a complete list, see the Apache HDFS shell documentation.

4. Stopping cluster

Stop HDFS by running the stop-dfs.sh script from the namenode. You should see the below output.

Stopping namenodes on [namenode]
Stopping datanodes
Stopping secondary namenodes [namenode]

