Yarn setup on Hadoop 3.1


This post explains how to set up Yarn on a Hadoop 3.1 cluster and run a MapReduce program.

Before you proceed, make sure you have a Hadoop 3.1 cluster up and running. If you do not, follow the link below to set up your cluster, then come back to this page.

Apache Hadoop 3.1.1 Multi Node Cluster Setup on Ubuntu 18.04.1 LTS

Yarn ships with the Hadoop distribution, so no additional installation is needed.

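1. Add the properties below to yarn-site.xml, inside the <configuration> element

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1536</value>
    </property>

    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>1536</value>
    </property>

    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>128</value>
    </property>

    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>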

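2. Add the properties below to mapred-site.xml, inside the <configuration> element

    <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>512</value>
    </property>

    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>256</value>
    </property>

    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>256</value>
    </property>

    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME</value>
    </property>

    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME</value>
    </property>

    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME</value>
    </property>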
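
As a rough sanity check on these memory settings, the sketch below works out how many 256 MB map or reduce containers fit on one NodeManager. It assumes the scheduler rounds every request up to a multiple of yarn.scheduler.minimum-allocation-mb; round_up is a hypothetical helper name used only for this illustration, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Values copied from the yarn-site.xml and mapred-site.xml settings in this post.
node_mb=1536      # yarn.nodemanager.resource.memory-mb
min_alloc_mb=128  # yarn.scheduler.minimum-allocation-mb
am_mb=512         # yarn.app.mapreduce.am.resource.mb
task_mb=256       # mapreduce.map.memory.mb and mapreduce.reduce.memory.mb

# Hypothetical helper: round a request up to the next multiple of the minimum allocation.
round_up() {
  local req=$1 step=$2
  echo $(( (req + step - 1) / step * step ))
}

am_alloc=$(round_up "$am_mb" "$min_alloc_mb")
task_alloc=$(round_up "$task_mb" "$min_alloc_mb")

# A node that hosts the MapReduce ApplicationMaster has less room left for tasks.
tasks_with_am=$(( (node_mb - am_alloc) / task_alloc ))
tasks_without_am=$(( node_mb / task_alloc ))

echo "Task containers on the node running the ApplicationMaster: $tasks_with_am"
echo "Task containers on every other node: $tasks_without_am"
```

If you change any of the values in the two files, rerun the arithmetic with the new numbers to see how much parallelism each node can offer.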

3. Copy yarn-site.xml and mapred-site.xml files to all 3 data nodes

Below is an example that copies the files to datanode1 using the scp command. Repeat this step for all your data nodes.

scp hadoop/etc/hadoop/yarn-site.xml datanode1:/home/ubuntu/hadoop/etc/hadoop/
scp hadoop/etc/hadoop/mapred-site.xml datanode1:/home/ubuntu/hadoop/etc/hadoop/
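
The same copy can be looped over every data node. This is only a sketch: the hostnames datanode2 and datanode3 are assumed (only datanode1 appears in this post), and copy_configs and DRY_RUN are hypothetical names. By default it prints the scp commands instead of running them.

```shell
#!/usr/bin/env bash
# datanode2 and datanode3 are assumed hostnames; adjust the list to your cluster.
DATANODES="datanode1 datanode2 datanode3"
CONF_DIR="hadoop/etc/hadoop"
REMOTE_DIR="/home/ubuntu/hadoop/etc/hadoop/"

# Print (DRY_RUN=1, the default) or run the scp command for every node and file.
copy_configs() {
  local node file
  for node in $DATANODES; do
    for file in yarn-site.xml mapred-site.xml; do
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "scp $CONF_DIR/$file $node:$REMOTE_DIR"
      else
        scp "$CONF_DIR/$file" "$node:$REMOTE_DIR"
      fi
    done
  done
}

copy_configs
```

Run it once as-is to review the commands, then set DRY_RUN=0 in the environment to perform the copy.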

4. Start YARN from namenode/node-master

start-yarn.sh

You should see the following lines.

ubuntu@namenode:~$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers

jps on the namenode should list the following.

ubuntu@namenode:~$ jps
11281 Jps
10740 SecondaryNameNode
10442 NameNode
10972 ResourceManager

Note that SecondaryNameNode and NameNode were started earlier by start-dfs.sh. The start-yarn.sh command starts ResourceManager on the namenode and NodeManager on the data nodes.

Now run the jps command on any datanode and confirm that NodeManager is running.

ubuntu@datanode1:~$ jps
10273 DataNode
10648 Jps
10463 NodeManager
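
If you would rather script this check than read the jps output by eye, a small filter along these lines may help. has_daemon is a hypothetical helper name, and the sample text is the datanode1 output shown in this post.

```shell
#!/usr/bin/env bash
# Succeeds if the jps output read from stdin lists the given daemon name
# in its second column (the first column is the process ID).
has_daemon() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# On a live node you would run: jps | has_daemon NodeManager
# Here the function is fed the datanode1 output from this post.
sample="10273 DataNode
10648 Jps
10463 NodeManager"

if printf '%s\n' "$sample" | has_daemon NodeManager; then
  echo "NodeManager is running"
else
  echo "NodeManager is NOT running"
fi
```

From the namenode, the yarn node -list command gives a complementary view: it lists the NodeManagers that have registered with the ResourceManager.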

5. To stop YARN, run the following command on the namenode

stop-yarn.sh

6. Yarn UI

Start Yarn if you have stopped it. Now open your browser and go to http://192.168.1.100:8088/cluster (replace 192.168.1.100 with your namenode IP).
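
The ResourceManager also serves a REST API on the same port as the web UI, which is handy when no browser is available. The sketch below only builds the two URLs; RM_HOST, RM_PORT and rm_url are hypothetical names, and the IP is the example address from this post.

```shell
#!/usr/bin/env bash
RM_HOST="${RM_HOST:-192.168.1.100}"  # example namenode IP from this post; replace with yours
RM_PORT="${RM_PORT:-8088}"           # ResourceManager web port used in this post

# Hypothetical helper: build a ResourceManager URL for the given path.
rm_url() {
  echo "http://$RM_HOST:$RM_PORT$1"
}

echo "Web UI:   $(rm_url /cluster)"
echo "REST API: $(rm_url /ws/v1/cluster/info)"

# On a machine that can reach the cluster, the REST endpoint can be queried with:
#   curl -s "$(rm_url /ws/v1/cluster/info)"
```

The /ws/v1/cluster/info endpoint returns a short JSON document describing the ResourceManager, so a successful response is a quick sign that Yarn is up.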

7. Run the MapReduce example

yarn jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount "books/*" output
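
The job reads its input from the HDFS path books/* and writes to the HDFS directory output. The sketch below shows commands you might use to inspect both. It assumes the books directory was already uploaded to your HDFS home directory, which this post does not cover; run is a hypothetical wrapper that only prints the command when the hdfs client is not on PATH.

```shell
#!/usr/bin/env bash
# Run a command if the hdfs client is on PATH; otherwise just print it.
run() {
  if command -v hdfs >/dev/null 2>&1; then
    "$@"
  else
    echo "+ $*"
  fi
}

run hdfs dfs -ls books               # input files matched by "books/*"
run hdfs dfs -ls output              # reducer output files (part-r-*) once the job finishes
run hdfs dfs -cat 'output/part-r-*'  # one "word<TAB>count" line per distinct word

# The job fails if the output directory already exists, so remove it before a rerun:
#   hdfs dfs -rm -r output
```

The glob in the last command is quoted so that HDFS, not the local shell, expands it.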

8. Check the completed job on the Yarn UI

You should see an entry with an application ID similar to “application_1547102810368_0001” in the “FINISHED” state.

9. Yarn logs

To look at the Yarn logs, get your job's application ID from the Yarn UI and run the command below.

yarn logs -applicationId application_1547102810368_0001
