How to Set Up a Hadoop 3.1.0 Multi-Node Cluster: Connection and Configuration

What is Hadoop?

Hadoop is an open-source library written in Java that enables us to process large data sets in parallel across multiple machines.

It stores and manages large volumes of data across multiple machines.

HADOOP MULTI-NODE INSTALLATION PROCESS

Note: I am setting up the multi-node cluster in the simplest way, doing everything as the root user on the default hosts (master, slave1, slave2, etc.). Please do not create any dedicated Hadoop users (hadoop, hduser, etc.) to configure and share the Hadoop installation; the root user is used on every machine to make the multi-node connection.

Step 1: Update one or all packages on your system
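On a CentOS/RHEL system (which the firewalld and /etc/sysconfig/iptables steps below suggest), the update command is likely:

yum update -y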

Check the hostname of the master and slaves by using the below command:
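Run this on each machine:

hostname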

In the master machine:

master

Similarly for the slaves.

In the slave machines:

slave1
slave2

Step 4: Edit /etc/hosts

Now let's SSH into master, slave1, and slave2 and edit the /etc/hosts file, so that we can use the hostname instead of the IP address every time we wish to reach or ping any of these machines.

192.168.1.9 master
192.168.1.3 slave1
192.168.1.23 slave2

Step 5: Generate the password-less SSH key

Install OpenSSH Server:
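Assuming yum as the package manager, a likely install command is:

yum install -y openssh-server openssh-clients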

Make sure port 22 is open:

Edit /etc/sysconfig/iptables (IPv4 firewall):

If iptables is not installed on the machine, please follow Step 6 to install it.

Start OpenSSH Service:
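A likely command on a systemd-based system:

systemctl start sshd
systemctl enable sshd    # optional: start sshd automatically at boot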

Add the below line and save it:

-A RH-Firewall-1-INPUT -m tcp -p tcp --dport 22 -j ACCEPT

If your site uses IPv6 and you are editing /etc/sysconfig/ip6tables, add the same rule there.

Restart iptables:
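Assuming the iptables-services package from Step 6 is installed, a likely command is:

systemctl restart iptables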

Generate a public SSH key:

In the master machine

Create an ssh key:

Create a password-less public SSH key by using the below command:
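A minimal sketch; passing -P "" leaves the passphrase empty so no password is ever asked for:

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa    # creates ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub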

Set the correct permissions on the SSH key and the authorized_keys file:
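A likely sequence, assuming the key pair was created in ~/.ssh:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # authorize the key for the local machine too
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys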

Copy ssh key from master to slave1:

Only the public key is copied to the servers (slave1, slave2) using the below commands. The private key should never be copied to another machine.
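A likely command (ssh-copy-id sends only the public key):

ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave1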

Copy ssh key from master to slave2:
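Similarly for slave2:

ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave2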

Note: If the above command shows an error, please follow Step 6 to configure and disable the firewall and iptables.

Test the new key:

Test the SSH connection from the master to itself:
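For example:

ssh master    # should log in without asking for a password
exit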

In the slave1 machine

Create an ssh key:

Create a password-less public SSH key by using the below command:

Set the correct permissions on the SSH key and the authorized_keys file:
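On slave1, the same two steps as on the master are likely covered by:

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys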

Copy ssh key from slave1 to master:

Only the public key is copied to the servers (master, slave2) using the below commands. The private key should never be copied to another machine.
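A likely command:

ssh-copy-id -i ~/.ssh/id_rsa.pub root@master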

Note: If the above command shows an error, please follow Step 6 to configure and disable the firewall and iptables.

Copy ssh key from slave1 to slave2:
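Again, only the public key is sent:

ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave2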

Note: If the above command shows an error, please follow Step 6 to configure and disable the firewall and iptables.

Test the new keys:

Test the SSH connection from slave1 to itself and to the master machine:
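For example:

ssh slave1    # password-less login to itself
exit
ssh master    # password-less login to the master
exit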

In the slave2 machine

Create an ssh key:

Create a password-less public SSH key by using the below command:

Set the correct permissions on the SSH key and the authorized_keys file:
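On slave2, the same key-creation and permission commands are likely used:

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys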

Copy ssh key from slave2 to master:

Only the public key is copied to the servers (master, slave1) using the below commands. The private key should never be copied to another machine.
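A likely command:

ssh-copy-id -i ~/.ssh/id_rsa.pub root@master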

Note: If the above command shows an error, please follow Step 6 to configure and disable the firewall and iptables.

Copy ssh key from slave2 to slave1:
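And to slave1:

ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave1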

Test the new keys:

Test the SSH connection from slave2 to itself and to the master machine:
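For example:

ssh slave2
exit
ssh master
exit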

Step 6: Disable the Firewall and iptables

Note: Follow these steps only if you are facing problems with the SSH connection or with pinging between the master and slave machines.

Disable firewalld:

Stop firewalld:

Check the status of firewalld:

Install iptables:

Enable iptables:

Start iptables:

Stop iptables:

Stop ip6tables:
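On a systemd-based CentOS/RHEL machine, the commands for the items above are likely:

systemctl disable firewalld
systemctl stop firewalld
systemctl status firewalld
yum install -y iptables-services    # provides the iptables and ip6tables services
systemctl enable iptables
systemctl start iptables
systemctl stop iptables
systemctl stop ip6tables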

Step 7: Installing Java 1.8

Note: Install Java individually on the master machine and on the slave machines.

Download JDK 1.8 from the Oracle website.

Install the JDK 1.8 RPM file by using the below command:
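A likely command; the file name below is only an example, so use the RPM you actually downloaded:

rpm -ivh jdk-8u171-linux-x64.rpm    # example file name
java -version                        # verify the installation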

Step 8: Installing Hadoop

Note: Download and install Hadoop on the master machine alone, then share the installed Hadoop folder with the slave machines.

Use the below command to download Hadoop 3.1.0 on the master system:
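A likely pair of commands, assuming Hadoop is unpacked into /usr/local (the path used later in this guide); the Apache archive URL is the standard location for this release:

cd /usr/local
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz
tar -xzf hadoop-3.1.0.tar.gz    # creates /usr/local/hadoop-3.1.0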

Step 9: Edit Hadoop Configuration Files

1. Edit ~/.bash_profile

Add the below lines
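A likely set of entries; the JAVA_HOME path is an assumption based on the Oracle JDK RPM install location:

export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/usr/local/hadoop-3.1.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin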

Source it to reflect changes
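That is:

source ~/.bash_profile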

2. Edit core-site.xml
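A minimal sketch of core-site.xml; hdfs://master:9000 is an assumed NameNode address (the hostname matches /etc/hosts above, the port is a common choice):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>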

3. Edit hdfs-site.xml
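A minimal sketch of hdfs-site.xml; the replication factor of 2 (one copy per datanode) and the namenode/datanode directories under /usr/local/hadoop-3.1.0 are assumptions consistent with Steps 10 and 14:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop-3.1.0/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop-3.1.0/datanode</value>
  </property>
</configuration>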

4. Edit mapred-site.xml
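A minimal sketch of mapred-site.xml, telling MapReduce to run on YARN:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>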

5. Edit yarn-site.xml
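A minimal sketch of yarn-site.xml; the ResourceManager is assumed to run on the master:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>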

6. Edit hadoop-env.sh
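At minimum, set JAVA_HOME here (the path is the same assumption as in ~/.bash_profile). Because this guide runs everything as root, Hadoop 3 typically also refuses to start the daemons unless the daemon user variables are defined, so a likely set of lines is:

export JAVA_HOME=/usr/java/default
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root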

Step 10: Create a namenode directory in the master machine
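Assuming the path configured in the hdfs-site.xml sketch above:

mkdir -p /usr/local/hadoop-3.1.0/namenode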

Step 11: Modify the masters file and add the IP address of the namenode in the master system

Create a masters file in the master machine
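A likely command; the file is assumed to live in Hadoop's configuration directory and contains the master's IP:

echo "192.168.1.9" > /usr/local/hadoop-3.1.0/etc/hadoop/masters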

Step 12: Modify the slaves file and add the IP addresses of the datanodes in the master system

Create a slaves file in the master machine:
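Note that the Hadoop 3.x start scripts read the worker list from etc/hadoop/workers (the successor of the older slaves file), so a likely pair of commands is:

echo "192.168.1.3" > /usr/local/hadoop-3.1.0/etc/hadoop/workers
echo "192.168.1.23" >> /usr/local/hadoop-3.1.0/etc/hadoop/workers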

To view the contents of the masters file:
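For example:

cat /usr/local/hadoop-3.1.0/etc/hadoop/masters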

It will show the master IP address.

To view the contents of the slaves file:
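For example:

cat /usr/local/hadoop-3.1.0/etc/hadoop/workers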

Step 13: Copy Hadoop-3.1.0 file to slaves

To securely copy the Hadoop folder /usr/local/hadoop-3.1.0 from the master machine to the slave machines:

Type the below commands in the master machine to copy the folder from the master to slave1 and slave2:
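Likely commands (scp copies the whole directory tree):

scp -r /usr/local/hadoop-3.1.0 root@slave1:/usr/local/
scp -r /usr/local/hadoop-3.1.0 root@slave2:/usr/local/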

Step 14: Make the datanode directory on both the slave1 and slave2 machines

Make a directory inside hadoop-3.1.0 for the datanode on the slave machines:
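Assuming the datanode path from the hdfs-site.xml sketch, run on each slave:

mkdir -p /usr/local/hadoop-3.1.0/datanode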

Step 15: Format namenode on the master machine
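On the master:

hdfs namenode -format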

Step 16: Start Namenode and Datanode
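With password-less SSH and the workers file in place, the usual script (run on the master) is:

start-dfs.sh    # starts the NameNode, SecondaryNameNode, and the DataNodes on the slaves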

Step 17: Start Nodemanager and Resource manager
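Likewise for YARN:

start-yarn.sh    # starts the ResourceManager and the NodeManagers on the slaves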

Step 18: Check the Hadoop Daemons on the Master and Slaves

Type the below command in the master machine
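The command here is jps, as the Jps line in the output below indicates:

jps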

10307 Jps
7961 ResourceManager
7105 NameNode
7352 SecondaryNameNode

Type the same jps command in the slave1 machine:

2780 Jps
2181 NodeManager
1996 DataNode

Type the same jps command in the slave2 machine:

1735 Jps
2328 NodeManager
1983 DataNode

Note: If any of the daemons are missing on any one of the machines, format the namenode again, restart the services, and check once more. If the output still does not match the above, review the Hadoop configuration files from the beginning.

Step 19: Check Hadoop Web Interfaces (HADOOP WEB UI)

Web URL for DFS health reports of the master machine

http://192.168.1.9:9870/dfshealth.html#tab-overview

Web URL for the DFS health report of the slave1 machine (datanode web UI)

http://192.168.1.3:9864

Web URL for the DFS health report of the slave2 machine (datanode web UI)

http://192.168.1.23:9864

To verify through the command line

root@master: /usr/local/hadoop-3.1.0/etc/hadoop > hdfs dfsadmin -report

Web URL for the resource manager

http://192.168.1.9:8088/cluster

Congrats! We have now successfully installed Hadoop in multi-node mode using YARN.