The Apache Hadoop primary NameNode and secondary NameNode architecture is designed as follows:
The conf/masters file defines the master nodes of a single-node or multi-node cluster. On the master, conf/masters looks like this:
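As an illustration, a conf/masters file for a small cluster might hold a single hostname. The sketch below writes such a file into a scratch directory and prints it. The hostname "master" and the scratch location are assumptions made for the example, not values taken from a real cluster.

```shell
# Write a sample masters file into a scratch directory (not a live conf dir).
# "master" is a hypothetical hostname; substitute your own host.
conf_dir="$(mktemp -d)"
printf '%s\n' master > "$conf_dir/masters"
cat "$conf_dir/masters"
```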
The conf/slaves file lists the hosts, one per line, where the Hadoop slave daemons (DataNodes and TaskTrackers) will run. If both the master box and the slave box act as Hadoop slaves, the master's hostname appears in both conf/masters and conf/slaves.
On the master, conf/slaves looks like this:
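For a two-machine cluster in which the master box also runs the slave daemons, the file might list two hostnames. The sketch below writes such a file into a scratch directory. The hostnames "master" and "slave" are assumptions for illustration only.

```shell
# Write a sample slaves file: one worker hostname per line.
# "master" and "slave" are hypothetical hostnames; the master box is
# listed too because in this example it also runs the slave daemons.
conf_dir="$(mktemp -d)"
printf '%s\n' master slave > "$conf_dir/slaves"
cat "$conf_dir/slaves"
```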
If you have additional slave nodes, add them to the conf/slaves file, one per line. Make sure the NameNode can ping every machine listed in conf/slaves.
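One way to check reachability is to loop over the file and ping each host once. The sketch below is a minimal helper, not part of Hadoop. The function names list_hosts and check_hosts are hypothetical, and it assumes a Linux-style ping that accepts -c.

```shell
# list_hosts FILE: print the hostnames in a slaves-style file,
# dropping "#" comments, trailing whitespace, and empty lines.
list_hosts() {
  sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# check_hosts FILE: ping each listed host once and report the result.
# Returns non-zero if at least one host did not answer.
check_hosts() {
  failed=0
  for host in $(list_hosts "$1"); do
    if ping -c 1 "$host" >/dev/null 2>&1; then
      echo "ok   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}
```

Run on the NameNode machine, check_hosts "$HADOOP_CONF_DIR/slaves" prints one line per host and returns a non-zero status if any host is unreachable.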
If you are building a test cluster, you don’t need to set up the secondary NameNode on a different machine; the setup is similar to a pseudo-distributed install. If you are building a real distributed cluster, however, you should move the secondary NameNode to another machine. Running it on a machine other than the primary NameNode keeps its checkpointed copy of the file system metadata available if the primary NameNode machine goes down.
The masters file contains the name of the machine where the secondary NameNode will start. If you have modified the scripts to change the secondary NameNode's details (its location or name), make sure the DFS service reads the updated configuration when it starts, so that it can start the secondary NameNode correctly.
In a Linux-based Hadoop cluster, the secondary NameNode is started by bin/start-dfs.sh on the nodes listed in the conf/masters file. bin/start-dfs.sh calls bin/hadoop-daemons.sh, which takes the name of the masters or slaves file as a command-line option.
Start Secondary Name node on demand or by DFS:
The location of your Hadoop conf directory is set by the $HADOOP_CONF_DIR shell variable. Distributions such as Cloudera or MapR set it up differently, so check where your Hadoop conf folder is.
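A small helper can make that lookup explicit. The sketch below tries a few candidate directories and prints the first one that exists. The function name find_conf_dir and the candidate order are assumptions, and /etc/hadoop/conf is only a location commonly used by packaged installs, so adjust the list for your distribution.

```shell
# find_conf_dir: print the first candidate configuration directory that exists.
# Candidate order is an assumption: an explicit $HADOOP_CONF_DIR first,
# then $HADOOP_HOME/conf, then /etc/hadoop/conf.
find_conf_dir() {
  for dir in "${HADOOP_CONF_DIR:-}" "${HADOOP_HOME:+$HADOOP_HOME/conf}" /etc/hadoop/conf; do
    if [ -n "$dir" ] && [ -d "$dir" ]; then
      echo "$dir"
      return 0
    fi
  done
  echo "no Hadoop conf directory found" >&2
  return 1
}
```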
To start the secondary NameNode on any machine, use the following command:
$HADOOP_HOME/bin/hadoop --config $HADOOP_CONF_DIR secondarynamenode
When the secondary NameNode is started by DFS, the sequence is as follows:
$HADOOP_HOME/bin/start-dfs.sh starts SecondaryNameNode
>>>> "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode
If you list the secondary NameNode hosts in a file with a different name, say “hadoopsecondary”, you need to pass that file name when starting the secondary NameNode, and make sure bin/start-dfs.sh uses it by default:
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts hadoopsecondary start secondarynamenode
which will start the secondary NameNode on ALL hosts listed in the file “hadoopsecondary”.
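To make the role of --hosts concrete, the sketch below mimics in simplified form how the name given to --hosts is treated as a file inside the conf directory and expanded into a host list. It is not the actual hadoop-daemons.sh code. The function names hosts_file and target_hosts are hypothetical, and the sketch assumes the file defaults to slaves when no name is given.

```shell
# hosts_file CONF_DIR [NAME]: path of the host list file.
# NAME is the value passed to --hosts; "slaves" is assumed when omitted.
hosts_file() {
  echo "$1/${2:-slaves}"
}

# target_hosts CONF_DIR [NAME]: hostnames a daemon would be started on,
# with "#" comments and blank lines removed.
target_hosts() {
  sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' "$(hosts_file "$1" "${2:-}")"
}
```

For example, target_hosts "$HADOOP_CONF_DIR" hadoopsecondary prints every host in that file, which is why the daemon starts on all of them.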
How Hadoop DFS Service Starts in a Cluster:
In Linux based Hadoop Cluster:
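As a rough sketch of that startup sequence, the function below prints the three commands that a Hadoop 1.x-style bin/start-dfs.sh is understood to run, in order. It only echoes them and starts nothing. The function name print_dfs_start_plan is hypothetical, and the exact lines are an assumption, so compare them with the start-dfs.sh shipped in your distribution.

```shell
# print_dfs_start_plan: echo (do not run) the launch steps, in order.
#   1. NameNode on the machine where start-dfs.sh is invoked
#   2. DataNodes on every host listed in conf/slaves
#   3. Secondary NameNode on every host listed in conf/masters
# $bin and $HADOOP_CONF_DIR are printed literally, not expanded.
print_dfs_start_plan() {
  cat <<'EOF'
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode
EOF
}

print_dfs_start_plan
```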
Alternative to backup Namenode or Avatar Namenode:
The secondary NameNode keeps a checkpointed copy of the primary NameNode's metadata, which helps with recovery if the primary NameNode goes down; it is not a hot standby. Alternatives to the secondary NameNode are available if you want to build NameNode HA. One such method is to use an Avatar NameNode. An Avatar NameNode is created by migrating a NameNode to an Avatar NameNode, and the Avatar NameNode must be built on a separate machine.
Technically, once migrated, the Avatar NameNode is a hot standby for the NameNode, so it is always in sync with the NameNode. If you create a new file on the master NameNode, you can read it from the standby Avatar NameNode in real time.
In standby mode, the Avatar NameNode is a read-only NameNode. At any time you can transition the Avatar NameNode to act as the primary NameNode. When needed, you can switch from standby mode to full active mode in just a few seconds. To do that, you must have a virtual IP (VIP) for NameNode migration and an NFS share for NameNode data replication.