Namenode, Secondary Namenode and Datanodes

A Hadoop cluster has two types of nodes:

Namenode
Datanode

In a Hadoop cluster, the master node is called the Namenode and the slave nodes are called Datanodes. Let us look at each of them in detail.

Namenode:

The Namenode is the primary node of the HDFS cluster. It manages all the operations of the cluster, and in a basic deployment there is only one Namenode for the entire HDFS cluster. The Namenode stores the metadata (information about block storage, replication, and so on). This information is stored persistently on the Namenode's local disk in two files: the FSImage (also called the namespace image) and the edit log.
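
To make the relationship between the two files more concrete, here is a minimal sketch of a toy Namenode that keeps its namespace in memory, appends every change to an edit log, and rebuilds its state on restart by loading the image and replaying the log. The class name ToyNamenode, the file names fsimage.json and edits.log, and the JSON format are all assumptions made for illustration; real HDFS uses its own on-disk formats and tracks far more metadata.

```python
import json
import os
import tempfile


class ToyNamenode:
    """Toy model: in-memory namespace backed by an image file plus an edit log."""

    def __init__(self, meta_dir):
        # Hypothetical file names, chosen only for this example.
        self.image_path = os.path.join(meta_dir, "fsimage.json")
        self.edits_path = os.path.join(meta_dir, "edits.log")
        self.namespace = {}  # file path -> {"replication": int, "blocks": [ids]}
        self._load()

    def _load(self):
        # Start from the last saved snapshot, if one exists.
        if os.path.exists(self.image_path):
            with open(self.image_path) as f:
                self.namespace = json.load(f)
        # Replay every change recorded since that snapshot.
        if os.path.exists(self.edits_path):
            with open(self.edits_path) as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, edit):
        if edit["op"] == "create":
            self.namespace[edit["path"]] = {
                "replication": edit["replication"],
                "blocks": edit["blocks"],
            }
        elif edit["op"] == "delete":
            self.namespace.pop(edit["path"], None)

    def _log(self, edit):
        # Persist the change to the edit log, then update memory.
        with open(self.edits_path, "a") as f:
            f.write(json.dumps(edit) + "\n")
        self._apply(edit)

    def create(self, path, blocks, replication=3):
        self._log({"op": "create", "path": path,
                   "blocks": blocks, "replication": replication})

    def delete(self, path):
        self._log({"op": "delete", "path": path})


meta_dir = tempfile.mkdtemp()
nn = ToyNamenode(meta_dir)
nn.create("/logs/a.txt", blocks=["blk_1", "blk_2"])
nn.create("/logs/b.txt", blocks=["blk_3"], replication=2)
nn.delete("/logs/a.txt")

# A "restarted" Namenode rebuilds the same namespace from what is on disk.
restarted = ToyNamenode(meta_dir)
print(restarted.namespace)
```

In this sketch no image file is ever written, so the restart relies entirely on replaying the edit log. The longer that log grows, the more work a restart has to do.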

Datanode:

Datanodes act as the slaves in the HDFS cluster. The number of Datanodes depends on the amount of data stored on the cluster. Datanodes work according to the Namenode's instructions: they store and retrieve data when asked to do so by either the Namenode or a client. While they are running, these nodes report back to the Namenode periodically using the heartbeat mechanism.
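
The heartbeat idea can be sketched in a few lines: the Namenode records when it last heard from each Datanode and treats a node as dead once it has been silent for longer than some timeout. The class HeartbeatMonitor, the node ids, and the 30-second timeout below are hypothetical values picked for the example, not HDFS defaults, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatMonitor:
    """Toy Namenode-side tracker of Datanode liveness."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # datanode id -> time of its last heartbeat

    def heartbeat(self, datanode_id, now):
        # Called whenever a Datanode reports in.
        self.last_seen[datanode_id] = now

    def live_nodes(self, now):
        return sorted(dn for dn, seen in self.last_seen.items()
                      if now - seen <= self.timeout_seconds)

    def dead_nodes(self, now):
        return sorted(dn for dn, seen in self.last_seen.items()
                      if now - seen > self.timeout_seconds)


monitor = HeartbeatMonitor(timeout_seconds=30)  # illustrative timeout
monitor.heartbeat("dn1", now=0)
monitor.heartbeat("dn2", now=0)
monitor.heartbeat("dn1", now=25)
monitor.heartbeat("dn1", now=50)  # dn2 has been silent since time 0

print(monitor.live_nodes(now=50))
print(monitor.dead_nodes(now=50))
```

Here dn1 keeps reporting and stays in the live list, while dn2 exceeds the timeout and is reported as dead.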

As noted above, there is only one Namenode in the HDFS cluster, and it is responsible for maintaining the whole cluster. If so, what happens if this Namenode goes down?

To help recover from this situation we have the Secondary Namenode. Let us look at it in detail.

Secondary Namenode:

The Secondary Namenode is not a direct replacement for the Namenode. Its main role is to periodically merge the FSImage and the edit log, which prevents the edit log from becoming very large. The Secondary Namenode runs on a separate physical system because merging the two files requires a large amount of memory. It keeps a copy of the merged file on its local file system so that the copy can be used if the Namenode fails. That copy is only as recent as the last merge, so changes made after it are not included.
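
The merge step can be sketched as a small function: take the current image, apply each logged edit in order, and return a new image together with an empty edit log. The function names checkpoint and apply_edit and the dictionary-based formats are assumptions for illustration only; they do not reflect how HDFS actually encodes these files.

```python
import copy


def apply_edit(namespace, edit):
    """Apply one logged change to a namespace dictionary (path -> block ids)."""
    if edit["op"] == "create":
        namespace[edit["path"]] = list(edit["blocks"])
    elif edit["op"] == "delete":
        namespace.pop(edit["path"], None)


def checkpoint(fsimage, edit_log):
    """Fold the edit log into the image; return (new_image, new_empty_log)."""
    merged = copy.deepcopy(fsimage)  # leave the old image untouched
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []


fsimage = {"/data/a.csv": ["blk_1"]}
edit_log = [
    {"op": "create", "path": "/data/b.csv", "blocks": ["blk_2", "blk_3"]},
    {"op": "delete", "path": "/data/a.csv"},
    {"op": "create", "path": "/data/c.csv", "blocks": ["blk_4"]},
]

new_image, new_log = checkpoint(fsimage, edit_log)
print(new_image)
print(len(new_log))
```

After the merge the new image already reflects all three edits, so a restart that loads it has nothing left to replay.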