There are five daemons in the classic (Hadoop 1.x) architecture. A daemon is a process that runs in the background.
- Name Node
- Secondary Name Node
- Data Node
- Job Tracker
- Task Tracker
Name Node: This node plays the central role in an HDFS cluster. Before looking at it in detail, let us first go over the Hadoop distributions that are commonly used.
- CDH (Cloudera Distribution for Hadoop)
- MapR
- Hortonworks
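Currently we are using CDH in our real-time projects. Its drawback is the SPOF (Single Point Of Failure): in this setup there is only one Name Node in the cluster at any point in time, so if it goes down, HDFS is unavailable until it is recovered. This limitation comes from the Hadoop 1.x design itself rather than from CDH in particular.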
The Name Node stores only the metadata. That means it records which blocks make up each file and where those blocks physically live, not the data itself. The Data Nodes act on the instructions of the Name Node.
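To make the idea of "metadata only" concrete, here is a minimal sketch of the kind of lookup table a Name Node keeps. The class `ToyNameNode`, its methods, and the block and node names are all hypothetical and are not Hadoop APIs; a real Name Node tracks much more (permissions, replication factors, and so on).

```python
# Toy model of Name Node metadata. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    # file path -> ordered list of block ids
    file_blocks: dict = field(default_factory=dict)
    # block id -> list of Data Node names that hold a copy
    block_locations: dict = field(default_factory=dict)

    def add_file(self, path, placements):
        """Record a file as a list of (block_id, [data_node, ...]) pairs."""
        self.file_blocks[path] = [block_id for block_id, _ in placements]
        for block_id, nodes in placements:
            self.block_locations[block_id] = list(nodes)

    def locate(self, path):
        """Return [(block_id, [data_node, ...]), ...] for a file."""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


name_node = ToyNameNode()
name_node.add_file(
    "/logs/app.log",
    [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])],
)
locations = name_node.locate("/logs/app.log")
print(locations)
```

Note that the sketch never touches file contents: it only answers "which blocks, on which nodes", which is the part of the work the Name Node is responsible for.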
Secondary Name Node: This node should never be regarded as a direct backup of the Name Node. It is only responsible for housekeeping activities. It periodically copies the “FSImage” and “EditLog” files from the Name Node and merges them into an up-to-date checkpoint. These two files contain the metadata. If the Name Node goes down, the Secondary Name Node does not take over its role; its latest checkpoint can only help in recovering the Name Node. To know more about the Secondary Name Node, have a look at Namenode, Secondary Namenode and Datanodes.
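The housekeeping done with “FSImage” and “EditLog” can be pictured as a checkpoint: the logged changes are replayed on top of the last saved image to produce a fresh one. The sketch below is only an illustration under simplifying assumptions; the function `checkpoint`, the dictionary layout, and the `create`/`delete` operations are hypothetical and do not reflect the real file formats.

```python
def checkpoint(fs_image, edit_log):
    """Return a new image with every logged edit applied, in order."""
    merged = dict(fs_image)  # leave the old image untouched
    for op, path, blocks in edit_log:
        if op == "create":
            merged[path] = blocks
        elif op == "delete":
            merged.pop(path, None)
    return merged


# Last saved image: one file made of one block.
fs_image = {"/a.txt": ["blk_1"]}
# Changes recorded since that image was written.
edit_log = [
    ("create", "/b.txt", ["blk_2", "blk_3"]),
    ("delete", "/a.txt", None),
]

new_image = checkpoint(fs_image, edit_log)
print(new_image)  # {'/b.txt': ['blk_2', 'blk_3']}
```

After a checkpoint the edit log can start again from empty, which keeps it from growing without bound.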
Data Node: These nodes store the actual blocks of data. There is no limit on the number of Data Nodes in a cluster, but a cluster should have at least one. There is no fixed hardware configuration required for a Data Node. To know more about the Data Node, have a look at Namenode, Secondary Namenode and Datanodes.
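As a rough illustration of what "blocks of data" means, the sketch below cuts a file into fixed-size blocks. The 64 MB block size is an assumption (it is the usual Hadoop 1.x default, but it is configurable), and the function `split_into_blocks` is hypothetical.

```python
BLOCK_SIZE_MB = 64  # assumed block size; configurable in a real cluster


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file is cut into."""
    full, rest = divmod(file_size_mb, block_size_mb)
    # Every block is full-sized except possibly the last one.
    return [block_size_mb] * full + ([rest] if rest else [])


blocks = split_into_blocks(200)
print(blocks)  # [64, 64, 64, 8]
```

Each of these blocks is then stored on one or more Data Nodes, and only the last block of a file may be smaller than the block size.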
Job Tracker: This daemon is responsible for scheduling MapReduce jobs and assigning their tasks to Task Trackers.
Task Tracker: This daemon executes the tasks assigned by the Job Tracker. Task Trackers report back to the Job Tracker with periodic heartbeat messages while MapReduce jobs are running.
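The division of work between the two daemons can be sketched as follows. This is a simplified illustration, not the real scheduler: the function `assign_tasks`, the task names, and the idea of handing out tasks purely by free slots are assumptions, and an actual Job Tracker also takes things like data locality into account.

```python
from collections import deque


def assign_tasks(tasks, free_slots):
    """Hand tasks to trackers with free slots; return (assignments, pending)."""
    assignments = {tracker: [] for tracker in free_slots}
    pending = deque(tasks)
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    for tracker in slots:
        while slots[tracker] > 0 and pending:
            assignments[tracker].append(pending.popleft())
            slots[tracker] -= 1
    return assignments, list(pending)


tasks = ["map_0", "map_1", "map_2", "reduce_0"]
free_slots = {"tt1": 2, "tt2": 1}  # Task Tracker name -> free task slots

assignments, pending = assign_tasks(tasks, free_slots)
print(assignments)  # {'tt1': ['map_0', 'map_1'], 'tt2': ['map_2']}
print(pending)      # ['reduce_0']
```

Tasks left in `pending` would wait until a Task Tracker reports a free slot again.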