Safe Mode and High NameNode Availability in Hadoop Cluster


This article discusses two basic Hadoop concepts in Big Data Analytics:

  1. Safe Mode in HDFS
  2. High NameNode Availability

1. Safe Mode in HDFS

What is Safe Mode?

When the NameNode starts or restarts, it enters Safe Mode, a read-only state in which blocks cannot be added, replicated, or deleted. Safe Mode lets the NameNode complete two important processes:

First, the previous file system state is reconstructed by loading the fsimage file into memory and replaying the edit log.

Second, the mapping between data blocks and DataNodes is built by waiting for enough DataNodes to register and send their block reports, so that at least one replica of each block (more precisely, of a configurable fraction of all blocks) is known to be available. Not all DataNodes need to register before HDFS exits Safe Mode; registration may continue for some time afterwards.
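The two startup steps above can be sketched as a toy Python model. The function names and the edit-log tuple format are hypothetical and this is not Hadoop's actual code. The parameters `threshold_pct`, `min_replication` and `min_datanodes` mirror the HDFS settings `dfs.namenode.safemode.threshold-pct` (default 0.999), `dfs.namenode.replication.min` (default 1) and `dfs.namenode.safemode.min.datanodes` (default 0). Real HDFS also keeps Safe Mode on for an extension period (`dfs.namenode.safemode.extension`) after the threshold is met, which this sketch leaves out.

```python
from collections import Counter

# Toy model of NameNode startup in Safe Mode (hypothetical names, not Hadoop code).
# Edit log entries are tuples: ("create", path), ("add_block", path, block_id)
# or ("delete", path).

def replay_edit_log(fsimage, edit_log):
    """Step 1: load the fsimage checkpoint, then replay the edit log on top of it."""
    namespace = {path: list(blocks) for path, blocks in fsimage.items()}
    for op, path, *args in edit_log:
        if op == "create":
            namespace[path] = []
        elif op == "add_block":
            namespace[path].append(args[0])
        elif op == "delete":
            namespace.pop(path, None)
    return namespace


def can_leave_safe_mode(namespace, block_reports, threshold_pct=0.999,
                        min_replication=1, min_datanodes=0):
    """Step 2: check whether enough blocks have been reported by DataNodes.

    block_reports maps each registered DataNode to the block IDs it holds.
    """
    if len(block_reports) < min_datanodes:
        return False
    # Count how many DataNodes reported each block.
    replicas = Counter(b for blocks in block_reports.values() for b in blocks)
    all_blocks = [b for blocks in namespace.values() for b in blocks]
    # A block is "safe" once it has at least min_replication reported replicas.
    safe = sum(1 for b in all_blocks if replicas[b] >= min_replication)
    return safe >= threshold_pct * len(all_blocks)


if __name__ == "__main__":
    fsimage = {"/data/a.txt": ["blk_1", "blk_2"]}
    edits = [("create", "/data/b.txt"), ("add_block", "/data/b.txt", "blk_3")]
    ns = replay_edit_log(fsimage, edits)
    print(can_leave_safe_mode(ns, {"dn1": ["blk_1", "blk_3"], "dn2": ["blk_2"]}))  # True
    print(can_leave_safe_mode(ns, {"dn1": ["blk_1"]}))  # False: 1 of 3 blocks reported
```

The example shows why not every DataNode has to register: once every block (or 99.9% of them by default) has one reported replica, the check passes.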

An administrator can also put HDFS into Safe Mode manually, for example when a file system issue, a hardware issue, or some other unavoidable problem must be fixed while the namespace cannot change. The first command below moves the NameNode into Safe Mode, and the second takes it out again:

hdfs dfsadmin -safemode enter
hdfs dfsadmin -safemode leave
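For scripts that need to check the state before doing maintenance, `hdfs dfsadmin -safemode get` reports whether Safe Mode is ON or OFF, and `hdfs dfsadmin -safemode wait` blocks until Safe Mode is off. Below is a minimal Python sketch wrapping `get`. The function names are hypothetical, and the parser assumes the message format `Safe mode is ON` / `Safe mode is OFF` (with an ` in <host>` suffix, one line per NameNode, on HA clusters) printed by recent Hadoop releases; check the output of your own version before relying on it.

```python
import subprocess

def parse_safemode_output(output):
    """Return True if any NameNode line reports Safe Mode as ON.

    Assumed format (hypothetical parser): "Safe mode is ON" or
    "Safe mode is OFF in nn1.example.com/10.0.0.1:8020".
    """
    states = []
    for line in output.splitlines():
        if "Safe mode is ON" in line:
            states.append(True)
        elif "Safe mode is OFF" in line:
            states.append(False)
    if not states:
        raise ValueError(f"unrecognised dfsadmin output: {output!r}")
    # On an HA cluster, treat the cluster as in Safe Mode if any NameNode is.
    return any(states)


def is_in_safe_mode():
    """Run `hdfs dfsadmin -safemode get`; needs the hdfs client on PATH."""
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-safemode", "get"],
        capture_output=True, text=True, check=True,
    )
    return parse_safemode_output(result.stdout)
```

Keeping the parsing separate from the `subprocess` call makes the format assumption easy to test and to adjust for a different Hadoop version.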

2. High NameNode Availability

In early Hadoop installations, the Hadoop Distributed File System (HDFS) cluster had a single NameNode, so the failure of the NameNode could bring down the entire Hadoop cluster. In other words, early Hadoop installations had a single point of failure. To reduce this risk, the NameNode was often run on hardware with redundant power supplies and storage, but it was still susceptible to other failures.

The solution to this single point of failure is NameNode High Availability (HA), in which two or more NameNodes are configured. If the Active NameNode fails, a Standby NameNode takes over as the Active NameNode, so the Standby NameNode acts as a true failover node.

Each NameNode machine is configured with exactly the same software. At any time, one NameNode is in the Active state and the other NameNode (or NameNodes) are in the Standby state. As in a single-NameNode Hadoop cluster, the Active NameNode is responsible for all client HDFS operations in the cluster, such as reads and writes. The Standby NameNode maintains the same information as the Active NameNode so that it can take over quickly if required. To keep block location information current on every NameNode, the DataNodes send their block reports to both the Active and the Standby NameNodes. In addition, the Active NameNode shares the file system namespace state with the Standby NameNode through a group of JournalNodes: the Active writes every edit log entry to the JournalNodes, and the Standby reads those entries and applies them to its own copy of the namespace.
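The JournalNodes work by majority: the Active NameNode treats an edit as durable once a majority of JournalNodes have logged it, which is why three JournalNodes can tolerate the loss of one. The toy Python model below illustrates only that majority rule. The class and function names are hypothetical, and the real Quorum Journal Manager in HDFS additionally uses epochs, edit-log segments and recovery, all omitted here.

```python
# Toy model of quorum journaling between Active and Standby (hypothetical names).

class JournalNode:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.edits = []  # list of (txid, operation) pairs

    def write(self, txid, op):
        """Append an edit if this JournalNode is reachable; return True on ack."""
        if not self.up:
            return False
        self.edits.append((txid, op))
        return True


def majority(journal_nodes):
    return len(journal_nodes) // 2 + 1


def active_log_edit(journal_nodes, txid, op):
    """Active NameNode: send the edit to every JournalNode, commit on a majority of acks."""
    acks = sum(1 for jn in journal_nodes if jn.write(txid, op))
    return acks >= majority(journal_nodes)


def standby_tail_edits(journal_nodes, applied_txid):
    """Standby NameNode: return committed edits newer than the last applied txid."""
    counts = {}
    for jn in journal_nodes:
        for txid, op in jn.edits:
            if txid > applied_txid:
                counts[(txid, op)] = counts.get((txid, op), 0) + 1
    # Only edits held by a majority of JournalNodes count as committed.
    committed = sorted(k for k, c in counts.items() if c >= majority(journal_nodes))
    return [op for _, op in committed]
```

With three JournalNodes and one of them down, `active_log_edit` still succeeds because two acknowledgements form a majority; with two down, the edit is not committed and the Standby does not apply it.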

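A Secondary NameNode is not required in a High NameNode Availability configuration, because the Standby NameNode also performs the checkpointing that the Secondary NameNode would otherwise do. Apache ZooKeeper is used for automatic failover. A ZooKeeper Failover Controller (ZKFC) process runs alongside each NameNode, monitors the health of its local NameNode, and takes part in an election held through ZooKeeper. The ZKFC of the Active NameNode holds a lock in ZooKeeper; once the Active NameNode fails, that lock is released, and the ZKFC of a Standby NameNode acquires it and makes its NameNode the new Active NameNode.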
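The election described above can be illustrated with a toy Python model. `LockService` stands in for the lock that the ZKFCs hold in ZooKeeper (in real deployments an ephemeral znode that disappears when its owner's session ends), and `FailoverController` plays the role of a ZKFC; both names are hypothetical. The sketch leaves out ZooKeeper sessions, health-check timeouts, and fencing of the old Active NameNode (configured in real HDFS through `dfs.ha.fencing.methods`).

```python
# Toy model of ZooKeeper-style automatic failover (hypothetical names).

class LockService:
    """Stands in for the ZooKeeper lock: at most one holder at a time."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, owner):
        if self.holder is None:
            self.holder = owner
        return self.holder == owner

    def release(self, owner):
        if self.holder == owner:
            self.holder = None


class FailoverController:
    """One per NameNode, like a ZKFC: watches local health and joins the election."""

    def __init__(self, namenode, lock):
        self.namenode = namenode
        self.lock = lock
        self.healthy = True
        self.state = "standby"

    def tick(self):
        if not self.healthy:
            # Unhealthy NameNode: give up the lock so another NameNode can win.
            self.lock.release(self.namenode)
            self.state = "unhealthy"
        elif self.lock.try_acquire(self.namenode):
            # Real HDFS would fence the previous Active NameNode before this step.
            self.state = "active"
        else:
            self.state = "standby"


if __name__ == "__main__":
    lock = LockService()
    nn1, nn2 = FailoverController("nn1", lock), FailoverController("nn2", lock)
    nn1.tick(); nn2.tick()
    print(nn1.state, nn2.state)  # active standby
    nn1.healthy = False          # simulate failure of the Active NameNode
    nn1.tick(); nn2.tick()
    print(nn1.state, nn2.state)  # unhealthy active
```

When `nn1` recovers, its next `tick()` finds the lock held by `nn2` and it rejoins as a Standby, so the roles do not flip back automatically.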

[Figure: High NameNode Availability]

This article discussed Safe Mode and High NameNode Availability in a Hadoop cluster.
