Configuring HDFS and YARN HA


As part of this section we will see how to enable HDFS Namenode High Availability as well as YARN Resource Manager High Availability while exploring key concepts.

  • High Availability – Overview
  • Configure HDFS Namenode HA
  • Review Properties – HDFS Namenode HA
  • HDFS Namenode HA – Key Concepts
  • Configure YARN Resource Manager HA
  • Review – YARN Resource Manager HA
  • High Availability – Implications

Cluster Topology

We are setting up the cluster on 7+1 nodes. We start with 7 nodes and then we will add one more node later.

  • Gateway(s) and Management Service
    • bigdataserver-1
  • Masters
    • bigdataserver-2
      • Zookeeper
      • Active/Standby Namenode
    • bigdataserver-3
      • Zookeeper
      • Active/Standby Namenode
      • Active/Standby Resource Manager
    • bigdataserver-4
      • Zookeeper
      • Active/Standby Resource Manager
  • Slaves or Worker Nodes
    • bigdataserver-5 – Datanode, Node Manager
    • bigdataserver-6 – Datanode, Node Manager
    • bigdataserver-7 – Datanode, Node Manager

High Availability – Overview

Let us see a high-level overview of High Availability for HDFS as well as YARN.

  • Namenode is master for HDFS and Resource Manager is master for YARN.
  • Without High Availability, both the Namenode and the Resource Manager are single points of failure.
  • If Resource Manager is down, no data processing can be done in the cluster. None of the frameworks such as Map Reduce, Spark, etc. can be used.
  • If Namenode is down, almost everything is unusable. As we cannot access the file system, our data processing jobs also will not run even though there are no issues with respect to the services related to data processing.
  • We need Zookeeper to configure High Availability as well as automatic Failover. Let us first see how recovery happens when there is no High Availability for Namenode.
    • We need to start Namenode in safe mode
    • The FSImage has to be restored and the edit logs have to be replayed since the last checkpoint.
    • If there are hardware failures on Namenode, we will build a new server as Namenode. If there is a change in IP Address or DNS Alias we need to update properties files and deploy them on all the nodes.
    • All these steps are tedious, time-consuming and error-prone.
    • High Availability not only speeds up the restore and recovery process but also makes failover transparent.
    • To facilitate transparent failover, IP Addresses cannot be used directly. Instead, clients refer to a logical nameservice that maps to the Namenode hosts.
  • As part of HA Configuration for Namenode, we will have Active and Passive (Standby) Namenodes rather than a Secondary Namenode.
  • After configuring High Availability, Zookeeper can ensure that traffic is automatically failed over to the other node.
  • We can configure High Availability for other services as well. But Namenode is the most critical.

Configure HDFS Namenode HA

Let us see how we can configure High Availability for HDFS Namenode. It can be done using a shared edits directory (for example on NFS) or using Journal Nodes. However, Cloudera Manager supports only Journal Nodes.

  • Prerequisite – Make sure Zookeeper is installed and running along with HDFS components such as Namenode, Secondary Namenode, Datanodes.
  • Let us review what has been added to Zookeeper up to this point using zkCli.sh:
    • /usr/lib/zookeeper/bin/zkCli.sh -server bigdataserver-2:2181,bigdataserver-3:2181,bigdataserver-4:2181
  • Go to Dashboard and Click on HDFS
  • Click on Actions and click on Enable High Availability and follow these steps to configure HA.
    • Specify the name for the name service – either go with default or custom name (e.g.: nameservice1)
    • Selecting Active Namenode – CM automatically selects a host, or you can click on Select a host and choose the Active Namenode host manually (e.g.: bigdataserver-2)
    • Selecting Standby Namenode – Next we need to select the Standby Namenode, which must be on a different host from the Active Namenode. Both Active and Standby Namenode hosts should have the same hardware configuration.
    • Next, we should select Journal Nodes by clicking on select hosts.
    • The number of Journal Nodes should be an odd number, with a minimum of three. We will choose bigdataserver-2, bigdataserver-3 and bigdataserver-4.
    • For the Journal Node directory, you can enter the same path on every node or a different path per node, as long as the directory is empty and has appropriate permissions. We will be using /data1/jn on all 3 nodes.
  • CM executes a set of commands that will stop the dependent services, delete Secondary Namenode, reconfigure roles and directories as specified, create a nameservice and failover controllers, restart the dependent services, and deploy the new client configuration. We will review these things using Cloudera Manager Web Interface.
  • Let us also review the output of zkCli.sh command to see what is happening in Zookeeper.
  • You can click on HDFS Service and then Namenode. You will see one as Active and the other as Standby.
  • If there are other services running on cluster, then you might have to make changes to them as well.

Task: Stop the Active Namenode and see if the Standby becomes Active. Also validate that there is no downtime for the application. You should also see the stopped Namenode come back as Standby once it is running again.
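One way to validate the task above is to ask each Namenode for its HA state before and after stopping the Active one. The sketch below wraps the `hdfs haadmin -getServiceState` command; it assumes it runs on a node with the HDFS client configuration deployed (for example the gateway). The Namenode IDs `nn1` and `nn2` are placeholders, the real IDs are the values listed under dfs.ha.namenodes.nameservice1 on your cluster.

```python
import subprocess
from typing import Callable, Dict, List


def run_command(args: List[str]) -> str:
    """Run a command and return its standard output as text."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def namenode_states(
    namenode_ids: List[str],
    runner: Callable[[List[str]], str] = run_command,
) -> Dict[str, str]:
    """Ask `hdfs haadmin -getServiceState` for the state of each Namenode ID."""
    states = {}
    for nn_id in namenode_ids:
        output = runner(["hdfs", "haadmin", "-getServiceState", nn_id])
        states[nn_id] = output.strip().lower()
    return states


def has_single_active(states: Dict[str, str]) -> bool:
    """True when exactly one Namenode reports itself as active."""
    return list(states.values()).count("active") == 1
```

On the cluster, `namenode_states(["nn1", "nn2"])` (with your real IDs) should report one Namenode as active and the other as standby, and the roles should swap after the failover. The `runner` parameter exists only so the logic can be tried without a cluster.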

Review Properties – HDFS Namenode HA

Now let us review core-site.xml and hdfs-site.xml properties to understand how configurations are defined.

  • In core-site.xml, fs.defaultFS will be pointing to nameservice1 instead of a static IP address and port number (e.g.: hdfs://nameservice1)
  • In hdfs-site.xml we will see a new property named dfs.nameservices with the value nameservice1
  • For both Namenodes, we will have logical names defined under the property called dfs.ha.namenodes.nameservice1
  • All the properties in hdfs-site.xml which were pointing to the IP address and port number of the Namenode will now be defined once per Namenode, using the logical name of each Namenode.
  • We will review the properties directly as the logical names of Namenodes might be different on different environments.
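To see how these properties fit together, here is a minimal sketch that resolves a nameservice to its Namenode addresses from an hdfs-site.xml fragment. The fragment is a hand-written sample, not the file generated by Cloudera Manager: the logical names `nn1` and `nn2` are assumptions, and the hosts and port are taken from our topology for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hand-written sample; logical names nn1/nn2 are hypothetical.
SAMPLE_HDFS_SITE = """
<configuration>
  <property><name>dfs.nameservices</name><value>nameservice1</value></property>
  <property><name>dfs.ha.namenodes.nameservice1</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.nameservice1.nn1</name><value>bigdataserver-2:8020</value></property>
  <property><name>dfs.namenode.rpc-address.nameservice1.nn2</name><value>bigdataserver-3:8020</value></property>
</configuration>
"""


def load_properties(xml_text: str) -> Dict[str, str]:
    """Turn a Hadoop-style configuration document into a name -> value dict."""
    root = ET.fromstring(xml_text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


def resolve_nameservice(props: Dict[str, str], nameservice: str) -> Dict[str, str]:
    """Map each logical Namenode name of a nameservice to its RPC address."""
    namenode_ids = props[f"dfs.ha.namenodes.{nameservice}"].split(",")
    return {
        nn_id: props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"]
        for nn_id in namenode_ids
    }


addresses = resolve_nameservice(load_properties(SAMPLE_HDFS_SITE), "nameservice1")
```

This mirrors the lookup a client has to do when it is given hdfs://nameservice1: find the logical Namenode names first, then the address of each one.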

HDFS Namenode HA – Key Concepts

Now let us understand key concepts with respect to HDFS Namenode High Availability. Let us recap HDFS components before we get into HA.

  • HDFS has Namenode, Secondary Namenode and Datanodes
  • When data is saved in the form of files in HDFS
    • Files will be divided into blocks and blocks are saved in Datanodes
    • Mapping between File, block id or name and block location is called metadata.
    • This metadata is held in memory by the Namenode, and changes to it are recorded in the Namenode's edit logs.
    • To keep the edit logs from growing indefinitely, a snapshot called FSImage is created periodically from the last FSImage and the edit logs.
    • This process of creating FSImage using the last FSImage and edit logs is called checkpointing.
  • If Namenode is down, it will take several minutes to restore the FSImage and replay the edit logs to rebuild metadata in memory. During the recovery process the entire cluster is not usable.
  • Also, if there are hardware failures, migrating Namenode to another node is time-consuming. It also involves changing the Namenode URI in multiple locations.
  • We can overcome these issues by configuring High Availability on Namenode.
  • In HA configuration, instead of having Namenode and Secondary Namenode we will have Active and Passive Namenode. Hence, it is also called Active-Passive Configuration. Here are the issues HA addresses.
    • Manual involvement in case of an unplanned outage
    • Planned upgrades where Namenode needs to be brought down
  • HA Configuration provides us transparent and fast failover of the Namenode.
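The checkpointing idea recapped above can be illustrated with a toy model. This is only a sketch of the concept: the real FSImage and edit log are binary files with many more operation types, and the record layout and operation names used here ("create", "delete") are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Toy edit record: (operation, file path, block ids). Not the real edit log format.
Edit = Tuple[str, str, List[str]]
Namespace = Dict[str, List[str]]


def apply_edits(fsimage: Namespace, edits: List[Edit]) -> Namespace:
    """Replay edits on top of an FSImage to rebuild the in-memory metadata."""
    namespace = {path: list(blocks) for path, blocks in fsimage.items()}
    for operation, path, blocks in edits:
        if operation == "create":
            namespace[path] = list(blocks)
        elif operation == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return namespace


def checkpoint(fsimage: Namespace, edits: List[Edit]) -> Tuple[Namespace, List[Edit]]:
    """Merge the edits into a new FSImage and start over with an empty edit log."""
    return apply_edits(fsimage, edits), []


last_fsimage = {"/a.txt": ["blk_1"]}
edit_log = [("create", "/b.txt", ["blk_2", "blk_3"]), ("delete", "/a.txt", [])]
new_fsimage, edit_log_after = checkpoint(last_fsimage, edit_log)
```

The longer the edit log since the last checkpoint, the more records `apply_edits` has to replay at startup, which is why recovery without HA can take several minutes.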

HDFS HA – Components

Let us see different components related to Namenode High Availability. You can refer to this nicely written article on Namenode HA as part of Hadoop 2.x.

  • Zookeeper – We already have Zookeeper running on our cluster. It will be used as part of Namenode High Availability.
  • Quorum Based Storage – Active Namenode writes to it and Standby Namenode reads from it. It is managed by Quorum Journal Manager and the directories are known as Journal Directories.
  • Journal Directories – the directories on the Journal Nodes where the edits are stored (/data1/jn in our setup)
  • Active and Standby Namenode (from this article)
    • The Active Namenode is responsible for all client operations in the cluster.
    • The Standby NameNode maintains enough state to provide a fast failover.
    • In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate through a group of separate daemons called JournalNodes.
    • The file system journal logged by the Active Namenode at the JournalNodes is consumed by the Standby NameNode to keep its file system namespace in sync with the Active.
    • In order to provide a fast failover, it is also necessary that the Standby node have up-to-date information of the location of blocks in your cluster.
    • DataNodes are configured with the location of both Namenodes, and they send block location information and heartbeats to both Namenode machines.
  • Zookeeper Failover Controller (one per each Namenode)
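Quorum Based Storage is also the reason we were asked to pick an odd number of Journal Nodes: an edit counts as written once a majority of Journal Nodes acknowledge it. The helper names below are made up for this illustration; only the majority arithmetic is the point.

```python
def majority(journal_nodes: int) -> int:
    """Smallest number of Journal Nodes that forms a majority."""
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes: int) -> int:
    """How many Journal Nodes can be down while a majority is still reachable."""
    return (journal_nodes - 1) // 2


def edit_is_durable(acknowledgements: int, journal_nodes: int) -> bool:
    """An edit is considered written once a majority has acknowledged it."""
    return acknowledgements >= majority(journal_nodes)
```

With our 3 Journal Nodes, `tolerated_failures(3)` is 1. Adding a fourth node does not help, since `tolerated_failures(4)` is still 1, which is why the count goes from 3 to 5 rather than to 4.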

Automatic Failover

Automatic failover relies on two additional components in an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC). In Cloudera Manager, the ZKFC process maps to the HDFS Failover Controller role.

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of HDFS automatic failover relies on ZooKeeper for the following functions:

  • Failure detection – each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other Namenode that a failover should be triggered.
  • Active NameNode election – ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active Namenode crashes, another node can take a special exclusive lock in ZooKeeper indicating that it should become the next active Namenode.

The ZKFailoverController (ZKFC) is a ZooKeeper client that also monitors and manages the state of the NameNode. Each of the hosts that run a Namenode also run a ZKFC. The ZKFC is responsible for:

  • Health monitoring – the ZKFC contacts its local Namenode on a periodic basis with a health-check command. So long as the Namenode responds promptly with a healthy status, the ZKFC considers the Namenode healthy. If the NameNode has crashed, frozen, or otherwise entered an unhealthy state, the health monitor marks it as unhealthy.
  • ZooKeeper session management – when the local Namenode is healthy, the ZKFC holds a session open in ZooKeeper. If the local Namenode is active, it also holds a special lock znode. This lock uses ZooKeeper’s support for “ephemeral” nodes; if the session expires, the lock node is automatically deleted.
  • ZooKeeper-based election – if the local NameNode is healthy, and the ZKFC sees that no other NameNode currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has “won the election”, and is responsible for running a failover to make its local Namenode active. The previous active is fenced if necessary, and then the local Namenode transitions to active state.
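The interplay between health monitoring, the ephemeral lock and the election can be mimicked with a small in-memory model. This is a toy simulation, not the ZooKeeper client API and not the actual ZKFC code: the classes `ToyZooKeeper` and `ToyZkfc` are invented for this example, and fencing of the previous active is left out.

```python
from typing import Optional


class ToyZooKeeper:
    """In-memory stand-in for the single lock znode used for the election."""

    def __init__(self) -> None:
        self.lock_holder: Optional[str] = None

    def try_acquire(self, session: str) -> bool:
        """Take the lock if it is free; report whether this session holds it."""
        if self.lock_holder is None:
            self.lock_holder = session
        return self.lock_holder == session

    def expire_session(self, session: str) -> None:
        """Ephemeral behaviour: the lock disappears with its owning session."""
        if self.lock_holder == session:
            self.lock_holder = None


class ToyZkfc:
    """One failover controller watching its local Namenode."""

    def __init__(self, host: str, zookeeper: ToyZooKeeper) -> None:
        self.host = host
        self.zookeeper = zookeeper
        self.healthy = True
        self.state = "standby"

    def tick(self) -> None:
        """One cycle: health check first, then session and election handling."""
        if not self.healthy:
            self.zookeeper.expire_session(self.host)
            self.state = "down"
        elif self.zookeeper.try_acquire(self.host):
            self.state = "active"
        else:
            self.state = "standby"


zk = ToyZooKeeper()
first = ToyZkfc("bigdataserver-2", zk)
second = ToyZkfc("bigdataserver-3", zk)
first.tick()
second.tick()  # first holds the lock, so second stays standby
```

Setting `first.healthy = False` and running another cycle on both controllers releases the lock and lets `second` win the election, which is the failover path described above.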

Configure YARN Resource Manager HA

Let us see how to configure YARN Resource Manager High Availability using Cloudera Manager. Keep in mind that Namenode HA is critical and complex; HA for other services is neither as critical nor as complicated.

Go to the YARN service.

  • Select Actions > Enable High Availability.
    • A pop-up window will show the hosts eligible to run the standby ResourceManager, that is, all hosts except the one where the current ResourceManager is running.
    • Select the host on which to install the standby ResourceManager.
    • Click Continue. Cloudera Manager then installs the standby ResourceManager in the following steps.
      • Stop YARN service
      • Add standby ResourceManager
      • Initialize zookeeper state with RM High Availability
      • Restart YARN
      • Redeploy the client configurations

ResourceManager HA does not affect the JobHistory Server (JHS). JHS does not maintain any state, so if its host fails you can simply assign the role to a new host. Here, state means cluster information as well as the checkpoint information saved by the Application Master for running jobs.

Review – YARN Resource Manager HA

Let us review some of the important properties, YARN Resource Manager HA components etc.

  • HA concepts related to YARN are similar to those of HDFS.
  • YARN Resource Manager HA is not very common.
  • As part of YARN architecture, Resource Manager takes care of Resource Management and Job Scheduling.
  • Job Execution Life Cycle is managed by a per-job Application Master.
  • Even a single Resource Manager is highly reliable. However, some very large clusters are configured with Resource Manager HA.

  • Automatic Failover
    • There is no separate failover controller daemon as in HDFS; the leader elector is embedded in the Resource Manager.
    • Leader election is done using Zookeeper.
  • When Failover occurs
    • In-flight work of running tasks is lost, and hence those tasks have to be restarted from scratch.
    • Standby will become active.
    • For all running jobs, Application Masters as well as running Tasks will be restarted.
    • If there is a job with 100 tasks, of which 90 are completed and 5 are running when Failover occurs, only those 5 will be restarted.
    • This is possible for Map Reduce as the Map Reduce Application Master does checkpointing.
  • Let us also review some important properties by connecting to the Gateway Node.
  • Click here for Additional Material about YARN Resource Manager Availability.
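To make the 100-task example above concrete, here is a toy calculation of which tasks are rerun after a failover. It assumes the Application Master has checkpointed the completed tasks, as described for Map Reduce; the task names and the state labels are invented for the illustration.

```python
from typing import Dict, List


def tasks_to_restart(task_states: Dict[str, str]) -> List[str]:
    """Return the tasks that were in flight when the failover happened."""
    return sorted(task for task, state in task_states.items() if state == "running")


# Hypothetical job: 100 tasks, 90 completed, 5 running, 5 not yet started.
job = {f"task_{i:03d}": "completed" for i in range(90)}
job.update({f"task_{i:03d}": "running" for i in range(90, 95)})
job.update({f"task_{i:03d}": "pending" for i in range(95, 100)})

restart = tasks_to_restart(job)
```

Only the 5 running tasks end up in `restart`; the 90 completed tasks keep their results, and the 5 pending tasks simply start for the first time.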

High Availability – Implications

Even though there is no significant difference in the syntax of commands or in how applications are run, there are some differences that both Administrators and Developers should be aware of.

  • On a cluster without High Availability, we connect to HDFS or YARN from the client using the URI of the Namenode or Resource Manager host.
  • The Namenode URI originally looks like this: hdfs://bigdataserver-2.c.cellular-axon-219405.internal:8020
  • Now, instead of passing the host name (or IP address) and port number, we need to pass the nameservice – hdfs://nameservice1
  • With respect to Resource Manager, no matter which server URI you use, you will be automatically redirected to the Active Resource Manager.
  • If High Availability is enabled after the cluster has been running in production for some time, every place where the Namenode URI is hardcoded has to be changed or refactored.
  • Also, refer to the official documentation and test thoroughly before enabling High Availability on a live production cluster.
  • Namenode Web UI or Resource Manager Web UI can be accessed using the host of the Active Namenode or Active Resource Manager.
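When hardcoded Namenode URIs have to be refactored, the change itself is mechanical: swap the host and port for the nameservice and keep the path. A minimal sketch follows; the function name and the example path /user/data are assumptions, and real code bases may hold such URIs in job configurations, scripts or table definitions that need their own handling.

```python
from typing import Set
from urllib.parse import urlsplit, urlunsplit


def to_nameservice_uri(uri: str, namenode_hosts: Set[str],
                       nameservice: str = "nameservice1") -> str:
    """Rewrite hdfs://<namenode-host>:<port>/path to hdfs://<nameservice>/path.

    URIs that are not HDFS URIs, or that point at an unknown host, are
    returned unchanged.
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs" or parts.hostname not in namenode_hosts:
        return uri
    return urlunsplit((parts.scheme, nameservice, parts.path, parts.query, parts.fragment))


old_hosts = {"bigdataserver-2.c.cellular-axon-219405.internal"}
new_uri = to_nameservice_uri(
    "hdfs://bigdataserver-2.c.cellular-axon-219405.internal:8020/user/data",
    old_hosts,
)
```

Once every client uses the nameservice form, a failover from one Namenode to the other requires no change on the client side.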

By this time you should have set up Cloudera Manager, installed Cloudera Distribution of Hadoop, and configured services such as Zookeeper, HDFS, and YARN, along with High Availability for HDFS Namenode as well as YARN Resource Manager.

Make sure to stop services in Cloudera Manager and also shut down servers provisioned from GCP or AWS to leverage credits or control costs.
