Install the Clusters


Demonstrate an understanding of the installation process for Cloudera Manager, CDH, and the ecosystem projects.

  • Set up a local CDH Repository
  • Perform OS-level configuration for Hadoop installation
  • Install Cloudera Manager Server and Agents
  • Install CDH using Cloudera Manager
  • Add a New Node to an Existing Cluster
  • Add a Service using Cloudera Manager

Set up a local CDH Repository

Here are the high-level details to set up a local CDH Repository.

  • We can install CDH directly from public repositories. However, when using Packages, it is not a good idea to have every host pull from the public repositories directly.
    • Network intensive: every host downloads the same packages over the internet.
    • Behind the firewall: cluster hosts typically do not have direct internet access.
  • We will allow only one host to connect to the public repositories.
    • Parcels – the server on which Cloudera Manager is running, or a proxy server
    • Packages – a local yum repository server
  • We typically do not set up a local CDH repository for Parcels.
  • High-level instructions
    • Identify the server on which the yum repository should be set up.
    • Set up a web server such as Apache Web Server (HTTPD).
    • Download repo files from Cloudera under the DocumentRoot of Apache Web Server.
    • Create repositories for CM and CDH.
    • Update the repo files to point to the local yum repository server.
    • Validate by running the yum repolist and yum list all commands.
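
The high-level instructions above can be sketched as a shell script. This is a minimal sketch under stated assumptions, not the exact procedure used for this cluster: the host name repo.example.internal, the repo ids cloudera-manager and cloudera-cdh5, and the DocumentRoot /var/www/html are illustrative and should be checked against the repo files you downloaded. Setting gpgcheck=0 is a simplification for an internal mirror.

```shell
# Assumed host name of the local repository server (hypothetical).
REPO_HOST="${REPO_HOST:-repo.example.internal}"

# Print a yum repo stanza that points at the local web server
# instead of the public Cloudera archive.
render_local_repo() {
  local repo_id="$1" repo_name="$2"
  cat <<EOF
[${repo_id}]
name=${repo_name}
baseurl=http://${REPO_HOST}/${repo_id}/
gpgcheck=0
enabled=1
EOF
}

# Run as root on the repository server only. Defined here, not invoked.
# Assumes the public Cloudera repo files are already in /etc/yum.repos.d
# on this one host, with the repo ids used below.
build_repo_server() {
  yum install -y httpd yum-utils createrepo
  service httpd start
  local repo_id
  for repo_id in cloudera-manager cloudera-cdh5; do
    # Mirror the packages under the Apache DocumentRoot (assumed path).
    reposync --repoid="${repo_id}" -p /var/www/html
    # Generate the repodata metadata that yum clients read.
    createrepo "/var/www/html/${repo_id}"
  done
}
```

On each cluster host, the output of render_local_repo cloudera-manager "Cloudera Manager" would be written to a file under /etc/yum.repos.d, after which yum repolist and yum list all should show the local repository.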

Perform OS-level Configuration

Let us see the OS-level Configuration for Hadoop Installation.

  • Refer to the Cloudera documentation for more details about OS-level configuration.
  • We can also run the Host Inspector, review any warnings, and take the necessary actions.
  • We will disable swappiness and transparent huge page compaction.
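
Disabling transparent huge page compaction can be sketched as follows. This is a minimal sketch: the sysfs directory is the usual one on RHEL/CentOS 7, and is an assumption for any other distribution. THP_DIR is a hypothetical variable introduced here so that the path can be overridden.

```shell
# Sysfs directory for transparent huge pages. This is the usual path on
# RHEL/CentOS 7; some RHEL 6 kernels use redhat_transparent_hugepage instead.
THP_DIR="${THP_DIR:-/sys/kernel/mm/transparent_hugepage}"

# Turn off transparent huge page compaction (defrag) on the current host.
# Needs root on a real system; the setting is lost on reboot.
disable_thp_defrag() {
  echo never > "${THP_DIR}/defrag"
}

# Show the current setting. On a real host the active value is in [brackets].
show_thp_defrag() {
  cat "${THP_DIR}/defrag"
}
```

Because the setting does not survive a reboot, it is commonly reapplied from a boot-time script such as /etc/rc.local; treat that as one option rather than the method used here.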

Disable Swappiness

Let us see how we can disable swappiness.

  • Run sysctl vm.swappiness=0 on all the hosts. This changes the running kernel only.
  • Add vm.swappiness=0 to /etc/sysctl.conf and copy the file to /home/itversity/setup_cluster/files/etc
  • Synchronize the file to all the servers using Ansible. This makes sure that swappiness stays disabled after reboots.
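
Making the setting persistent can be sketched with a small helper that edits a sysctl-style file. The function name set_sysctl_conf is hypothetical; it replaces an existing entry or appends a new one, so it is safe to run more than once.

```shell
# Set "key = value" in a sysctl-style file, replacing an existing entry
# or appending a new one. Safe to run repeatedly.
set_sysctl_conf() {
  local conf="$1" key="$2" value="$3" key_re
  # Escape dots so that vm.swappiness is matched literally.
  key_re="$(printf '%s' "$key" | sed 's/\./\\./g')"
  if grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=" "$conf" 2>/dev/null; then
    # Rewrite through a temporary file, then copy back to keep the inode.
    sed -E "s|^[[:space:]]*${key_re}[[:space:]]*=.*|${key} = ${value}|" "$conf" > "${conf}.tmp" &&
      cat "${conf}.tmp" > "$conf" &&
      rm -f "${conf}.tmp"
  else
    printf '%s = %s\n' "$key" "$value" >> "$conf"
  fi
}

# On a real host, as root (shown as comments, not executed here):
#   sysctl vm.swappiness=0                              # running kernel
#   set_sysctl_conf /etc/sysctl.conf vm.swappiness 0    # survive reboots
#   sysctl -p                                           # reload and confirm
```

With Ansible, the edited file could then be pushed with an ad-hoc copy, for example ansible all -b -m copy -a "src=/home/itversity/setup_cluster/files/etc/sysctl.conf dest=/etc/sysctl.conf". The host pattern all is an assumption about the inventory.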

Install Cloudera Manager Server and Agents

Let us discuss details about installing Cloudera Manager Server and Agents.

  • Cloudera Manager Server will be running on one of the hosts in the cluster, while Agents need to run on all the hosts in the cluster.
  • Cloudera Manager Server and Agents together make it easy to set up CDH on the cluster and to manage it.
  • We started with Packages and migrated to Parcels.
  • If we want to set up using Parcels, then typically we don’t need local yum repositories for CM or CDH. We can download the repo file and then install using Cloudera Manager.
  • Here are the steps involved in setting up Cloudera Manager Server.
    • Install and set up the database. We installed MySQL and then created a database called scm.
    • Install Cloudera Manager Server and configure it to use the scm database.
  • Once Cloudera Manager Server was set up, we installed the Agents on all the hosts using the wizard while setting up CDH.
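
The server-side steps can be sketched as below. This is a sketch under assumptions, not the exact commands used here: the database user scm, the SCM_DB_PASSWORD variable, and the CM 5 location of scm_prepare_database.sh are illustrative, and the GRANT syntax assumes MySQL 5.x. A JDK and the MySQL JDBC driver are assumed to be installed already.

```shell
# Print the SQL that creates the Cloudera Manager database and its user.
# Uses the MySQL 5.x GRANT ... IDENTIFIED BY syntax (removed in MySQL 8).
scm_database_sql() {
  local db="$1" user="$2" password="$3"
  cat <<EOF
CREATE DATABASE ${db} DEFAULT CHARACTER SET utf8;
GRANT ALL ON ${db}.* TO '${user}'@'%' IDENTIFIED BY '${password}';
EOF
}

# Run as root on the Cloudera Manager host. Defined here, not invoked.
# SCM_DB_PASSWORD is a hypothetical variable you would export beforehand.
install_cm_server() {
  yum install -y cloudera-manager-daemons cloudera-manager-server
  scm_database_sql scm scm "$SCM_DB_PASSWORD" | mysql -u root -p
  # Script location assumed for CM 5; other releases may ship it elsewhere.
  /usr/share/cmf/schema/scm_prepare_database.sh mysql scm scm "$SCM_DB_PASSWORD"
  service cloudera-scm-server start
}
```

Keeping the SQL in its own function makes it easy to review the statements before piping them to mysql.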

Install CDH using Cloudera Manager

Let us talk about how we can set up CDH on the cluster.

  • We installed CDH using Packages and then migrated to Parcels.
  • If we want to set up the cluster with Parcels directly, the steps are not significantly different, except for one page where the wizard prompts us to choose between Packages and Parcels.
  • Once the cluster is set up, there is not much of a difference between Packages and Parcels for Hadoop ecosystem tools such as Hive, Oozie etc.
  • However, if we want to set up non-Hadoop tools such as Spark, Kafka etc., then the steps might be a bit different. Also, we can set up Spark 2 only using Parcels at this time.
  • Parcels is the approach recommended by Cloudera. It simplifies the management of repositories.

Add a New Node to an Existing Cluster

Let us see how to add a new node to an existing cluster.

  • Provision the server and make sure passwordless login is set up. If you are using Ansible, make sure the server is added to the hosts file in the appropriate host groups.
  • Make sure passwordless login is enabled between the server on which Cloudera Manager is running and the new host.
  • Go to Hosts -> All Hosts and then click on Add New Hosts to Cluster. It will take us to the Add Hosts Wizard.
  • Add the IP addresses or DNS aliases of all the new hosts, then proceed to set up the Cloudera Manager Agent as well as CDH on those hosts.
  • Determine whether this node will be a Gateway, Master, or Worker for a given service. The steps vary based on the choice.
    • Gateway: Configure as Gateway for multiple services.
    • Master: Understand the architecture of the service and move the master component onto that host.
    • Worker: Add all the worker components (Datanode, Node Manager etc.) to the host. Resource Manager is a master component and does not belong here. We also need to make sure data is balanced in the cluster by using the balancer. We will see that at a later point in time.
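
Before starting the Add Hosts Wizard, the passwordless login requirement can be checked from the Cloudera Manager host with a sketch like the one below. The function names and the SSH_BIN variable are hypothetical; SSH_BIN exists only so that the check can be tried with a stub instead of a real ssh client.

```shell
# SSH client to use; overridable so the check can be dry-run with a stub.
SSH_BIN="${SSH_BIN:-ssh}"

# Succeeds only if we can log in to the target without a password prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
can_login_without_password() {
  "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Report every target in the argument list that still needs a key installed.
report_missing_keys() {
  local target
  for target in "$@"; do
    if ! can_login_without_password "$target"; then
      echo "passwordless login not working: ${target}"
    fi
  done
}
```

For example, report_missing_keys root@newnode01.example.internal (a made-up host name) prints nothing when the key is already in place. Any host it reports can usually be fixed with ssh-copy-id.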

Add Host as Worker

Let us see how we can add the new node as a worker to the cluster.

  • Worker Components:
    • HDFS Datanode
    • YARN Node Manager
    • HBase Region Server
    • Impala Impalad
  • To add a node as HDFS Datanode, we need to have additional storage devices formatted and mounted.
  • Once the storage is formatted and mounted, we can either use a Host Template or add each worker component manually.
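
Preparing a data disk for the Datanode can be sketched as follows. The device /dev/xvdf, the mount point /data1, the ext4 filesystem, and the noatime option are assumptions for illustration; use the values that match your hardware and standards.

```shell
# Print an /etc/fstab line for a Datanode data disk.
# noatime avoids a metadata write on every read.
fstab_entry() {
  local device="$1" mount_point="$2"
  printf '%s %s ext4 defaults,noatime 0 0\n' "$device" "$mount_point"
}

# Run as root on the new worker. Defined here, not invoked.
# WARNING: mkfs destroys any data already on the device.
prepare_data_disk() {
  local device="$1" mount_point="$2"
  mkfs.ext4 "$device"
  mkdir -p "$mount_point"
  fstab_entry "$device" "$mount_point" >> /etc/fstab
  # Mount using the entry just added, which also validates the fstab line.
  mount "$mount_point"
}
```

A call such as prepare_data_disk /dev/xvdf /data1 would be repeated for each additional device before the Datanode role is assigned to the host.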

Add a Service using Cloudera Manager

We have added several services to our cluster using Cloudera Manager.

  • Following are the services added as part of the demonstrations.
    • Zookeeper
    • Hadoop (HDFS and YARN with MR2)
    • Spark and Spark 2
    • Hive and Impala
    • Sqoop, Pig, Oozie and Hue
    • HBase
    • Kafka
  • Here are the general guidelines to add any supported service.
    • Understand Architecture – Identify all applicable Master Components as well as Worker Components.
    • Add the service based on Architecture
    • Understand Properties Files
    • Validate or Troubleshoot by going through service level logs