Set Up CM, Install CDH and Set Up Cloudera Management Services


In this section, we will set up Cloudera Manager on one of the nodes in the cluster and, while installing CDH, install Cloudera Manager Agents on all nodes of the cluster. We will also see how to configure the Cloudera Management Service on the first node.

  • Set Up Prerequisites
  • Install Cloudera Manager
  • Licensing and Installation Options
  • Install CM and CDH on all nodes
  • CM Agents and CM Server
  • Set Up Cloudera Management Service
  • Cloudera Management Service – Components

Let us start the first server, where we will set up the prerequisites and Cloudera Manager. The other servers do not need to be running at this time.

Set Up Prerequisites

Let us set up prerequisites such as the JDK and a MySQL database so that Cloudera Manager and other CDH components can be configured to use an external database.

  • Install the Java SDK that comes with Cloudera – sudo yum install oracle-j2sdk1.7 -y
  • Install MySQL (MariaDB)
  • Set up the MySQL connector – sudo yum -y install mysql-connector-java
  • mysql-connector-java also has to be installed on the other nodes where services that connect to these databases will run. We will take care of it while setting up services such as Hive and Oozie.
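The prerequisite steps above can be collected into one small script. This is a minimal sketch, assuming a CentOS/RHEL 7 host where the Cloudera Manager yum repository is already configured (that repository is what provides oracle-j2sdk1.7) and where MariaDB comes from the mariadb-server package. The run helper and the DRY_RUN variable are illustrative additions, not part of any Cloudera tooling: by default the script only prints the commands, and DRY_RUN=0 executes them.

```shell
#!/usr/bin/env bash
# Minimal sketch: install the prerequisites on the Cloudera Manager host.
# Assumes CentOS/RHEL 7 with the Cloudera Manager yum repository configured
# (it provides oracle-j2sdk1.7). Commands are printed by default; set
# DRY_RUN=0 to actually execute them.
set -euo pipefail

# Print a command (dry run) or execute it.
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_prereqs() {
  # JDK shipped in the Cloudera repository
  run sudo yum -y install oracle-j2sdk1.7
  # MariaDB, the MySQL-compatible server packaged for CentOS/RHEL 7
  run sudo yum -y install mariadb-server
  run sudo systemctl enable mariadb
  run sudo systemctl start mariadb
  # Interactive: sets the database root password and can remove anonymous users
  run sudo mysql_secure_installation
  # JDBC driver Cloudera Manager uses to connect to MySQL/MariaDB
  run sudo yum -y install mysql-connector-java
}

install_prereqs
```

mysql_secure_installation is interactive, so even with DRY_RUN=0 the script pauses there; that is where the MariaDB root password used for the scm database would be chosen.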

Install Cloudera Manager

Cloudera Manager is the management tool for setting up and managing clusters. It provides a wizard to set up the cluster as well as the ability to configure alerts in case of any issues.

  • Install Cloudera Manager – sudo yum -y install cloudera-manager-server
  • Stop Cloudera Manager – sudo systemctl stop cloudera-scm-server
  • Set up the scm database – sudo /usr/share/cmf/schema/ mysql -h localhost scm root itversity
  • Start Cloudera Manager – sudo systemctl start cloudera-scm-server
  • Log file location – /var/log/cloudera-scm-server
  • You can follow the log using sudo tail -F /var/log/cloudera-scm-server/cloudera-scm-server.log
  • Make sure to run sudo systemctl enable cloudera-scm-server so that cloudera-scm-server starts automatically after a reboot.
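Once the scm database has been prepared, the start and enable steps can be combined with a wait for the web UI, which takes a while to come up after the service starts. A minimal sketch, assuming systemd and curl are available; the wait_for_http helper, the retry count and the delay are hypothetical choices. Any HTTP response on port 7180 (including a redirect to the login page) is treated as "up", because curl is called without --fail.

```shell
#!/usr/bin/env bash
# Sketch: enable and start Cloudera Manager Server, then wait for port 7180.
# wait_for_http, the 60 tries and the 5-second delay are illustrative values.
# Nothing is executed unless DRY_RUN=0 is set.
set -uo pipefail

# Wait until an HTTP endpoint answers, trying up to $2 times, $3 seconds apart.
# Returns 0 as soon as any HTTP response arrives, 1 if all tries fail.
wait_for_http() {
  local url="$1" tries="${2:-60}" delay="${3:-5}" i
  for ((i = 1; i <= tries; i++)); do
    if curl -s -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

if [[ "${DRY_RUN:-1}" == "1" ]]; then
  echo "Dry run: would enable and start cloudera-scm-server, then wait for http://localhost:7180"
else
  sudo systemctl enable cloudera-scm-server
  sudo systemctl start cloudera-scm-server
  if wait_for_http "http://localhost:7180" 60 5; then
    echo "Cloudera Manager is answering on port 7180"
  else
    echo "Not up yet; check /var/log/cloudera-scm-server/cloudera-scm-server.log" >&2
  fi
fi
```

Polling the port rather than grepping the log avoids depending on the exact wording of log messages, which can differ between Cloudera Manager versions.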

Licensing and Installation Options

Let us understand the different licensing and installation options.

  • Licensing options – Express vs. Enterprise license
  • We are not going to cover Enterprise licensing in detail. It is typically arranged between your organization's management and the Cloudera sales team.
  • Installation using Parcels
    • Recommended by Cloudera
    • We can create local repositories for parcels by following these instructions.
    • Parcels enable Cloudera Manager to easily manage the software on your cluster, automating the deployment and upgrade of service binaries.
    • We will do an end-to-end demo using parcels later.
  • Installation using Packages
  • We will be installing using packages for now.

Install CM and CDH on all nodes

Now let us install CM and CDH on all nodes using packages. This installs the Cloudera Manager Agents along with CDH components such as Hadoop and Spark on every node in the cluster. Once the installation is done, we will configure one service at a time as we go into the details of each service that comes as part of CDH.

Let us start servers 2 to 7 so that CDH is set up on all nodes except the 8th. We will use the 8th node later for tasks such as adding nodes to an existing cluster.

  • Log in to Cloudera Manager – :7180
  • Choose the version of Cloudera
  • Add hosts to the cluster environment – enter the hostnames of all servers, e.g. bigdataserver-[1-7]. These hostnames point to internal IPs.
  • Select a repository – since we set up a local repository, you can give the same URL.
  • Install Java – check the option to install the JDK
  • Single User Mode – do not enable Single User Mode
  • SSH credentials for Cloudera Manager to log in to the other hosts – give the username and the Google Cloud private key
  • Cluster installation – the cluster install will take a few minutes
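Before entering the hostnames in the wizard, it can help to confirm that each one resolves from the Cloudera Manager host, since Cloudera Manager has to reach every host by name over SSH. A sketch, assuming the bigdataserver-1 to bigdataserver-7 names from the example above and a glibc host that provides getent; check_hosts is a hypothetical helper, not part of Cloudera Manager.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight name-resolution check before the Add Hosts step.
# check_hosts is a hypothetical helper; the host names follow the example
# pattern bigdataserver-[1-7] and should be adjusted to your cluster.
set -uo pipefail

# Print each host's first resolved address; return 1 if any host fails.
check_hosts() {
  local host addr failed=0
  for host in "$@"; do
    if addr="$(getent hosts "$host" | awk 'NR == 1 {print $1}')" && [[ -n "$addr" ]]; then
      echo "OK      $host -> $addr"
    else
      echo "MISSING $host"
      failed=1
    fi
  done
  return "$failed"
}

# Bash brace expansion gives the same list as the wizard pattern bigdataserver-[1-7].
check_hosts bigdataserver-{1..7} || echo "Fix name resolution (DNS or /etc/hosts) before continuing." >&2
```

Any MISSING line points to a host the wizard would also fail to reach, so it is cheaper to fix DNS or /etc/hosts first than to retry the installation.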

CM Agents and CM Server

We started with Cloudera Manager Server and then installed CDH and the CM components using the Cloudera Manager web interface. You may have noticed that the Cloudera Manager Agents were installed as part of installing the Cloudera Manager components.

  • Agents run on all nodes in the cluster, including the one running Cloudera Manager Server.
  • Agents periodically send details to Cloudera Manager Server.
  • Cloudera Manager Server stores the information sent by agents in the MySQL database.
  • You can see the information sent by agents as dashboards in Cloudera Manager. This requires setting up the Cloudera Management Service.
  • If Cloudera Manager Server is not receiving information from a Cloudera Manager Agent, check whether that agent is running.
    • Log in to the node whose agent is not sending information and run – sudo systemctl status cloudera-scm-agent
    • You can try starting or restarting the agent using systemctl.
    • You can also check the agent logs under /var/log/cloudera-scm-agent
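The agent checks above can be run together on the affected node. A sketch, assuming the Cloudera Manager 5 defaults: the cloudera-scm-agent service, its log file /var/log/cloudera-scm-agent/cloudera-scm-agent.log, and the server_host setting in /etc/cloudera-scm-agent/config.ini, which tells the agent which CM Server to report to. The run helper and DRY_RUN variable are illustrative; commands are printed by default and executed only with DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: diagnose an agent that is not reporting to Cloudera Manager Server.
# Service name and paths are the CM 5 defaults; verify them on your version.
# Commands are printed by default; set DRY_RUN=0 to execute them.
set -uo pipefail

# Print a command (dry run) or execute it.
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

diagnose_agent() {
  # Is the agent process running?
  run sudo systemctl status cloudera-scm-agent --no-pager
  # Which CM Server host is the agent configured to report to?
  run grep '^server_host' /etc/cloudera-scm-agent/config.ini
  # Most recent agent log entries
  run sudo tail -n 50 /var/log/cloudera-scm-agent/cloudera-scm-agent.log
  # Restart the agent (useful when the status above shows it stopped or stuck)
  run sudo systemctl restart cloudera-scm-agent
}

diagnose_agent
```

If server_host does not name the host running cloudera-scm-server, the agent will keep running but its reports never arrive, which matches the symptom described above.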

Set Up Cloudera Management Service

Now let us set up the Cloudera Management Service. We will ignore the components that are not selected automatically and assign all the components to bigdataserver-1.

  • Go to Cloudera Manager and add the Cloudera Management Service
  • Make sure all the services are assigned to bigdataserver-1
  • Configure the database for Reports Manager
  • The Cloudera Management Service is then installed.
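Reports Manager needs its own database and user in MySQL before the wizard can connect to it. A minimal sketch that only prints the SQL, assuming the name rman for both the database and the user (a commonly used choice; any name works); print_db_sql and the change_me password are hypothetical placeholders. The statements can be reviewed and then piped into the MySQL client, e.g. bash rman_db.sh | mysql -u root -p, where rman_db.sh is whatever file the script was saved as.

```shell
#!/usr/bin/env bash
# Sketch: print SQL that creates a database and a dedicated user for a
# Cloudera Management Service role. print_db_sql, the rman names and the
# change_me password are placeholders; values must not contain quotes.
set -uo pipefail

# Usage: print_db_sql DATABASE USER PASSWORD
print_db_sql() {
  local db="$1" user="$2" pass="$3"
  cat <<SQL
CREATE DATABASE IF NOT EXISTS ${db} DEFAULT CHARACTER SET utf8;
CREATE USER '${user}'@'%' IDENTIFIED BY '${pass}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'%';
FLUSH PRIVILEGES;
SQL
}

print_db_sql rman rman 'change_me'
```

CREATE USER fails if the user already exists, so the output is meant to be applied once; the database name, user and password then go into the Reports Manager database page of the wizard.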

Cloudera Management Service – Components

The following services are part of the Cloudera Management Service:

  • Service Monitor – monitors the services installed in the cluster, such as Hadoop and Hive
  • Activity Monitor – monitors the jobs run by cluster services such as Pig, Hive, Oozie and MapReduce
    • Activity Monitor is not included by default when setting up the Management Service.
    • Here is how to add it:
      • Click the drop-down for Cloudera Management Service
      • Click Add Role Instances
      • Because Activity Monitor requires a database, create a database on our MySQL server, create a user and grant that user permissions on the database.
      • Use the newly created database to complete the setup of Activity Monitor.
  • Host Monitor – monitors the hosts themselves, e.g. how much CPU and disk space is being used
  • Reports Manager – produces the reports shown in dashboards by collecting metrics from agents and consolidating them according to the reporting requirements
  • Event Server – collects the events generated by cluster services running in parallel across the cluster. Some of these events are exceptions or errors that need attention so that the necessary action can be taken.
  • Alert Publisher – sends alerts using SMTP
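The Activity Monitor database step (create a database, create a user, grant permissions) can be generated as SQL. This sketch keeps the password out of the script by reading it from an environment variable; AMON_DB_PASSWORD, the amon database and user names, and the amon_db_sql function are all hypothetical choices. The SHOW GRANTS statement at the end prints the resulting permissions so they can be checked before entering the database details in the Add Role Instances wizard.

```shell
#!/usr/bin/env bash
# Sketch: SQL for the Activity Monitor database, with the password read from
# the (hypothetical) AMON_DB_PASSWORD environment variable.
# Usage idea: AMON_DB_PASSWORD=... bash amon_db.sh | mysql -u root -p
set -uo pipefail

# Usage: amon_db_sql [DATABASE] [USER]   (defaults: amon amon)
amon_db_sql() {
  local db="${1:-amon}" user="${2:-amon}" pass="${AMON_DB_PASSWORD:-}"
  if [[ -z "$pass" ]]; then
    echo "AMON_DB_PASSWORD is not set" >&2
    return 1
  fi
  # Quotes or backslashes would break the SQL string literal below.
  if [[ "$pass" == *"'"* || "$pass" == *\\* ]]; then
    echo "password must not contain single quotes or backslashes" >&2
    return 1
  fi
  cat <<SQL
CREATE DATABASE IF NOT EXISTS ${db} DEFAULT CHARACTER SET utf8;
CREATE USER '${user}'@'%' IDENTIFIED BY '${pass}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'%';
FLUSH PRIVILEGES;
SHOW GRANTS FOR '${user}'@'%';
SQL
}

amon_db_sql || echo "No SQL generated." >&2
```

Reading the password from the environment means the script itself can be kept or shared without exposing credentials; the mysql client still prompts for the root password on the terminal even when its input is piped.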

By now you should have set up MySQL, created the necessary databases for the different services, installed Cloudera Manager and set up CDH on all 7 nodes. It is time for us to configure the services and understand the concepts behind each of them.

Make sure to stop all the servers if you are not going to continue at this time.