Configure the Clusters


Perform basic and advanced configuration needed to effectively administer a Hadoop cluster

  • Configure a Service using Cloudera Manager
  • Create an HDFS user’s home directory
  • Configure NameNode HA
  • Configure ResourceManager HA
  • Configure proxy for HiveServer2/Impala

Configure a Service using Cloudera Manager

Let us look at the details of configuring a service using Cloudera Manager.

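  • We can add any supported service using Cloudera Manager.
  • Every service starts out with a default run-time behavior, defined by its default configuration.
  • To change the run-time behavior of a service, identify the appropriate configuration property, update its value, and restart the affected roles when Cloudera Manager flags the configuration as stale.
  • If there is no dedicated property for a setting, use the service's Advanced Configuration Snippet (Safety Valve) to add it directly to the relevant configuration file (for example, hdfs-site.xml).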
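The same property updates can also be scripted through the Cloudera Manager REST API instead of the UI. The sketch below is illustrative only: it assumes Cloudera Manager listens on bigdataserver-1 at its default port 7180, that API version v19 is supported, and that the cluster is named "cluster". The service name (hdfs) and property name (dfs_replication) in the usage comments are hypothetical. The API uses Cloudera Manager's internal property names, which can differ from the Hadoop property names.

```shell
#!/usr/bin/env bash
# Sketch: read and update a service's configuration through the Cloudera Manager REST API.
# Assumptions (adjust to your cluster): CM on bigdataserver-1:7180, API version v19,
# cluster named "cluster", and CM-internal property names passed in by the caller.

CM_URL="${CM_URL:-http://bigdataserver-1:7180}"
CM_API="${CM_API:-v19}"              # check what your CM supports: curl -u ... "$CM_URL/api/version"
CM_CLUSTER="${CM_CLUSTER:-cluster}"  # hypothetical cluster name (URL-encode spaces as %20)
CM_USER="${CM_USER:-admin}"

# JSON body expected by the config endpoint: {"items":[{"name":...,"value":...}]}
cm_config_json() {
  printf '{"items":[{"name":"%s","value":"%s"}]}' "$1" "$2"
}

# Show the full configuration of a service, defaults included, e.g. cm_get_service_config hdfs
cm_get_service_config() {
  curl -s -u "$CM_USER:${CM_PASSWORD:?set CM_PASSWORD first}" \
    "$CM_URL/api/$CM_API/clusters/$CM_CLUSTER/services/$1/config?view=full"
}

# Update one property on a service, e.g. cm_set_service_config hdfs dfs_replication 2
cm_set_service_config() {
  curl -s -u "$CM_USER:${CM_PASSWORD:?set CM_PASSWORD first}" \
    -X PUT -H 'Content-Type: application/json' \
    -d "$(cm_config_json "$2" "$3")" \
    "$CM_URL/api/$CM_API/clusters/$CM_CLUSTER/services/$1/config"
}
```

As with a change made in the UI, the new value only takes effect after the affected roles are restarted.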

Create an HDFS user’s home directory

This is covered extensively as part of HDFS Commands. Go to Install CM and CDH – Important HDFS Commands, and then to Creating Directories and Changing Ownership.

The subsequent topics there explain permissions and ownership of files and directories in more detail.
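In practice, creating a home directory takes two commands run as the HDFS superuser. The following is a minimal sketch. It assumes the superuser is hdfs (the CDH default) and that the user's primary group has the same name as the user; if that is not true, pass the group explicitly. The user name alice is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create /user/<name> in HDFS and hand ownership to that user.
# Run as the HDFS superuser (e.g. in a shell started with: sudo -u hdfs bash).
# Assumption: the group defaults to the user name unless a second argument is given.

create_hdfs_home() {
  local user="$1" group="${2:-$1}"
  # -p makes the call safe to repeat if the directory already exists
  hdfs dfs -mkdir -p "/user/$user" &&
    hdfs dfs -chown "$user:$group" "/user/$user"
}
```

For example, running `create_hdfs_home alice` as hdfs creates /user/alice, owned by alice:alice.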

Configure NameNode HA

This is covered extensively as part of NameNode HA. Go to Install CM and CDH – Configuring HDFS and YARN HA, and then work through all the topics related to HDFS NameNode HA.
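After NameNode HA is enabled, `hdfs haadmin -getServiceState` reports whether each NameNode is active or standby. The sketch below reads the NameNode IDs from the client configuration rather than assuming nn1/nn2, because Cloudera Manager may generate different IDs. It assumes a single nameservice with no federation, and it assumes the host has HDFS client configuration deployed. Depending on your security settings, you may need to run it as the hdfs user.

```shell
#!/usr/bin/env bash
# Sketch: print the HA state (active/standby) of every NameNode in the nameservice.
# Assumes one nameservice and an up-to-date HDFS client configuration on this host.

show_namenode_states() {
  local ns ids id
  ns="$(hdfs getconf -confKey dfs.nameservices)" || return 1
  # dfs.ha.namenodes.<nameservice> is a comma-separated list of NameNode IDs
  ids="$(hdfs getconf -confKey "dfs.ha.namenodes.$ns")" || return 1
  for id in $(echo "$ids" | tr ',' ' '); do
    printf '%s: %s\n' "$id" "$(hdfs haadmin -getServiceState "$id")"
  done
}
```

Exactly one NameNode should report active. If you fail over from Cloudera Manager and run the check again, the states should swap.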

Configure ResourceManager HA

This is covered extensively in the same section as NameNode HA. Go to Install CM and CDH – Configuring HDFS and YARN HA, and then work through all the topics related to YARN ResourceManager HA.
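You can check the ResourceManagers in a similar way with `yarn rmadmin -getServiceState`. Rather than parsing yarn-site.xml, this sketch takes the ResourceManager IDs as arguments. You can find them in the `yarn.resourcemanager.ha.rm-ids` property of the client configuration (for example, /etc/hadoop/conf/yarn-site.xml; that path is an assumption for a gateway host). The IDs rm1 and rm2 in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the HA state of each ResourceManager ID passed as an argument.
# IDs come from yarn.resourcemanager.ha.rm-ids; Cloudera Manager may not use rm1/rm2.

show_rm_states() {
  local id
  if [ "$#" -eq 0 ]; then
    echo "usage: show_rm_states RM_ID..." >&2
    return 2
  fi
  for id in "$@"; do
    printf '%s: %s\n' "$id" "$(yarn rmadmin -getServiceState "$id")"
  done
}
```

Usage: `show_rm_states rm1 rm2`. One ResourceManager should be active and the other standby.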

Configure proxy for HiveServer2/Impala

Let us see how we can configure a proxy for HiveServer2 as well as Impala. To configure the proxy, we will use a software load balancer called haproxy.

  • We can run multiple servers for clients to connect to.
  • Combining multiple servers with a proxy gives us two main advantages:
    • Load Balancing
    • Transparent Failover
  • When we have multiple servers for client connectivity, we should not hard-code a specific server's IP address in our applications. If that server goes down, all client traffic using that address is affected.
  • A proxy hides this complexity. Clients use the proxy's IP address, and the proxy decides which server each connection is forwarded to.
  • Install haproxy as root (or with sudo) – sudo yum -y install haproxy
  • We will set up High Availability with at least 2 servers behind the proxy.
  • Enable haproxy as part of the startup services – sudo chkconfig haproxy on
  • For each service, we need to add a section to the haproxy configuration file – /etc/haproxy/haproxy.cfg.
  • Restart haproxy once the services are added so that it picks up the new configuration.
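A typo in a new section can stop haproxy from starting, which takes down the proxies that were already working. You can avoid this by checking the file with `haproxy -c` (check mode) before restarting. The sketch below assumes a SysV-style init script, matching the chkconfig step above (CentOS/RHEL 6). On systemd hosts, use `systemctl restart haproxy` instead. Run it as root.

```shell
#!/usr/bin/env bash
# Sketch: validate the haproxy configuration and restart only if it is valid.
# Assumes a SysV init script ("service haproxy restart"); run as root.

HAPROXY_CFG="${HAPROXY_CFG:-/etc/haproxy/haproxy.cfg}"

reload_haproxy() {
  # -c only parses and checks the file; it does not start a proxy
  if haproxy -c -f "$HAPROXY_CFG"; then
    service haproxy restart
  else
    echo "haproxy: $HAPROXY_CFG failed validation; not restarting" >&2
    return 1
  fi
}
```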

Configure proxy for HiveServer2

Let us see how we can configure a proxy for HiveServer2.

  • We have seen how to launch the Hive CLI and run commands. However, tools like Tableau need to connect to Hive through HiveServer2 using a driver such as JDBC so that we can generate reports.
  • As part of the cluster we get a tool called beeline, which we can use to validate connectivity to HiveServer2 via JDBC – beeline -u jdbc:hive2://bigdataserver-4:10000
  • As of now, HiveServer2 is running only on bigdataserver-4. We can add a second HiveServer2 instance on bigdataserver-3 using Cloudera Manager.
  • Configuring HA
    • Servers – bigdataserver-3 and bigdataserver-4
    • Proxy Server – bigdataserver-1
    • Proxy Port (on bigdataserver-1) – 10001
    • HiveServer2 Port (on bigdataserver-3 and bigdataserver-4) – 10000
    • Update haproxy.cfg file

https://gist.githubusercontent.com/dgadiraju/df639040f8eaf11f4e89dd9ad5cd6607/raw/7530dccdea6c1439aaea99015bd40e330fe5e8b2/hive-haproxy.cfg
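The contents of the gist are not reproduced here. To give a rough idea of what such a section looks like, the sketch below prints a `listen` section for the two HiveServer2 instances, using the hosts and ports listed above. Several parts are assumptions, not taken from the gist: the section and server names, `balance source`, and the one-hour timeouts. `balance source` keeps repeated connections from one client on the same HiveServer2. The long timeouts are meant to stop haproxy from cutting idle JDBC sessions.

```shell
#!/usr/bin/env bash
# Sketch: print a HiveServer2 "listen" section for /etc/haproxy/haproxy.cfg.
# Hosts/ports follow the list above; names, balance algorithm and timeouts are assumptions.

render_hs2_listen() {
  cat <<'EOF'
listen hiveserver2
    bind *:10001
    mode tcp
    option tcplog
    balance source
    timeout client 1h
    timeout server 1h
    server hiveserver2_1 bigdataserver-3:10000 check
    server hiveserver2_2 bigdataserver-4:10000 check
EOF
}
```

After reviewing the output, you can append it with `render_hs2_listen | sudo tee -a /etc/haproxy/haproxy.cfg`.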

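  • Start haproxy with the updated configuration – sudo /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg (if haproxy is already running, restart it instead with sudo service haproxy restart)
  • Validate by connecting to HiveServer2 with beeline through the proxy on bigdataserver-1 – beeline -u jdbc:hive2://bigdataserver-1:10001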
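To check transparent failover, run the same query through the proxy, stop one HiveServer2 instance in Cloudera Manager, and run the query again. It should still succeed. The helper below is a sketch. It assumes beeline exits with a non-zero status when the connection or query fails, so verify that on your version.

```shell
#!/usr/bin/env bash
# Sketch: run a trivial query through the HiveServer2 proxy and report OK/FAILED.
# Defaults follow the setup above (proxy bigdataserver-1, port 10001).
# Assumption: beeline exits non-zero when the connection or query fails.

check_hs2_proxy() {
  local url="jdbc:hive2://${1:-bigdataserver-1}:${2:-10001}"
  if beeline -u "$url" --silent=true -e 'SHOW DATABASES;'; then
    echo "OK: $url"
  else
    echo "FAILED: $url" >&2
    return 1
  fi
}
```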

Configure proxy for Impala

Let us see how we can configure a proxy for Impala.

  • We have seen how to launch the Impala shell (impala-shell -i bigdataserver-5) and run commands. However, tools like Tableau need to connect to Impala using a driver such as JDBC so that we can generate reports.
  • As part of the cluster we get beeline, which we can use to validate connectivity to an Impala Daemon via JDBC (using the Hive2 JDBC driver) – beeline -u 'jdbc:hive2://bigdataserver-5:21050/default;auth=noSasl' --silent=true
  • As of now, Impala Daemons are running on all worker nodes, but we hard-code a single server (bigdataserver-5) when connecting to an Impala Daemon.
  • Configuring HA
    • Servers – All Servers on which impalad is running
    • Proxy Server – bigdataserver-1
    • Proxy Ports (on bigdataserver-1) – 21000 for impala-shell and 21050 for JDBC
    • Impala Daemon Ports (on each impalad host) – 21000 for impala-shell and 21050 for JDBC
    • Update haproxy.cfg file

https://gist.githubusercontent.com/dgadiraju/6e440d160d92e3fb526ea4a18de79839/raw/a7e237d1802ade36769c555ca7b4e190a8f3e4d2/impala-haproxy.cfg
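Impala runs on every worker node, so the `server` lines are easier to generate from a host list than to type by hand. The sketch below prints one `listen` section for impala-shell traffic and one for JDBC traffic. Some choices are assumptions, not taken from the gist: the section names, `leastconn` for impala-shell, `source` for JDBC, and the one-hour timeouts. Because the proxy ports equal the impalad ports, haproxy must run on a host that does not itself run impalad, or the ports would clash. The worker host names in the usage line are also assumptions.

```shell
#!/usr/bin/env bash
# Sketch: generate haproxy "listen" sections for Impala from a list of impalad hosts.
# Assumptions: section names, balance algorithms and 1h timeouts (not from the gist).
# Proxy ports equal impalad ports, so run haproxy on a host without impalad.

# Print one listen section: render_listen NAME PORT BALANCE HOST...
render_listen() {
  local name="$1" port="$2" balance="$3" host
  shift 3
  printf 'listen %s\n    bind *:%s\n    mode tcp\n    option tcplog\n    balance %s\n' \
    "$name" "$port" "$balance"
  printf '    timeout client 1h\n    timeout server 1h\n'
  for host in "$@"; do
    printf '    server %s %s:%s check\n' "$host" "$host" "$port"
  done
  printf '\n'
}

# impala-shell traffic (21000) and JDBC traffic (21050) for the given impalad hosts
render_impala_listen() {
  render_listen impala-shell 21000 leastconn "$@"
  render_listen impala-jdbc 21050 source "$@"
}
```

Usage: `render_impala_listen bigdataserver-2 bigdataserver-3 bigdataserver-4 bigdataserver-5 | sudo tee -a /etc/haproxy/haproxy.cfg`.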

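  • Start haproxy with the updated configuration – sudo /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg (if haproxy is already running, restart it instead with sudo service haproxy restart)
  • Validate by connecting with impala-shell through the proxy on bigdataserver-1 – impala-shell -i bigdataserver-1:21000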
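Both proxied ports can be checked in one go. The sketch below sends one query through port 21000 with impala-shell and one through port 21050 over JDBC with beeline, using the same noSasl URL form as the direct beeline connection earlier. It assumes both tools are on PATH and exit non-zero when a connection or query fails.

```shell
#!/usr/bin/env bash
# Sketch: validate both Impala ports through the haproxy host (default bigdataserver-1).
# Assumes impala-shell and beeline are on PATH and exit non-zero on failure.

check_impala_proxy() {
  local proxy="${1:-bigdataserver-1}"
  # impala-shell through proxy port 21000, then JDBC (Hive2 driver) through proxy port 21050
  impala-shell -i "$proxy:21000" -q 'SELECT version()' &&
    beeline -u "jdbc:hive2://$proxy:21050/default;auth=noSasl" --silent=true -e 'SELECT version();'
}
```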