Setup of local yum repository server – CDH


Do you want to know how to set up a local yum repository server, so that binaries can be downloaded from within the enterprise network rather than from the internet? Setting up that server is the first step.

Steps involved in setting up a local yum repository server

  • Overview of yum
  • Setup httpd service on one of the servers
  • Cloudera – Local yum repository
  • Copy repo files

Almost all vendors maintain repository servers and provide a .repo file that can be downloaded to /etc/yum.repos.d. Once that file is in place, the vendor's packages become available to yum, and yum install downloads them from the internet before installing them. That approach is not practical in enterprises because of security, network bandwidth constraints, etc. Hence we need to set up a local yum repository server.

Most Big Data clusters in production typically run on Red Hat Enterprise Linux, and the Cloudera certification exam is conducted on CentOS 7, which is a Red Hat flavor. Hence we cover setting up a local yum repository. If you work on Debian-based distributions such as Ubuntu, or on SUSE Linux, please refer to the official documentation for detailed instructions.

As Google Cloud offers $300 of credit, we demonstrate on Google Cloud. If you perform these tasks on AWS EC2 instances, make the corresponding AWS-specific changes.

Overview of yum

On Red Hat flavors of Linux such as Red Hat, CentOS and Fedora, software can be installed using yum.

  • There are standard repositories served by Red Hat and by the CentOS and Fedora communities.
  • Files with the .repo extension under /etc/yum.repos.d are the configuration files that define the repositories from which software can be installed.
  • The yum install command takes care of the following tasks on the server:
    • Download software
    • Install software
    • Sometimes, start the daemon process associated with the software
  • Some important commands
    • yum repolist
    • yum list all
    • yum install
    • yum update
    • yum remove
  • We need to update the configuration files of the underlying software before starting its daemon process. Configuration files are typically available under /etc.
  • Larger enterprises might have hundreds or even thousands of servers. If every server uses the standard repositories over the internet, there can be issues such as:
    • Security
    • Slowness, as downloads go over the public internet
  • To overcome these issues, enterprises typically have local repository servers from which yum requests such as install and update can be served.
  • Steps to set up a local yum repository server within an organization:
    • Set up a web server such as Apache httpd or nginx
    • Download the vendor's repo configuration file
    • Create the local repository
    • Generate repo configuration files pointing to the local yum repository server, then copy them to the other servers within the organization.
    • We will see an example later.
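The last step above, generating repo configuration files that point to the local server, can be sketched in shell as follows. This is a minimal sketch under stated assumptions. The function name write_local_repo_file, the repo id and the example URL are hypothetical placeholders, not Cloudera's official layout. The function writes into a directory you pass in, so you can generate the files in a working directory first and copy them to /etc/yum.repos.d afterwards.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a yum .repo file whose baseurl points at a local
# HTTP server instead of the vendor's public repository.
# Usage: write_local_repo_file <target_dir> <repo_id> <display_name> <baseurl>
write_local_repo_file() {
  local target_dir="$1" repo_id="$2" name="$3" baseurl="$4"
  mkdir -p "$target_dir"
  # gpgcheck=0 only keeps the sketch short; in a real setup enable it and
  # add a gpgkey= line pointing at the vendor's signing key.
  cat > "$target_dir/$repo_id.repo" <<EOF
[$repo_id]
name=$name
baseurl=$baseurl
enabled=1
gpgcheck=0
EOF
}
```

For example, `write_local_repo_file files/etc/yum.repos.d cloudera-manager "Cloudera Manager (local)" http://bigdataserver-1/cloudera-manager/` would produce files/etc/yum.repos.d/cloudera-manager.repo. The hostname and path in that call are assumptions. The baseurl must match wherever your web server actually serves the mirrored packages.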

Setup httpd service

Let us see the steps involved in setting up the httpd service on the first node as user itversity (on AWS, the user is centos).

  • We set up httpd to serve the local yum repository, so that each node does not have to connect to the Cloudera repositories and download packages itself.
  • With a local yum repository server, setup is faster because it does not go over the public internet.
  • Connect to the first server: ssh -i ~/.ssh/google_compute_engine itversity@
  • Install httpd: sudo yum -y install httpd
  • Enable it on start-up: sudo systemctl enable httpd
  • Start it: sudo systemctl start httpd
  • Open port 80:
    • Click on more options for the first server
    • Click on View network details; it takes you to VPC network
    • Click on Firewall rules
    • Click on Create a firewall rule
    • Name: webports
    • Change Target to All instances in the network
    • Set Source IP range to
    • Specified protocols and ports: tcp [80, 7180]
  • Go to the browser on the host and enter to see that the HTTP server is up
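The httpd steps and the console firewall rule above can also be scripted, as in the minimal sketch below. The helper names run, setup_httpd and create_webports_rule are hypothetical. The network name default is an assumption. SOURCE_RANGES is a placeholder you must set, because the tutorial does not state the source range. The gcloud command omits target tags, so the rule applies to all instances in the network, which matches "All instances in the network" in the console. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print commands instead of executing them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Install Apache httpd, enable it at boot and start it now.
setup_httpd() {
  run sudo yum -y install httpd
  run sudo systemctl enable httpd
  run sudo systemctl start httpd
}

# CLI equivalent of the "webports" firewall rule created in the console.
# No --target-tags is given, so the rule targets all instances in the network.
# SOURCE_RANGES is a placeholder you must set; "default" network is assumed.
create_webports_rule() {
  run gcloud compute firewall-rules create webports \
    --network=default \
    --allow=tcp:80,tcp:7180 \
    --source-ranges="${SOURCE_RANGES:?set SOURCE_RANGES, e.g. your own IP/32}"
}
```

Run setup_httpd on the first server. Run create_webports_rule wherever the gcloud CLI is installed and authenticated, for example with `SOURCE_RANGES=<your-ip>/32 create_webports_rule`.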

If you are using AWS, make sure HTTP (port 80) is open in the security group assigned to the host.
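On AWS, the security group rule can be added from the AWS CLI. This is a sketch under assumptions. SG_ID and ALLOWED_CIDR are placeholders you must fill in with your security group id and the network allowed to reach the repository server, and the function name is hypothetical. As before, DRY_RUN=1 prints the command instead of running it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print commands instead of executing them when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Allow inbound HTTP (tcp/80) on the security group attached to the
# repository server. SG_ID and ALLOWED_CIDR are placeholders you must set.
open_http_on_aws() {
  run aws ec2 authorize-security-group-ingress \
    --group-id "${SG_ID:?set SG_ID}" \
    --protocol tcp \
    --port 80 \
    --cidr "${ALLOWED_CIDR:?set ALLOWED_CIDR}"
}
```

Restricting ALLOWED_CIDR to the VPC's own range keeps the repository reachable only from inside the network. This matches the tutorial's goal of serving packages within the enterprise.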

Cloudera – Local yum repository

Let us see the steps involved in setting up a local yum repository server for Cloudera's Distribution (CDH). Make the necessary changes to run in other environments such as AWS.

Cloudera Manager

Let us create a local repository for Cloudera Manager. It contains packages for the Cloudera Manager server, agent, j2sdk, etc.
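One common way to build such a mirror is reposync (from yum-utils) plus createrepo. The sketch below follows that approach and may differ from the exact commands used in this tutorial. It assumes Apache's default document root /var/www/html on CentOS 7. It also assumes the repo_id argument matches the section name inside the vendor's .repo file. The vendor .repo URL is not given in this section, so it is left as a parameter. The function name mirror_repo is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print commands instead of executing them when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Apache's default document root on CentOS/RHEL 7 (override if different).
DOCROOT=${DOCROOT:-/var/www/html}

# Mirror one vendor yum repository into the web server's document root.
# Usage: mirror_repo <repo_id> <url_of_vendor_repo_file>
# <repo_id> is assumed to match the [section] name in the vendor's .repo file.
mirror_repo() {
  local repo_id="$1" repo_file_url="$2"
  # 1. Register the vendor repository on the repository server only.
  run sudo curl -fsSL -o "/etc/yum.repos.d/$repo_id.repo" "$repo_file_url"
  # 2. reposync is provided by yum-utils; createrepo builds repo metadata.
  run sudo yum -y install yum-utils createrepo
  # 3. Download all packages of the repo into $DOCROOT/<repo_id>/.
  run sudo reposync -r "$repo_id" -p "$DOCROOT"
  # 4. Generate repodata/ so other servers can use the directory as a baseurl.
  run sudo createrepo "$DOCROOT/$repo_id"
}
```

After `mirror_repo cloudera-manager <vendor-repo-file-url>` completes, the packages would be served at http://<first-node>/cloudera-manager/, which is the baseurl the generated repo files should use.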

Cloudera CDH5

Let us create a local repository for CDH5. It contains packages for HDFS, YARN, Spark, etc.
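Whichever repository you mirror, Cloudera Manager or CDH5, it is worth checking before you point other servers at it. The directory should contain the yum metadata index repodata/repomd.xml that createrepo generates, plus at least one RPM. The minimal sketch below performs only that check. The function name is_yum_repo and any directory you pass to it are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeed if <dir> looks like a usable yum repository: it must contain the
# repodata/repomd.xml index written by createrepo and at least one .rpm file.
is_yum_repo() {
  local dir="$1"
  [ -f "$dir/repodata/repomd.xml" ] || return 1
  # -print -quit stops at the first match; grep -q . fails if nothing printed.
  find "$dir" -name '*.rpm' -print -quit | grep -q .
}
```

For example, `is_yum_repo /var/www/html/<cdh5-repo-dir> && echo ok` on the first node. Fetching `http://<first-node>/<cdh5-repo-dir>/repodata/repomd.xml` from another node then confirms that httpd is actually serving the directory.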

Copy repo files

As we have set up a centralized local yum repository server on the first node, we now have to copy repo files pointing to it onto all the nodes in the cluster. We will set up only the first 7 servers, as the last one is reserved for a later task in which we will see how to add servers to an existing cluster.

  • If you use scp to copy repo files from bigdataserver-1 to other servers, you need to first scp them to a directory the login user can write to and then sudo mv them into /etc/yum.repos.d, which is owned by root. You should be aware of this approach for certification purposes.
  • In enterprises, use DevOps tools such as ansible to perform these types of repetitive tasks.
  • In our working directory /home/itversity/setup_cluster on the host, run mkdir -p files/etc/yum.repos.d
  • Make sure bigdataserver-8 is not part of the inventory group all in the hosts file.
  • Copy the repo file contents into files/etc/yum.repos.d/cloudera-manager.repo and files/etc/yum.repos.d/cloudera-cdh5.repo
  • Synchronize the repo files onto all the hosts, and validate them by visiting the base URLs mentioned in the repo files: ansible all -i hosts -m synchronize -a "src=files/etc/yum.repos.d dest=/etc" --become --private-key=~/.ssh/google_compute_engine
  • You can also run the yum repolist command using ansible to see the new repositories: ansible all -i hosts -a "yum repolist" --become --private-key=~/.ssh/google_compute_engine
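The scp-then-sudo-mv approach mentioned above can be sketched as a small function that you would run from bigdataserver-1 once per target host. This is a sketch under assumptions. Staging through /tmp is one choice of writable location. The key path and user default to the ones used elsewhere in this tutorial but are overridable via KEY and SSH_USER, since the key available on bigdataserver-1 may differ. The function name push_repo_files is hypothetical. DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print commands instead of executing them when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Assumed defaults; override via environment (on AWS the user is centos).
KEY=${KEY:-$HOME/.ssh/google_compute_engine}
SSH_USER=${SSH_USER:-itversity}

# Copy repo files to one host: scp into /tmp (writable by the login user),
# then sudo mv into /etc/yum.repos.d (writable only by root).
# Usage: push_repo_files <host> <repo_file>...
push_repo_files() {
  local host="$1"; shift
  run scp -i "$KEY" "$@" "$SSH_USER@$host:/tmp/"
  local f names=()
  for f in "$@"; do names+=("/tmp/$(basename "$f")"); done
  run ssh -i "$KEY" "$SSH_USER@$host" sudo mv "${names[@]}" /etc/yum.repos.d/
}
```

A loop such as `for h in bigdataserver-{2..7}; do push_repo_files "$h" cloudera-manager.repo cloudera-cdh5.repo; done` covers the remaining servers and deliberately skips bigdataserver-8. For repeated use, the ansible synchronize command above does the same job with less manual work.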

By now you should have set up a local yum repository on bigdataserver-1, have a decent idea about yum and local repositories, and have copied repo files pointing to the local yum repository server onto all the nodes.

Make sure to shut down the servers on GCP, or on any other cloud platform, to control costs and make the best use of your credits.