Anyone willing to crack Cloudera Administrator (CCA-500) certification in 4 weeks?

With proper planning and the right materials, I think we can pass (and score well on) the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam. If anyone is interested in joining me, please give me a buzz.

I am in, @KiranGudipudi @ramtpafl. Do you have any plan? I am willing to join you.

I am in, @ramtpafl.
Any plan, and how do we go ahead?

Here are my unfiltered thoughts on this topic; please let me know if they make sense.

We have only 4 weeks, so our mission statement should be: I will be a Cloudera Certified Admin on 01 March 2017. Period.

Now let’s take a look at our options on making that happen.
1. Training from Cloudera

Generally, vendor training is designed to help participants pass the exam with “some” ease. It’s not cheap (about US $3K, or roughly INR 2 lakhs), but if you are fortunate, you can ask your employer to send you to one. Let’s keep this option aside for now.

2. Nobody can beat real experience

We are serious aspirants, but people who have already cleared CCA-500 can share insights into how they prepared, so that we can all benefit.
If 5 people each share at least two tips about their preparation, approach, or resources, we already have 10 tips at our disposal. We need their insights, not actual questions or content from the exam.

Here we need to bring the Law of Sacrifice into the picture:

“In order to attain something you believe is of greater value, you must give up something you believe is of lesser value.”

We should be willing to be uncomfortable for at least a few weeks. Since we all have regular jobs, we would have only a few hours left on weekdays and maybe a few more on weekends. That brings us to this equation…

(20 week days * 2h) + (8 weekend days * 6h) = 88h

(I live in the US, so I am assuming a 5-day work week.)
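The arithmetic above can be sanity-checked with a quick script; the per-day hours are the assumptions stated in the plan:

```python
# Prep-time budget for the 4-week plan (numbers from the post above).
WEEKS = 4
WEEKDAY_HOURS = 2    # hours per weekday, after work
WEEKEND_HOURS = 6    # hours per weekend day

weekdays = WEEKS * 5       # 20 weekdays in a 5-day work week
weekend_days = WEEKS * 2   # 8 weekend days

total = weekdays * WEEKDAY_HOURS + weekend_days * WEEKEND_HOURS
print(f"({weekdays} weekdays * 2h) + ({weekend_days} weekend days * 6h) = {total}h")
```

Adjust the two hour constants to your own schedule and the total updates itself.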

Every individual is different, so we can adjust a few hours here and there, but the exam date is fixed. Book the exam date.

With so many resources available (good books, online training videos, Durga’s playlists, vendor presentations, projects at work, etc.), it’s easy to get confused, and I am dealing with that now. In one of the playlists, Durga mentioned this:

DO NOT OVER-PREPARE. Just focus on the exam objectives and we should be good to clear any exam.

Here is my plan:

  1. Understand the exam objectives and use the preparation tools accordingly.
     HDFS (17% - 10 questions)
     YARN (17% - 10 questions)
     Hadoop Cluster Planning (16% - 10 questions)
     Hadoop Cluster Installation and Administration (25% - 15 questions)
     Resource Management (10% - 6 questions)
     Monitoring and Logging (15% - 9 questions)
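As a quick consistency check, the per-section question counts above line up with a 60-question exam; note that 60 is inferred from those counts, not taken from an official source:

```python
# CCA-500 section weights from the objectives list above.
# TOTAL_QUESTIONS = 60 is inferred from the per-section counts
# in the post, not from an official Cloudera figure.
TOTAL_QUESTIONS = 60
WEIGHTS = {
    "HDFS": 0.17,
    "YARN": 0.17,
    "Hadoop Cluster Planning": 0.16,
    "Hadoop Cluster Installation and Administration": 0.25,
    "Resource Management": 0.10,
    "Monitoring and Logging": 0.15,
}
for section, weight in WEIGHTS.items():
    # round() reproduces the approximate counts listed in the objectives
    print(f"{section:50s} ~{round(weight * TOTAL_QUESTIONS)} questions")
```

The same weights are also a reasonable way to split up study hours per section.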

Here are my resources:

  1. Cloudera Admin Handbook by Rohit Menon
  2. Durga’s CCAH playlist:
    Durga has already covered the CCAH topics in a series of 52 videos, I believe. My plan is to go through them at least 3 times (grandma’s advice for understanding things better).
  3. ITV labs (to get familiar with config files and parameters)
  4. Hadoop: The Definitive Guide by Tom White
  5. Other books and training materials.
  6. Experience gained from work.

With that said, I am spending my prep time mostly on the first three items. We can allocate those 88 hours across them and see how it works. Please share your thoughts on this too.

In the end, this is what I believe.

“People influence people. Nothing influences people more than a recommendation from a trusted friend.” - Mark Zuckerberg, CEO, Facebook.

This individual (Durga) did such a great job creating the ITV platform, and every day we see people benefiting from it. We always need more… I am eagerly waiting for the CCAH Admin Labs Durga mentioned earlier, so that we can play with HA (High Availability) and security of Hadoop clusters.

Thank you.

Guys, any comments on the plan/approach?

I also started preparing for CCAH about a week ago. I’m following Durga’s playlists at the moment.
Regarding the 4-week timeline: in my personal opinion, it is calculated a bit tightly, especially if we have a job in parallel. I suggest extending the timeline, making sure we are well prepared for all sections of CCAH, and only then attempting the exam. The more time we spend on the course content, the better our understanding will be, and the higher the probability of getting through the exam on the first attempt.

Ram - I’m interested too… In addition to the resources you have mentioned, I’ve heard of people using Hadoop Operations from O’Reilly.

I’m happy to share that I recently passed the Cloudera Certified Administrator for Apache Hadoop (CCAH) exam (CDH 5) with a good score. Let me say a few words about my approach and preparation so that others can benefit.

Let’s start with good things first.

My original plan was to finish this exam by the first week of March. Due to a busy work schedule and other priorities, the exam got pushed a bit, but not the goal. I worked with a couple of people who are serious about learning, and that really helped me stay focused.

Next… I wouldn’t call the items below bad things; I call them “challenges”:

  1. I had all the resources (books, videos etc) but had trouble with direction/approach in the beginning.

  2. Being ignored by people who said they would guide me but then turned their backs.

I should say #2 always makes you feel bad, but at the same time it pushes you to your limits. I spent a lot of extra time figuring out the direction by myself, and that really helped me in the end.

Last but not least, I thank the following people/resources from the bottom of my heart.

Durga - for his direction and motivation (and of course, hundreds of his videos)
Sam Alapati - author of Expert Hadoop Administration (an excellent admin handbook)
Hadoop: The Definitive Guide - for Hadoop fundamentals
A few friends - for sharing their valuable time.

Thanks again and happy learning.

Hello Ram,

Congratulations on your certification.
I started my preparation two weeks ago and would like to discuss prep methodology with you. Could you share your contact details?



Yup, I got the same info…
Any comments, @itversity?

Any specific topics from Hadoop Operations, or do we need to refer to the whole book?

Thank you. By reading this post from the top, you should have a pretty good understanding by now of the path I followed. Once again, here are my primary resources.

  1. Durga’s videos
  2. Expert Hadoop Administration by Sam Alapati (an excellent admin handbook)
  3. Hadoop: The Definitive Guide - for Hadoop fundamentals
  4. Hadoop Operations

With so many resources at hand, it’s easy to get distracted or overwhelmed. Take the exam objectives and map them to the chapters in those books.
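As one hypothetical example of that mapping, here is a sketch pairing the CCA-500 objectives with chapters of Hadoop Operations. The chapter numbers follow the book’s table of contents; the pairing itself is my own suggestion, not an official mapping:

```python
# Rough mapping of CCA-500 objectives to Hadoop Operations chapters.
# Chapter numbers follow the book's table of contents; the pairing with
# exam objectives is only a suggestion, not an official study guide.
OBJECTIVE_TO_CHAPTERS = {
    "HDFS": [2],
    "YARN": [3],  # ch. 3 covers MapReduce 1; YARN itself needs newer material
    "Hadoop Cluster Planning": [4],
    "Installation and Administration": [5, 8],
    "Resource Management": [7],
    "Monitoring and Logging": [10],
}
for objective, chapters in OBJECTIVE_TO_CHAPTERS.items():
    listed = ", ".join(str(ch) for ch in chapters)
    print(f"{objective:35s} -> chapter(s) {listed}")
```

Doing the same exercise against the Definitive Guide and the Cloudera Admin Handbook gives you one chapter list per objective, which is much less overwhelming than three whole books.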

Always write your own notes.

If you already have real-world experience, explore all the GUIs in Cloudera Manager and the admin commands you use to operate your clusters on a daily basis.
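For anyone compiling that daily-commands list, here is a small starter cheat sheet of standard Hadoop 2.x admin commands, printed from a script so it is easy to extend. The selection is my own; verify each command against your cluster’s version:

```python
# A starter cheat sheet of day-to-day HDFS/YARN admin commands.
# These are standard Hadoop 2.x CLI tools, but confirm each one
# against your own CDH version before relying on it.
CHEATSHEET = {
    "Cluster capacity and datanode report": "hdfs dfsadmin -report",
    "Check filesystem integrity":           "hdfs fsck /",
    "Query safe mode status":               "hdfs dfsadmin -safemode get",
    "Rebalance block data across nodes":    "hdfs balancer",
    "List YARN NodeManagers":               "yarn node -list",
    "List running YARN applications":       "yarn application -list",
}
for task, command in CHEATSHEET.items():
    print(f"{command:32s} # {task}")
```

Running each of these on a practice cluster and reading the output is a fast way to internalize what the exam objectives call “monitoring” and “maintenance”.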

If you are new to the Hadoop world, you can do the same by creating your own Hadoop cluster, either on VMs or on AWS (Durga has created some nice videos for both topics).

Hope this helps.

I’ve looked at the table of contents, and it all appears very relevant for the certification.

Table of Contents

Hadoop Operations

  1. Introduction

  2. HDFS
    Goals and Motivation
    Reading and Writing Data
    The Read Path
    The Write Path
    Managing Filesystem Metadata
    Namenode High Availability
    Namenode Federation
    Access and Integration
    Command-Line Tools
    REST Support

  3. MapReduce
    The Stages of MapReduce
    Introducing Hadoop MapReduce
    When It All Goes Wrong
    Child task failures
    Tasktracker/worker node failures
    Jobtracker failures
    HDFS failures

  4. Planning a Hadoop Cluster
    Picking a Distribution and Version of Hadoop
    Apache Hadoop
    Cloudera’s Distribution Including Apache Hadoop
    Versions and Features
    What Should I Use?
    Hardware Selection
    Master Hardware Selection
    Namenode considerations
    Secondary namenode hardware
    Jobtracker hardware
    Worker Hardware Selection
    Cluster Sizing
    Blades, SANs, and Virtualization
    Operating System Selection and Preparation
    Deployment Layout
    Hostnames, DNS, and Identification
    Users, Groups, and Privileges
    Kernel Tuning
    Disk Configuration
    Choosing a Filesystem
    Mount Options
    Network Design
    Network Usage in Hadoop: A Review
    1 Gb versus 10 Gb Networks
    Typical Network Topologies
    Traditional tree
    Spine fabric

  5. Installation and Configuration
    Installing Hadoop
    Apache Hadoop
    Tarball installation
    Package installation
    Configuration: An Overview
    The Hadoop XML Configuration Files
    Environment Variables and Shell Scripts
    Logging Configuration
    Identification and Location
    Optimization and Tuning
    Formatting the Namenode
    Creating a /tmp Directory
    Namenode High Availability
    Fencing Options
    Basic Configuration
    Automatic Failover Configuration
    Initializing ZooKeeper State
    Format and Bootstrap the Namenodes
    Namenode Federation
    Identification and Location
    Optimization and Tuning
    Rack Topology

  6. Identity, Authentication, and Authorization
    Kerberos and Hadoop
    Kerberos: A Refresher
    Kerberos Support in Hadoop
    Configuring Hadoop security
    Other Tools and Systems
    Apache Hive
    Apache HBase
    Apache Oozie
    Apache Sqoop
    Apache Flume
    Apache ZooKeeper
    Apache Pig, Cascading, and Crunch
    Tying It Together

  7. Resource Management
    What Is Resource Management?
    HDFS Quotas
    MapReduce Schedulers
    The FIFO Scheduler
    The Fair Scheduler
    The Capacity Scheduler
    The Future

  8. Cluster Maintenance
    Managing Hadoop Processes
    Starting and Stopping Processes with Init Scripts
    Starting and Stopping Processes Manually
    HDFS Maintenance Tasks
    Adding a Datanode
    Decommissioning a Datanode
    Checking Filesystem Integrity with fsck
    Balancing HDFS Block Data
    Dealing with a Failed Disk
    MapReduce Maintenance Tasks
    Adding a Tasktracker
    Decommissioning a Tasktracker
    Killing a MapReduce Job
    Killing a MapReduce Task
    Dealing with a Blacklisted Tasktracker

  9. Troubleshooting
    Differential Diagnosis Applied to Systems
    Common Failures and Problems
    Humans (You)
    Hardware Failure
    Resource Exhaustion
    Host Identification and Naming
    Network Partitions
    “Is the Computer Plugged In?”
    Treatment and Care
    War Stories
    A Mystery Bottleneck
    There’s No Place Like

  10. Monitoring
    An Overview
    Hadoop Metrics
    Apache Hadoop 0.20.0 and CDH3 (metrics1)
    JMX Support
    REST Interface
    Using the metrics servlet
    Using the JMX JSON servlet
    Apache Hadoop 0.20.203 and Later, and CDH4 (metrics2)
    What about SNMP?
    Health Monitoring
    Host-Level Checks
    All Hadoop Processes
    HDFS Checks
    MapReduce Checks

  11. Backup and Recovery
    Data Backup
    Distributed Copy (distcp)
    Parallel Data Ingestion
    Namenode Metadata


Durga has done a good job. All his videos are very useful.


Great plan. I am in.

@jhussain, thanks for sharing those details. Indeed, the Ops book does cover a lot of the certification objectives. Are we being too aggressive with our 4-week plan, or is it still doable?

If you are new to Hadoop, 4 weeks with a full-time job is aggressive. I have set March 1st as my target, though.

Sounds good. So, are these our main sources of knowledge for meeting the course objectives?

  1. Hadoop Operations/Cloudera Admin Handbook
  2. Durga’s CCAH playlist
  3. ITV labs
  4. Any other books/training materials.

Yes… I’ve downloaded Cloudera’s VM image for practice. My plan is to finish reading the books in the next few days, then watch Durga’s videos and do the labs. Perhaps repeat the process and then take some practice exams.

BTW, the Hadoop Operations book is older and may not cover Hadoop 2.0, though we may still want to glance over some of its topics. So I’d say let’s stick with the Definitive Guide and the Cloudera Admin Handbook for now.