CCA175 certification updated syllabus preparation guide

Hello Durga,

Hope you are doing well, and thank you very much for creating amazing playlists for Hadoop.

I would like to appear for the CCA175 certification, but Cloudera has updated the syllabus on its website. Could you please let me know when the new syllabus will go live in the exam?

If not, could you please point me to any specific playlist you created for the new syllabus that we can use to prepare for the certification?

Thanks,
Ravi Teja

Hi @ramravi92,

You can use this playlist for your preparation:

https://www.youtube.com/playlist?list=PLf0swTFhTI8q0x0V1E6We5zBQ9UazHFY0

@Abhishek,

Are these 23 videos sufficient to clear the CCA175 certification with the new syllabus?

Thank you Abhishek.

The new syllabus does not include Hive, but this playlist covers Hive as well.
Is this playlist enough for the new pattern of the exam?

Hi @ramravi92,

Although Hive is not covered, the point mentioned below is with respect to the Hive context. So you need to be comfortable with Spark SQL using HiveContext.

Data Analysis
Use Spark SQL to interact with the metastore programmatically in your applications. Generate reports by using queries against loaded data.

Use metastore tables as an input source or an output sink for Spark applications
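In spark-shell terms (Spark 1.6 as shipped with CDH 5.x), that requirement looks roughly like this; retail_db.orders and the report table name are made-up examples, not exam content:

```scala
// Inside spark-shell, where sc already exists; HiveContext reads the Hive metastore.
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)

// Metastore table as an input source (hypothetical table name).
val report = sqlContext.sql(
  """SELECT order_status, count(1) AS cnt
    |FROM retail_db.orders
    |GROUP BY order_status""".stripMargin)

// Metastore table as an output sink.
report.write.saveAsTable("retail_db.order_status_counts")
```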

Also, the playlist mentioned above covers all topics with respect to the certification.


Hi @Abhishek

Are Scala collections also part of the CCA175 certification?
I can see that there are 4-5 detailed videos on Scala; should I go through all of them?

How about the other videos as well, e.g. Big Data Workshop - 10 - Spark aggregateByKey and groupByKey?
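For what it's worth, the difference that video covers can be mimicked with plain Scala collections (this is only an analogy, not the Spark API, and the data is made up): groupByKey-style code materializes all values per key before reducing, while aggregateByKey-style code folds each value into a running accumulator.

```scala
object AggVsGroup {
  type Pairs = List[(String, Int)]

  // groupByKey-style: collect every value per key, then reduce.
  def groupedSum(pairs: Pairs): Map[String, Int] =
    pairs.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }

  // aggregateByKey-style: fold each value into a running sum per key,
  // never holding the full list of values for a key.
  def aggregatedSum(pairs: Pairs): Map[String, Int] =
    pairs.foldLeft(Map.empty[String, Int]) {
      case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0) + v)
    }

  def main(args: Array[String]): Unit = {
    val pairs = List(("a", 1), ("a", 2), ("b", 5))
    println(groupedSum(pairs))    // both yield a -> 3, b -> 5
    println(aggregatedSum(pairs))
  }
}
```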

I have purchased the Big Data lab. Do I still need to install Scala individually (https://www.scala-lang.org/download/install.html) to run the classes and objects mentioned in these videos? I am getting an error while executing them in Scala on the lab.

Thanks,
Rittika Jindal

Hi Rittika,

You should be comfortable with either Scala or Python collections and use them in tandem with Spark transformations and actions. So it is suggested to set up Scala on your local system. What is the error you get on the lab? I don't think there is an issue with the lab.
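As a concrete illustration, the core Spark transformations (map, filter, reduce) have direct counterparts on plain Scala collections, which is why collection fluency transfers; the order data below is hypothetical, not from the videos:

```scala
object CollectionsWarmup {
  // Each line mimics a record from a hypothetical orders file: id,status,amount
  val lines = List("1,CLOSED,130.0", "2,PENDING,59.0", "3,CLOSED,20.0")

  // The same map/filter/reduce shapes you will later use on Spark RDDs:
  def closedRevenue(records: List[String]): Double =
    records
      .map(_.split(","))                        // like rdd.map
      .filter(fields => fields(1) == "CLOSED")  // like rdd.filter
      .map(fields => fields(2).toDouble)
      .sum                                      // like rdd.reduce(_ + _)

  def main(args: Array[String]): Unit =
    println(closedRevenue(lines)) // 150.0
}
```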

Thanks,

Abhishek

Hi @Abhishek

This is the error I get while running Scala on the lab.

scala> case class hw(var i: Int, var j: Int)
defined class hw

scala> :javap -p hw
Failed: Could not load javap tool. Check that JAVA_HOME is correct.

Thanks,
Rittika

Hi @Rittika_Jindal,

It will not hamper your Scala programming practice. You can continue to write programs.
However, to use javap you can use your local system.
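If you do want :javap working somewhere, that message usually means JAVA_HOME points at a JRE rather than a JDK (javap ships only with the JDK). On a local install, something like this typically fixes it; the path is an example, not the lab's actual layout:

```shell
# Point JAVA_HOME at a full JDK so the REPL can find javap (example path).
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
```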

Thanks,
Abhishek


Thanks for the quick reply, @Abhishek.

Will continue with the video tutorials accordingly!

Hi

As per the certification guidelines, it is said that the CDH version for the exam would be CDH 5.10.

The initial playlist, which covers Sqoop, is based on CDH 5.5. Is there any major difference between the versions?


Hi @itversity, thanks for providing the new and revised playlist for the updated syllabus of the CCA 175 exam.

In the intro video, the syllabus discussed for CCA 175 was still the same as the old syllabus. Please see the screenshots below:

[screenshots: old CCA 175 syllabus slides from the intro video]

Whereas this is the new syllabus as present on the Cloudera Website (https://www.cloudera.com/more/training/certification/cca-spark.html):

Data Ingest

The skills to transfer data between external systems and your cluster. This includes the following:
  • Import data from a MySQL database into HDFS using Sqoop
  • Export data to a MySQL database from HDFS using Sqoop
  • Change the delimiter and file format of data during import using Sqoop
  • Ingest real-time and near-real-time streaming data into HDFS
  • Process streaming data as it is loaded onto the cluster
  • Load data into and out of HDFS using the Hadoop File System commands
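For the Sqoop bullets above, typical invocations look like this; connection details, credentials, and paths are placeholders, not exam values:

```shell
# Import a MySQL table into HDFS, changing the field delimiter:
sqoop import \
  --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
  --username retail_user --password cloudera \
  --table orders \
  --target-dir /user/cloudera/orders_pipe \
  --fields-terminated-by '|'

# To change the file format instead, replace the delimiter flag with
# e.g. --as-avrodatafile

# Export back from HDFS into MySQL:
sqoop export \
  --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
  --username retail_user --password cloudera \
  --table orders_backup \
  --export-dir /user/cloudera/orders_pipe \
  --input-fields-terminated-by '|'
```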

Transform, Stage, and Store

Convert a set of data values in a given format stored in HDFS into new data values or a new data format and write them into HDFS.
  • Load RDD data from HDFS for use in Spark applications
  • Write the results from an RDD back into HDFS using Spark
  • Read and write files in a variety of file formats
  • Perform standard extract, transform, load (ETL) processes on data

Data Analysis

  • Use Spark SQL to interact with the metastore programmatically in your applications. Generate reports by using queries against loaded data.
  • Use metastore tables as an input source or an output sink for Spark applications
  • Understand the fundamentals of querying datasets in Spark
  • Filter data using Spark
  • Write queries that calculate aggregate statistics
  • Join disparate datasets using Spark
  • Produce ranked or sorted data
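Those query shapes (filter, join, aggregate, rank) can be rehearsed on plain collections too; in the exam you would express the same logic through sqlContext.sql(...) or RDD operations. The data below is invented for illustration:

```scala
object AnalysisSketch {
  // (orderId, status, customerId) plus a lookup table to join against.
  val orders    = List((1, "CLOSED", 101), (2, "CLOSED", 101), (3, "CANCELED", 102))
  val customers = Map(101 -> "Alice", 102 -> "Bob")

  // Filter, join via the customer lookup, group, count, rank by count desc.
  def ordersPerCustomer: List[(String, Int)] =
    orders
      .filter { case (_, status, _) => status == "CLOSED" }   // WHERE
      .groupBy { case (_, _, custId) => customers(custId) }   // JOIN + GROUP BY
      .map { case (name, rows) => (name, rows.size) }
      .toList
      .sortBy { case (_, count) => -count }                   // ORDER BY cnt DESC

  def main(args: Array[String]): Unit =
    println(ordersPerCustomer) // List((Alice,2))
}
```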

Configuration
This is a practical exam and the candidate should be familiar with all aspects of generating a result, not just writing code.

  • Supply command-line options to change your application configuration, such as increasing available memory
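That bullet maps to spark-submit flags such as the following; the values are examples, not exam-prescribed settings:

```shell
spark-submit \
  --master yarn \
  --executor-memory 2G \
  --num-executors 4 \
  --conf spark.ui.port=12345 \
  my_app.jar
```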

Please suggest whether to continue with this playlist for the new syllabus, or whether there is another playlist.

Thanks and Regards,
Udit Arora.