Tips: CCA Spark and Hadoop Developer examination

Let us look at some important tips for taking the CCA Spark and Hadoop Developer certification exam. These tips are based purely on the official CCA 175 page.

As per the link and FAQs, you will be given a multinode cluster and might have to process a large volume of data. Based on that:

  • Understand the size of the cluster using Cloudera Manager, the NameNode web interface or the ResourceManager web interface. You will be provided with the details either as bookmarks in the browser or as the IP addresses on which each of these services is running. Here are the default port numbers:
      • Cloudera Manager: 7180
      • NameNode: 50070
      • ResourceManager: 8088
  • If you do not see any details about Cloudera Manager, it is most likely running on the gateway node
  • Understand the volume of data you need to process
  • Launch spark-shell or pyspark in YARN mode with --num-executors: spark-shell --master yarn --num-executors NUMBER_OF_EXECUTORS
  • If you are using Spark SQL, make sure to set spark.sql.shuffle.partitions to a value that fits your data. The default value is 200, which can waste valuable time on small data sets
  • Use numTasks (named numPartitions in the API signatures) to reduce the number of tasks after shuffling. It is available for transformations such as join, reduceByKey, aggregateByKey etc. Use it with care.
  • In Sqoop, use --num-mappers or -m to increase the number of mappers while importing or exporting data. In case of an import, run the sqoop eval command first to get the number of records you need to import.
  • sqoop eval will also confirm whether your JDBC URL and credentials are correct, and that you are able to access the table
  • As per the video in the above section, MySQL will be running on the gateway node. If the port number is not specified, it will be 3306
  • Each of these problems can be solved in multiple ways - use the approach you are comfortable with
  • Make sure you double check your results, as the test is evaluated purely on results rather than on code.
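
The command-line tips above can be put together roughly as follows. This is only a minimal sketch: it assembles and prints the commands rather than running them, so they can be reviewed before being pasted into a terminal. The host name gateway-host, the database retail_db, the table orders, the user exam_user, the target directory and the numbers 4 and 8 are all placeholders assumed for illustration, not values from the exam. The helper function names are likewise made up for this example.

```python
import shlex


def render(parts):
    """Join arguments into a single shell-safe command line."""
    return " ".join(shlex.quote(str(p)) for p in parts)


def spark_shell_command(num_executors, shuffle_partitions):
    # --conf sets spark.sql.shuffle.partitions at launch, overriding the default of 200.
    return render([
        "spark-shell", "--master", "yarn",
        "--num-executors", num_executors,
        "--conf", "spark.sql.shuffle.partitions=%d" % shuffle_partitions,
    ])


def sqoop_eval_command(jdbc_url, username, table):
    # Counting rows first also checks the JDBC URL, the credentials and table access.
    # -P makes sqoop prompt for the password instead of taking it on the command line.
    return render([
        "sqoop", "eval",
        "--connect", jdbc_url,
        "--username", username, "-P",
        "--query", "SELECT COUNT(1) FROM %s" % table,
    ])


def sqoop_import_command(jdbc_url, username, table, target_dir, num_mappers):
    # --num-mappers (short form -m) controls how many parallel mappers do the import.
    return render([
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username, "-P",
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", num_mappers,
    ])


if __name__ == "__main__":
    # Hypothetical connection details; 3306 is the MySQL default port.
    url = "jdbc:mysql://gateway-host:3306/retail_db"
    print(spark_shell_command(4, 8))
    print(sqoop_eval_command(url, "exam_user", "orders"))
    print(sqoop_import_command(url, "exam_user", "orders", "/user/exam_user/orders", 4))
```

Running sqoop eval before the import gives the record count, which helps in picking a sensible number of mappers and, later, in double checking the imported result.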

Disclaimer: These tips are based purely on the published curriculum and other information provided on the official CCA 175 page.



I am planning to take CCA 175. Could you please clarify my doubts below:

-Do we get a template of .py or .scala code to fill in the blanks?
-Do we need to know both Python and Scala before attempting CCA 175, or is familiarity with either one of the languages enough?
-Do we need to use an IDE, or is running code snippets in spark-shell enough to get the desired output, given the difficulty level of the questions in the exam?
-Do we need to compile our code to create a JAR during the exam?