Cleared CCA-175 on 15th Aug, 2021

Hello All,

Happy to share that I cleared CCA-175 on 15th Aug, 2021.
My deepest gratitude goes to Durga and Itversity, a brilliant duo and a one-stop resource for CCA-175 exam preparation.

Here is my experience, which might be useful to others preparing for the same exam.

  1. The exam questions were super easy, and Durga’s course covers it all.

  2. The exam environment was a bit demotivating for me. I was given a remote desktop (which opens in Chrome). Caps Lock didn’t work, so I had to manage with Shift+[Key]. The remote desktop connection was slow and I got disconnected once during the exam. But no worries, the proctor pauses the timer the moment you get disconnected.

A couple of topics I would like to emphasize:

  1. (a) Be familiar with all the different file formats (FF) and compression techniques. You don’t need to learn the internals of these file formats and compression codecs, only how to write your answers to the destination directory in a given file format and compression.
    Whenever in doubt, use the FMOS rule (I figured this out myself :wink:), which is as follows:

F = format() || M = mode() || O = option() || S = save() / saveAsTable()

Example - 1: result.write.format("text").mode("overwrite").option("compression", "snappy").save("destination/directory/path")

Example - 2: result.write.format("avro").mode("overwrite").option("compression", "snappy").saveAsTable("DatabaseName.TableName")

All the different file formats and compression codecs can be used with the FMOS rule (yes, including avro).
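To make the FMOS order easier to remember, here is a minimal sketch of it as a small helper. `write_fmos` is a hypothetical name (it is not part of PySpark), and it assumes `df` is a Spark DataFrame, or anything else exposing `.write` with the same `format`/`mode`/`option`/`save`/`saveAsTable` methods used in the examples above.

```python
def write_fmos(df, fmt, mode, options=None, path=None, table=None):
    """Apply format -> mode -> option(s) -> save/saveAsTable, in that order.

    Hypothetical helper for illustration only. `df` is assumed to be a
    Spark DataFrame (any object whose `.write` offers the DataFrameWriter
    methods called below).
    """
    # Exactly one destination: a directory path or a table name.
    if (path is None) == (table is None):
        raise ValueError("pass exactly one of path= or table=")

    writer = df.write.format(fmt).mode(mode)      # F, then M
    for key, value in (options or {}).items():    # O (zero or more options)
        writer = writer.option(key, value)

    if path is not None:                          # S
        writer.save(path)
    else:
        writer.saveAsTable(table)


# Usage inside the pyspark shell, with `result` being the DataFrame to save:
# write_fmos(result, "text", "overwrite", {"compression": "snappy"},
#            path="destination/directory/path")
# write_fmos(result, "avro", "overwrite", {"compression": "snappy"},
#            table="DatabaseName.TableName")
```

In the exam it is probably quicker to type the chained call directly; the helper is only meant to show that the four steps always come in the same order.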

  1. (b) Be familiar with how to save an output in text format delimited by a specific character.

  2. I was asked to save about 6 out of 9 outputs in tab (\t) delimited TEXT format (and I used the TEXT format, not CSV).
    Example: Say your output DataFrame/SQL table has N columns.
    First, convert the table into a single tab-delimited TEXT column:
    result_to_save = output.selectExpr("concat_ws('\t', *) AS result")
    Then, save result_to_save in the destination dir:
    result_to_save.write.format("text").mode("....").option("....").save(".....")

  3. I didn’t include --packages when launching the pyspark shell to work with the avro format; it’s already integrated in the environment. I launched the shell with just the pyspark command and no other parameters, and I still finished my test in 90-95 minutes (one revision included), probably because I launched the pyspark shell only once and answered all the questions from it. Remember, every launch of the pyspark shell costs time. (Solely my opinion; you may choose a different approach.)

  4. You are allowed to open more than one terminal to browse HDFS. A text editor is provided on the remote desktop.
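For the delimited TEXT output described in point 2, the two steps (collapse all columns into one string column, then write it as text) can be sketched as a pair of helpers. Both `delimited_expr` and `save_delimited_text` are hypothetical names, and `df` is assumed to be a Spark DataFrame. The sketch also assumes the delimiter contains no single quote, since it is pasted into a SQL string literal.

```python
def delimited_expr(delimiter="\t", alias="result"):
    """Build the selectExpr string that joins every column with `delimiter`.

    Hypothetical helper; assumes `delimiter` has no single quote in it.
    """
    return "concat_ws('{}', *) AS {}".format(delimiter, alias)


def save_delimited_text(df, path, delimiter="\t", mode="overwrite"):
    """Collapse all columns of `df` into one string column and save as text.

    `df` is assumed to be a Spark DataFrame; the text format needs a
    single string column, which is what the selectExpr step produces.
    """
    single_column = df.selectExpr(delimited_expr(delimiter))
    single_column.write.format("text").mode(mode).save(path)


# Usage inside the pyspark shell, with `output` being the DataFrame to save:
# save_delimited_text(output, "destination/directory/path")
# save_delimited_text(output, "destination/directory/path", delimiter="|")
```

One thing worth checking on your own data: concat_ws skips NULL values, so a row with a NULL column ends up with one field fewer than the others.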

That’s about it! Let me know in the replies if you have any other queries, and I would be happy to answer them to the best of my knowledge.



Hi,
Thanks for sharing this information.
How many questions did you get in the exam?
Which copy-paste shortcut worked: Ctrl+C / Ctrl+V
or Ctrl+Shift+C / Ctrl+Shift+V?
Could you please advise?

Thanks
Sam

Hi @samadhanspatil, I got 9 questions in the exam and used Ctrl + Shift + C/V for copy and paste.

Congrats, buddy.

Could you please suggest the best course to learn Big Data using Python 3?