CCA175 Cleared on 01/13/2018 Score 8/9



Cleared my CCA175 exam today with a score of 8/9.

The exam is very easy. This was my preparation plan. You might not need everything, but 3 and 4 are a must.

  1. Hadoop: The Definitive Guide, 4th Edition, by Tom White. It will give you solid information on Hadoop, Sqoop, and Hive. (Do this if you want to increase your knowledge in general, not just for the certification. At a high level, Durga covers these topics in his playlist for certification purposes.)

  2. Learning Spark by Holden Karau. Again, as stated above, this is not required; it is just for depth on Spark. Read chapters 3, 4, 5, 6, and 9.

  3. itversity free YouTube playlist

  4. Arun's questions on his blog. Do this only after you have finished your preparation; it will help you gauge where you stand. I personally felt the difficulty level of this blog to be 7/10, whereas in the exam you would get a difficulty level of 4/10.

Tips for taking the Exam

  • Practice, Practice, Practice
  • Don't just read through solutions; go ahead and execute them multiple times on the VM
  • Reading and writing file formats and compression are most important. It doesn't matter how good you are at transformations: if you cannot present your output the way it was asked, you will not get credit for that question
  • The test VM is very low resolution, so make good use of your time

Questions are mostly on Sqoop and Spark.
Import and export are straightforward, but look out for the columns being requested, the delimiters, and the file formats.
Of the Spark questions, I was expecting to get all of them right but missed one, and I still don't know what went wrong. Most of them involve basic RDD operations and then reduceByKey, join, and groupByKey, all very basic, and most importantly file formats and compression, as usual.
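
For anyone wondering what "basic" looks like in practice, here is a minimal PySpark sketch of the reduceByKey pattern. It assumes the order_items table of retail_db has been imported as comma-delimited text, with the order id in the second field and the subtotal in the fifth. That layout and the function names are assumptions for illustration, not something taken from the exam.

```python
def parse_item(line):
    """Turn one comma-delimited order_items line into (order_id, subtotal).

    Assumed layout: item_id,order_id,product_id,quantity,subtotal,price
    """
    fields = line.split(",")
    return int(fields[1]), float(fields[4])


def revenue_per_order(order_items_rdd):
    """Sum the subtotals for each order id with reduceByKey."""
    return order_items_rdd.map(parse_item).reduceByKey(lambda a, b: a + b)
```

In a pyspark shell this would be called as revenue_per_order(sc.textFile(path)), where path is wherever you imported the table. The same keyed (order_id, value) shape is what join and groupByKey expect as well.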

I have known Hadoop for 4 years but kept pushing back the certification for several reasons. I am glad I was finally able to pull this off. To crack this certification, if you are a newbie with some programming background, 2 months should be good enough to prepare.

I owe a lot of gratitude to the discuss.itversity community, the itversity team, Durga, and Arun. What you guys are doing is phenomenal. Thank you; your help is very much appreciated.

Feel free to ask any questions you might have.


Hi Venkat (@vm109)

Congratulations on clearing your test.

May I know whether you used Python or Scala for your Spark questions? I know the test doesn't look at the code we write; it verifies whether the files in the output directory are in the right format.

If you used Python, how did you deal with Avro files? I mean, did you start pyspark with the Databricks package imported?

Rajesh K


Congratulations @vm109

Can you let me know whether the file formats are provided in the test, or do we need to know the codecs by heart?


Hi @Varun_Upadhyay1

Though some people prefer to memorize them, in my view it is easier to refer to the file /etc/hadoop/conf/core-site.xml, in which we can find the compression codecs.
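
If you would rather not scroll through the XML by eye, a small sketch like the one below pulls the codec class names out of it. It assumes the codecs are listed under the standard Hadoop property io.compression.codecs as a comma-separated value; the function name and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET


def list_codecs(root):
    """Return the codec class names listed under io.compression.codecs."""
    for prop in root.findall("property"):
        if prop.findtext("name") == "io.compression.codecs":
            value = prop.findtext("value") or ""
            return [c.strip() for c in value.split(",") if c.strip()]
    return []


# Made-up sample in the shape of core-site.xml, used here instead of the real file.
SAMPLE = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,
           org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>"""

codecs = list_codecs(ET.fromstring(SAMPLE))
print(codecs)
```

On the VM you would pass ET.parse("/etc/hadoop/conf/core-site.xml").getroot() instead of the sample. A plain grep for "codec" on that file gets you the same information faster during the exam.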



Congratulations @vm109. I fixed a few formatting issues and rephrased some of the information. As per Cloudera guidelines, we are not supposed to pass on information about the number of questions and their breakdown.


Congratulations @vm109


Congrats vm109!!

Can anyone give me the procedure to save a text file with Snappy compression? It fails when I use the procedure below:
=> rec(0)+","+rec(1)+","+rec(2)).saveAsTextFile("/user/cloudera/crimeresults.csv", classOf[])

Thanks in advance!!


I used Scala. They just specify the compression format required.


Your code worked for me. It might be a difference in versions; I am not sure.

The below is from Arun's blog:

The below may fail in some Cloudera VMs. If the Spark command fails, use the sqoop command to accomplish the problem. Remember that you need to get out of the spark shell to run the sqoop command.
> x(0)+"\t"+x(1)+"\t"+x(2)+"\t"+x(3)).saveAsTextFile("/user/cloudera/problem5/text-snappy-compress",classOf[]);

sqoop import --table orders --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera --as-textfile -m1 --target-dir user/cloudera/problem5/text-snappy-compress --compress --compression-codec


Hi vm109

I have a couple of queries related to the certification. I cannot post them here due to compliance issues.
Could you please share your email ID or drop me a test email ( )?



Thanks for your response!!

Arun's sqoop solution can be used if we are importing the data directly from the MySQL database. But if there is a DataFrame that needs to be saved as a text file with Snappy compression, we can't use the sqoop solution.
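
For the DataFrame case, one possible approach in PySpark is to drop down to the RDD and pass a codec to saveAsTextFile. The sketch below is only that: the helper names are hypothetical, and the codec is the standard Hadoop Snappy class, which I am assuming is what the snippets in this thread originally passed to classOf (the class name did not survive in the posts).

```python
# Standard Hadoop Snappy codec class (assumed to be available on the VM).
SNAPPY_CODEC = "org.apache.hadoop.io.compress.SnappyCodec"


def to_line(rec, sep=","):
    """Join the fields of one record (tuple, list, or Row) into a delimited line."""
    return sep.join(str(field) for field in rec)


def save_snappy_text(rdd, path, sep=","):
    """Write an RDD of records as delimited text compressed with Snappy."""
    lines = rdd.map(lambda rec: to_line(rec, sep))
    lines.saveAsTextFile(path, compressionCodecClass=SNAPPY_CODEC)
```

With a DataFrame df you would call save_snappy_text(df.rdd, "/user/cloudera/some_output_dir"), where the output path is a placeholder. Whether this works on a given VM depends on the Snappy native libraries being installed, which may be why the same code fails for some people and works for others.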


Hi vm109,
Which language do we have to use for the Spark questions in CCA 175: Python, Scala, or both? Do we get to choose, or will they specify one?
Can you please help me, as I am new to Hadoop and Spark?

Thanks in advance.


Try this. :blush:
",")).saveAsTextFile("/user/cloudera/crimeresults.csv", classOf[]);


Hi, is the Spark version the older one or the new one? I ask because the syntax is quite different in the newer versions.


It works in Spark 1.6.x. Not sure about older versions.


You can use the language you prefer

All they evaluate is the result


Please check the Cloudera website for versions, as I do not recollect the exact version on the test.


Hi, I have a question: do you need to use repartition(1) to save the answer as a single file during the exam? Thanks very much!


Hi Venkat,

I have a few queries regarding the certification. If you could please share your email ID, it would be much appreciated.

Thanks in advance


One question for anyone who has attempted the exam:
Is the command below enough to launch multiple spark-shell windows?
spark-shell --master yarn

If not, what command should I use on the CCA 175 exam VM?