Cleared CCA 175 - 11/12/2017



I cleared the CCA 175 exam today. I want to express my sincere gratitude to ITVersity. Thank you Durga and Arun for creating such wonderful content. Thank you ITVersity for giving me a platform to learn big data technologies for free!
My suggestions for test takers:

  1. Watch the sample exam video provided by Cloudera before the exam, and make sure you know how to zoom in.
  2. Memorize the file formats in Spark (
  3. Write your code in a text editor (I accidentally closed the spark-shell window a couple of times). Re-running the code from the editor was easier for me.
  4. Do not take the exam from your office (I tried). The office firewall blocked the remote URL and the examiner asked me to go home.
  5. Use a laptop without anti-virus software. I was getting disconnected every minute until my examiner told me to disable the anti-virus.
  6. It is very difficult to refer to the documentation during the exam. It is better to practice more and memorize most of the commands.
  7. Do not worry about cluster size or shuffle partition settings. I observed that the cluster was fast and my jobs ran without much delay.

Good luck.

#2 does not always involve shuffling, so it is not a parameter for judging the performance of the cluster.

Congratulations @dippradhan and thank you for providing some valuable inputs.


@itversity thanks, I learn something new from you each time :-)


Congratulations @dippradhan!

I am planning to take the certification test soon and I have a question… Do we need to know both Python and Scala to solve the Spark scenarios? I have read that we need to fill in templates in both Python and Scala. Is it true that we need to know both languages?


Congratulations @dippradhan,

Can you please tell us how we have to write the code and submit it?
Which of these is it:

  1. We have to create a .scala file for our code, build a jar from it, run that with spark-submit, and save the output in HDFS/the file system.
  2. We run our code directly in spark-shell and save the output in HDFS/the file system. That's it.
  3. We run our code directly in spark-shell, save the output in HDFS/the file system, and also save the code we wrote in a file for submission.

What do they check? Do they need both the correct output data in the file system and the code, or just the correct output data in the file system?

Mohit Jain


The answer is #2. Let me give an example of a sample problem (this is not an exact problem from the exam).
You have 20 million transactions at HDFS location /user/cloudera/problem1 in text format. You have to create a result set where product price > 100 and save it as JSON at location /user/cloudera/resut1.

They just check the output file. It does not matter whether you use Scala, Python, data frames, RDDs, or SQL. As you can see, you do not need to develop a full-scale program to achieve this.
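For anyone who wants to see what such a solution might look like, here is a minimal PySpark sketch of the sample problem above. It rests on several assumptions that the problem statement does not give: that the records are comma-delimited, that the price sits in the third field (`PRICE_INDEX` is a hypothetical name and position), that the output columns can be called `line` and `price`, and that you are in a Spark 2.x `pyspark` shell where a `spark` session already exists. Adjust all of these to the actual data.

```python
# Assumption: comma-delimited text records with the price in the 3rd field.
# The real layout is not given in the thread; change PRICE_INDEX to match.
PRICE_INDEX = 2  # hypothetical column position


def parse_line(line, delimiter=","):
    """Split one text record; return a dict with a float price, or None if malformed."""
    fields = line.split(delimiter)
    try:
        return {"line": line, "price": float(fields[PRICE_INDEX])}
    except (IndexError, ValueError):
        # Too few fields, or a price that is not a number
        return None


def is_expensive(record, threshold=100.0):
    """Keep only well-formed records whose price is strictly above the threshold."""
    return record is not None and record["price"] > threshold


def run(spark, src="/user/cloudera/problem1", dst="/user/cloudera/resut1"):
    """Read the text input, filter on price, and write the result as JSON."""
    from pyspark.sql import Row  # imported here so the helpers above work without Spark

    rows = (
        spark.sparkContext.textFile(src)
        .map(parse_line)
        .filter(is_expensive)
        .map(lambda r: Row(line=r["line"], price=r["price"]))
    )
    # The output column names (line, price) are illustrative, not from the exam
    spark.createDataFrame(rows).write.json(dst)
```

In the `pyspark` shell you would paste the definitions and then call `run(spark)`. Since only the output directory is checked, the same result could just as well be produced with the DataFrame reader or Spark SQL.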


You do not need to have knowledge of both Python and Scala. I have not seen any template in a specific language.


Thanks @dippradhan, it answered many questions.
One more question: sometimes after running a job we get files such as .part-00000.crc, ._SUCCESS.crc, or ._metadata.crc in the output directory, containing data that is not part of our output. Do we need to delete these files as well?

Sometimes empty _SUCCESS files are created too. Do we need to delete those as well?

Mohit Jain


@mohitjain012 you do not need to delete these files.
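One reason these files are harmless: as far as I know, Hadoop's default input handling skips any file whose name starts with `_` or `.`, so marker and checksum files are not read back as data. The small Python sketch below only mimics that naming rule on a made-up directory listing; `is_data_file` is a hypothetical helper, not a Hadoop or Spark API.

```python
def is_data_file(name):
    """Mimic the common Hadoop convention: names starting with '_' or '.' are not data."""
    return not name.startswith(("_", "."))


# Illustrative listing of an output directory (file names taken from the question above)
listing = [
    "part-00000",
    ".part-00000.crc",
    "_SUCCESS",
    "._SUCCESS.crc",
    "_metadata",
    "._metadata.crc",
]

data_files = [name for name in listing if is_data_file(name)]
print(data_files)
```

Only `part-00000` survives the filter, which matches the idea that the part files are the actual output.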


Congratulations!!! Is there a single compiled book / PDF / blog that covers all the topics for the CCA 175 exam in one place?



Please go through this blog.


Congratulations @dippradhan :+1:

Any questions on Kafka / Flume / Streaming?


There were no questions on Kafka/Flume for me.