Passed CCA 175 on 17/03/2020

Hi Itversity associates,

I am here to thank the people behind Itversity who helped me clear the exam on the first attempt. I would like to say special thanks to Durga sir for his effort in building the right platform for Cloudera certification seekers.


Congrats Sudheer on clearing the exam. I would like to know a few things about the exam, for myself and for other certification aspirants:
1. What is the difficulty level of the exam: simple, advanced or complex?
2. Was there enough time to solve the problems?

Congratulations Sudheer!
Your information on the exam is very useful. Thanks for sharing.
I have one question, in case you used pyspark to solve the problems.
Did you have to start pyspark with the avro packages (--packages com.databricks…etc) to work with avro files (to read or write)?

You are not supposed to share some of the information mentioned in the post, so I am deleting the potentially sensitive information.

Sqoop is out of scope.




  1. Did you use --master yarn or local?
  2. For avro etc., do we need to include any special parameters?
  3. Were there any ranking or Top N questions?
  4. I hope Sqoop, Flume and Impala are out. Were there any Hive questions?


Yes, I used --master yarn.
No special parameters were used for avro, only the package.

Yes, we need to include the avro packages when starting Spark. Always include the avro package at initialization.
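As a rough illustration of the advice above, the launch command with the package flag could be assembled as in the sketch below. `pyspark_launch_command` is a hypothetical helper, and the package coordinate is simply the one quoted later in this thread; whether the exam cluster accepts that exact coordinate is an assumption, not something confirmed here.

```python
import shlex


def pyspark_launch_command(packages, master="yarn", launcher="pyspark2"):
    """Build a shell command that starts pyspark with extra packages.

    Hypothetical helper: it only formats the command line, it does not
    start Spark. `--master` and `--packages` are standard Spark CLI flags.
    """
    args = [launcher, "--master", master]
    if packages:
        # --packages takes a comma-separated list of Maven coordinates.
        args += ["--packages", ",".join(packages)]
    return " ".join(shlex.quote(arg) for arg in args)


# Assumed coordinate, copied from a later post in this thread.
command = pyspark_launch_command(["org.apache.spark:spark-avro_2.11:2.4.0"])
print(command)
```

Printing the command rather than running it makes it easy to paste it again after a dropped session.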

Hi Sharma, I felt the exam was simple and easy. I am sure candidates can pass easily if they have covered the topics in the Udemy sessions, plus any one practice test course that is also available on Udemy.

Thanks Sudheer.

But how were you able to manage performance, I mean the number of executors versus cores and memory? Were there huge data files in the exam?
Also, do we get an editor like Sublime to reuse code?

Thank you Sudheer. What version of the avro packages did you include?

Hi Sudheer,
Could you please tell us about the examination environment: what type of machine you used, what your internet connection was, how you managed to sit for 2 hours continuously, what your backup plan was in case of a power cut, and whether you took the exam at home or in the office?


Hi Bharat, don’t worry about the executors or cores. I don’t know exactly whether there is an option for the Sublime Text editor or not. You will get some idea of the screen if you watch the on-demand video on the official CCA175 website.

Hi Mukherjee, the examination environment for the CCA175 is definitely a unique experience, as we need to sit for the exam in our own house or office. But be cautious about the following issues.

  1. The remote desktop process is a bit confusing and time-consuming, so connect to the proctor 15 minutes before the scheduled time.

  2. You communicate with the proctor through text only, so please read the instructions carefully.

  3. Prepare the exam environment 3 to 4 days before the exam, so that you can sit for the full 2 hours without any discomfort. If possible, attempt the practice exams at exactly the time of day you will be taking the exam (for example, if your exam runs from 10 to 12, attempt the practice sessions in that same time slot).

  4. Always keep a backup for both power and internet at the exam location. My internet was disconnected twice, but I was able to reconnect because I had an internet backup.

  5. If your primary internet source gets disconnected, connect to the secondary source and continue with it until the task is complete.

  6. It is better to have a mobile hotspot (e.g. JioFi 4G Hotspot M2S 150 Mbps Jio 4G Portable Wi-Fi Data Device) as the internet backup. It runs on battery and is therefore not affected by a power cut.

  7. Be aware that an internet disconnection means all running processes will be lost. You need to start pyspark again with the packages and also import all the SQL functions again.

Hi Sudheer / ITversity Team,

I downloaded the Cloudera Quickstart VM last year and practiced on it.
That Quickstart VM has Spark 1.6, but as per the new guidelines, Spark 2.4 is provided in the exam.

Where did you practice Spark 2.4? Did you manage to download a later version of the Cloudera Quickstart VM that has Spark 2.4, or did you practice somewhere else?


First, congratulations.

I have 2 questions:
How many questions did you have during the exam?

Did you have any questions about spark-submit?

Hi Sudheer, congratulations on completing your exam! Could you please clarify the following:

  1. You mentioned that we need to initialize the Spark session with the avro package details. Where can we find the details of the avro package that we need to use when initializing? Will this be provided to us, or do we need to find it somewhere during the exam?
  2. Were there any Hive-related questions? If so, of what kind?

Hi Sudheer,

Happy to hear you cleared the CCA 175 exam. Could you please let me know the name of the package you added to work with avro files in the exam environment?

Hello All,
I appeared for the CCA175 recently and cleared the exam.
Many thanks to Durga and this community for providing great insights and holding up the interest in pursuing this certification.
I have learned a lot from this community and in return I would like to give back my learning notes, problem exercises and solutions in
The exam was quite okay, and I hope everyone clears it with flying colors.


@PrakashP - Congratulations! The materials you provided on GitHub are fantastic. I really appreciate it; this will help a lot. Also, could you please mention where and how you found the avro package to use during your exam (should we memorize it)? I tried the following in the Itversity labs and it is not working:

pyspark2 --master yarn \
  --conf spark.ui.port=0 \
  --packages org.apache.spark:spark-avro_2.11:2.4.0

I found the above avro package details here and used them to start pyspark as shown above. Then I ran:

x = "some data")
x.write.format("com.databricks.spark.avro").save("some path")

Error is:
pyspark.sql.utils.AnalysisException: u'Failed to find data source: com.databricks.spark.avro. Please find an Avro package at;

I am not sure whether this is an issue with the lab setup that I am using, or whether this won’t work in the exam either. I hope you can shed some light on this. Thanks in advance!
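One possible explanation, not verified against the lab or the exam cluster: the package name and the format name have to match the Spark version actually running. The sketch below assumes that Spark 2.4 and later use the Apache-maintained `spark-avro` module with the short format name `avro`, while Spark 2.2 and 2.3 use the older Databricks package with the format name `com.databricks.spark.avro`. `avro_settings` is a hypothetical helper, and the Databricks version `4.0.0` is an assumption that should be checked against the cluster.

```python
def avro_settings(spark_version, scala_version="2.11"):
    """Return (package coordinate, format name) for a Spark version.

    Hypothetical helper covering Spark 2.2 and later only.
    Assumption: Spark 2.4+ ships Avro support as org.apache.spark:spark-avro,
    while Spark 2.2/2.3 need the Databricks package (4.0.0 assumed here).
    """
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) >= (2, 4):
        package = "org.apache.spark:spark-avro_{0}:{1}".format(
            scala_version, spark_version
        )
        return package, "avro"
    package = "com.databricks:spark-avro_{0}:4.0.0".format(scala_version)
    return package, "com.databricks.spark.avro"


# Compare what each version would need.
for version in ("2.3.0", "2.4.0"):
    package, fmt = avro_settings(version)
    print(version, package, fmt)
```

Inside pyspark, `spark.version` shows which Spark release the session is running, which tells you which of the two combinations applies.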

Can you share the links?