Cleared CCA 175 on 14th Jan 2018 - Passed all the questions



I would like to sincerely thank Durga sir and the Itversity team. I finally cleared CCA 175, passing all the questions. I would never have had the chance to attempt this certification without the step-by-step learning provided by Durga sir and the team.
The Big Data Labs proved to be a boon, as they allowed me to practice in a real cluster environment and ultimately clear the certification.

Here are the Udemy coupons for our certification courses

  • Click here for $35 coupon for CCA 175 Spark and Hadoop Developer using Python.
  • Click here for $35 coupon for CCA 175 Spark and Hadoop Developer using Scala.
  • Click here for $25 coupon for HDPCD:Spark using Python.
  • Click here for $25 coupon for HDPCD:Spark using Scala.
  • Click here for access to a state-of-the-art 13-node Hadoop and Spark cluster.


Congrats @harshilchhajed12… Any tips for newbies like me??


Congratulations @harshilchhajed12!

May I know how you dealt with Avro files in your test? Did you use pyspark or spark-shell?

In either case, how did you manage to get the Databricks package in order to deal with Avro files?
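For reference, the usual way to pull in the Databricks Avro reader on those clusters was to launch the shell with the `--packages` flag. A sketch of the launch commands; the exact artifact version is an assumption here and must match the cluster's Spark/Scala build:

```shell
# Launch pyspark or spark-shell with the Databricks spark-avro package.
# The version (spark-avro_2.10:2.0.1) is an assumption for a Spark 1.6 /
# Scala 2.10 cluster; adjust it to your environment.
pyspark --packages com.databricks:spark-avro_2.10:2.0.1
spark-shell --packages com.databricks:spark-avro_2.10:2.0.1

# Inside the shell, Avro data is then read/written via the
# "com.databricks.spark.avro" data source format.
```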


Congratulations !!

Could someone please answer the question below:

I took a text file, performed a few transformations on the RDD, converted it to a DF, and saved it as JSON. I got the output below:

{"_1":7,"_2":"Fan Shop"}

Then I ran the same scenario using DataFrames (by creating a case class etc.), and the output is:

{"department_id":7,"department_name":"Fan Shop"}

So whenever we convert an RDD to a DF, we lose the column names. In the first case, would the result still be considered correct?
If not, what is the solution?
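The difference comes from how the tuples are turned into rows: without explicit names, Spark falls back to the positional labels _1, _2. The same effect can be sketched in plain Python (no Spark needed, just to illustrate why the JSON keys differ):

```python
import json

record = (7, "Fan Shop")

# Without column names: fields get positional labels, like Spark's _1, _2
unnamed = {"_%d" % (i + 1): v for i, v in enumerate(record)}

# With explicit column names, the JSON keys are meaningful
named = dict(zip(["department_id", "department_name"], record))

print(json.dumps(unnamed))  # {"_1": 7, "_2": "Fan Shop"}
print(json.dumps(named))    # {"department_id": 7, "department_name": "Fan Shop"}
```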


@varsha, both look good to me, but the second one is more readable. It looks like you didn't specify the column names when you converted the RDD to a DF; that's why you got _1 and _2 instead of column names.


Hi @varsha

In the first case, while converting the RDD to a DF, you need to name the columns, e.g. in Python:

from pyspark.sql import Row
RddToDF = rdd.map(lambda a: Row(department_id=a[0], department_name=a[1])).toDF()

and then you can save this DF in JSON format: RddToDF.write.json("FILE_PATH")


Hi @harshilchhajed12
congrats !!!

How many different types of file formats did you face in the exam?



If you are asking from an exam point of view, only your second case is correct, not the first one, because your solution is missing the schema.


A few questions -
#How long did you prepare for this exam?
#Was the test env/shell different from what you are accustomed to in the labs?
#Did you have enough time to finish the exam, or did you have to rush?

Any additional tips for the certification exam would help greatly. Thank you.


Yes, from an exam point of view. Thank you.


Yes, I was not aware that we could specify column names. Thank you.


Yeah, just got the answer. Thank you.


Hi @harshilchhajed12

Were there any questions based on SequenceFile?


There were questions based on all the file formats, including Sequence files.


I used spark-shell. The choice of language to solve the questions in is yours. Yes, there were questions on the Avro file format too.


Follow Durga sir's Udemy course, CCA 175 using Scala. That covers most of what you need to understand the concepts. Try to practise as much as possible.


It depends on how quickly you grasp the concepts. Yes, the shells are very similar to what we have in the labs. Some questions are very time-consuming, and I found some to be ambiguous or missing information that you have to work out yourself while approaching the problem; those took more time to solve.


Congratulations. Can you provide a site with certification dumps to practice on before taking the CCA exam?


Congrats !!!
Does Durga sir's Udemy course, CCA 175 using Scala, cover all the topics for the certification exam?
Also, are there any practice exams available?




I need to ask one thing. Did you get any questions from Kafka or Flume?