Cleared CCA 175 With 100% Score! May 8th 2019


Hello Sofia,

Congratulations!!!

About the Spark questions: were all of them related to file formats, or were some of them related to data analysis, like top 10, rank, or something similar?

Please suggest.




Hi Ashutosh -
Thanks! Most of the Spark questions were a combination of joining input tables, transforming column data (like changing string values), and storing the joined data in a certain format. There were some aggregations, but they didn’t ask for top N or rank specifically. There were some max/min type questions, though.
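
To make the join / transform / aggregate pattern concrete, here is a minimal sketch. It is not an exam question: the tables, columns and values are invented for illustration, and Python's built-in `sqlite3` stands in for Spark so that the snippet runs without a cluster. The SQL is the part that carries over.

```python
import sqlite3

# Hypothetical tables and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_total REAL);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 75.5), (12, 2, 40.0);
""")

# Join the two inputs, transform a string column (UPPER),
# and compute max/min per customer.
query = """
    SELECT UPPER(c.customer_name) AS customer_name,
           MAX(o.order_total)     AS max_total,
           MIN(o.order_total)     AS min_total
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.customer_name
    ORDER BY c.customer_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In Spark, a query like this would typically be run through `spark.sql(...)` (or `sqlContext.sql(...)` on 1.6) after registering the inputs as temporary tables, with the result then written out in whatever format the question asks for.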

Hope that helps.



Hello Sofia!!
I have a few doubts:

  1. I have practiced in the ITversity lab, which is from Hortonworks. Can I go for CCA175? Will there be any difference?
  2. Did you get any Spark Streaming questions? Or do you have any idea about previous batches facing such questions?
  3. Does the question specifically ask you to code in Spark SQL/RDD? And if the question is in PySpark, can we answer it in Spark Scala to produce the desired output?


Hello @Sofia

Firstly Congrats on your achievement!
Can you please let me know how much time it took before you felt confident taking the exam?
I have no prior experience with Spark. Approximately how many hours per week would it take, and how many months would it take to clear CCA 175?

Lastly, thanks for answering our questions.



Hi @priyam_mohapatra - Thank you! Regarding your questions, here are my inputs:

  1. There shouldn’t be a lot of difference if you use the same version of Spark that you practiced with. I think you should be fine. At the end of the day, the Hive, MySQL, Sqoop and Spark commands are all that matter, irrespective of the platform (CDH or HDP), as both support the standard versions of these technologies. For example, Spark 1.6 differs in many commands (and built-in functions) from Spark 2.0+, but Spark 1.6 will behave the same whether it’s installed on CDH or HDP. The differences will be in where you find conf files such as core-site.xml or hive-site.xml, in case you need them during the exam. However, don’t panic: look in the usual places or grep for them if you need them. I didn’t need to look at the conf files during the exam.

  2. No Spark Streaming questions. I haven’t heard of anyone getting Spark Streaming questions - I guess it’s hard to validate or test for them.

  3. Questions don’t ask you to code in Spark SQL or RDD - it’s totally up to you what you want to use. Questions are not given in PySpark or any specific language, and there are no code snippets. I haven’t heard of anyone ever getting language-specific code snippets. You can use whatever you want to arrive at the result. They only care about the output files, not the code. You can use Spark Scala - I used only that.



@Sandeep_Shenoy1 Hi Sandeep - Thanks a lot!

Good question! About 5-6 weeks should be enough with 2-3 hrs per day, and more on the weekends (I was only studying on the weekends). Time management is actually the bigger issue, so I practiced quite a lot on the weekends to get to the speed of finishing in less than 2 hours.

But looking back, I would say here is my recipe to ace the exam on the first attempt (at some point I will write a detailed one - I have been planning to. This is just the overview. Again, you are the best judge of your study pattern, so feel free to decide what works best for you!)

  1. I wasted a lot of time during preparation on RDDs, which are complex, when really the same thing can be achieved quite simply via Spark SQL (at least 2 full weekends went into this, and I never used it in the exam). Moral: Stick to Spark SQL. The exam doesn’t care what you use. Just know some RDD/DF operations as a backup solution.

  2. It would take 6 full weekends to be 100% confident with time management (plus some buffer, maybe a few evenings). Moral: Practice is the key. I had practiced so much that in the exam I could have written the code blind :D. I had a day job, so I spent about 8-9 hours per day on weekends.

  3. Know the compression classes by heart (for both Sqoop import compressions and Spark saving of dataframes/RDDs). There is no time to go search for documentation.

  4. Read the questions carefully, because it may not be clear at first what they are asking for.

  5. Remain calm - they are not looking for Spark geniuses. Questions are not difficult.
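
As a memory aid for the compression point in the list above, here is a small sketch that keeps the standard Hadoop codec class names in one place and assembles a Sqoop import command around them. The JDBC URL, table name and target directory are hypothetical placeholders, and credentials are left out; only the codec class names and the `--compress` / `--compression-codec` flags are standard.

```python
import shlex

# Standard codec classes that ship with Hadoop.
HADOOP_CODECS = {
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "deflate": "org.apache.hadoop.io.compress.DefaultCodec",
}


def sqoop_import_command(connect_url, table, target_dir, codec_name):
    """Build a `sqoop import` command line that compresses output with the named codec."""
    args = [
        "sqoop", "import",
        "--connect", connect_url,      # username/password flags omitted in this sketch
        "--table", table,
        "--target-dir", target_dir,
        "--compress",
        "--compression-codec", HADOOP_CODECS[codec_name],
    ]
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical connection details, for illustration only.
command = sqoop_import_command(
    "jdbc:mysql://dbhost/retail_db", "orders", "/user/me/orders_snappy", "snappy"
)
print(command)
```

On the Spark side, the same class names are what an RDD save expects (for example `saveAsTextFile(path, compressionCodecClass=...)` in PySpark), while the Spark 2.x DataFrame writer usually takes a short name such as `.option("compression", "snappy")`.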



Hi Sofia,

Thanks for your time. I really appreciate all your input here; it is valuable. A quick question on practicals - I am looking to do more practice, but I am wondering where I can get a good handful of questions to solve. Could you shed some light on practice questions? Thanks again!



Hi Swetha - Thank you! Happy to help!

For practicals, you can do the Practice problems/solutions module (5 or 6 questions, I think) in the ITversity course and also the problems in the videos. In addition, there are more practice problems on Udemy that ITversity has launched - I think there are 3 or 4 sets (called CCA 175 Practice Tests) - I did those multiple times. You can get the coupons for those from this site too.



@Sofia Thank you so much!



@Sofia Would you be able to share links? I tried searching but couldn’t find them. Thanks!



@Sofia Thanks much Sofia



Hi Veena,
Here is the link for the ITversity Practice Tests on Udemy - link



Thanks for a great answer. Sorry it took me a while to acknowledge.



Thanks Sofia… Sorry for the late response.



Hi Sofia !

If you remember, we communicated earlier when my exam got cancelled on 2nd May. I was supposed to take the exam again today (4th June), but again the exam did not launch, saying the exam is not ready.
Right now I am feeling very frustrated and disappointed.
Not sure why Cloudera is not taking these issues seriously.



Hi Sofia,

One more question from me.
Let’s say we import data from a table in Avro format using any compression, and there is a date field in the table; in the imported Avro file it comes through as BIGINT. We have seen this in Arun’s blog. When we save this Avro file in another format, for example text or Parquet, do we need to convert this BIGINT back to a date (yyyy-mm-dd) in the certification exam, or can we keep the field as it is when saving in other file formats?
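
Whether the exam expects the conversion depends on what the question asks for, but if a yyyy-mm-dd value is needed, the conversion itself is small. A minimal sketch, assuming the BIGINT holds epoch milliseconds (which is how Sqoop's Avro import normally stores dates) and using plain Python rather than Spark; the sample value is hypothetical:

```python
from datetime import datetime, timezone


def epoch_millis_to_date(millis):
    """Convert epoch milliseconds to a yyyy-mm-dd string, interpreted in UTC."""
    seconds = millis // 1000  # drop the millisecond part
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")


# Hypothetical sample value, for illustration only.
sample = 1374710400000
print(epoch_millis_to_date(sample))
```

In Spark SQL the equivalent would usually be something like `from_unixtime(cast(order_date / 1000 as bigint), 'yyyy-MM-dd')`, where `order_date` is a placeholder column name. Note that `from_unixtime` formats in the session's time zone, so values written in a non-UTC zone may need checking against the source table.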

Please suggest.



  1. How do you log into spark2-shell in the Cloudera environment in the CCA175 exam?
    Is it the same way as in the labs, or do we need to mention the full path of spark2/bin/spark-shell?

Please provide me any sample command to log into the spark2 shell in Cloudera.

  2. How do I work with Avro files in the spark2 environment?

Which packages should be imported during spark2 initialization in the CCA 175 exam?

  3. Do we need to set some configuration before starting the exam in the spark2 shell?

I have practiced in spark2, not in Spark (1.6)… please help.



Do we need to remove duplicates if they don’t ask? In one of the dgadiraju videos… they are removing the duplicates. Is that mandatory?

nyse_data - find the stock tickers which do not exist in the NYSE metadata…
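
For reference, the "tickers not in the metadata" problem is an anti-join, and the duplicates question shows up directly in its output. A minimal sketch with invented rows and hypothetical table and column names, using Python's built-in `sqlite3` in place of Spark so it runs anywhere:

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nyse_data (stockticker TEXT, tradedate TEXT);
    CREATE TABLE nyse_meta (stockticker TEXT, company TEXT);
    INSERT INTO nyse_data VALUES ('AAA', '2016-01-04'), ('AAA', '2016-01-05'),
                                 ('BBB', '2016-01-04'), ('ZZZ', '2016-01-04'),
                                 ('ZZZ', '2016-01-05');
    INSERT INTO nyse_meta VALUES ('AAA', 'Alpha Corp'), ('BBB', 'Beta Inc');
""")

# Left outer join, then keep only rows with no match in the metadata.
anti_join = """
    SELECT {distinct} d.stockticker
    FROM nyse_data d
    LEFT OUTER JOIN nyse_meta m ON d.stockticker = m.stockticker
    WHERE m.stockticker IS NULL
"""
with_duplicates = [r[0] for r in conn.execute(anti_join.format(distinct=""))]
missing = [r[0] for r in conn.execute(anti_join.format(distinct="DISTINCT"))]

print(with_duplicates)  # one row per unmatched trade record
print(missing)          # one row per unmatched ticker
```

Without `DISTINCT` the unmatched ticker appears once per trade row, which is why a solution that asks for "the tickers" tends to deduplicate. Whether a given exam question needs that depends on its wording.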



@Sanchit_Kumar Did you take the exam yet? Did you try launching pyspark2?



@akalita Nope, I haven’t taken the test yet, but you can open it by simply using the command: pyspark2 --master <master_url> --packages <package_name>