Cleared CCA 175 With 100% Score! May 8th 2019


Hi All - I passed CCA 175 today with a 100% score (all 9/9). Thanks to Itversity for the great course on Udemy. It was helpful and so was the lab.

I got 2 Sqoop and 7 Spark questions. I finished in 1.5 hours with 30 mins remaining. The problems were straightforward, with clear instructions. Let me know if anyone has any questions and I’ll be happy to answer.

I know everyone recommends Arun’s blog, and I don’t want to undermine it, but I felt Itversity was much better organized. Arun’s questions were poorly formatted (it was often hard to understand what was being asked) and unnecessarily complex, which wasted a lot of time during preparation. Looking back, the Itversity preparation should suffice.



Can you explain the questions briefly?



Just one simple question.

Can we open the pyspark2 shell simply by typing pyspark2, or do we need to set an environment variable for that? Also, how did you import the Avro package?




@Sanchit_Kumar I used Spark with Scala, so I used spark2-shell, but yes, you can open it simply with pyspark2 too. This is what I used:
spark2-shell --master yarn --packages com.databricks:spark-avro_2.11:4.0.0
(I also gave a few config params for executors, but those are optional. The above should be enough.)

@harsha - The 7 Spark questions were mostly about transformations, join operations, some aggregations, and producing output in a certain format. I don’t think I can write out the questions here, as that is against the exam policy, but I can tell you they were not too hard. I used Spark SQL and DataFrame operations to solve them, but you can use anything you want (RDDs, core APIs, etc.) as long as you produce the right output and store it in HDFS. The Sqoop questions were more about importing in the right format and storing the result.
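For readers wondering what "importing in the right format" can look like with Sqoop, here is a minimal sketch. The host, database, user, table and HDFS path are all hypothetical placeholders, not values from the exam. The command is only assembled and printed so it can be reviewed first.

```shell
# All values below are hypothetical placeholders; use the ones given in the question.
DB_HOST="gateway.example.com"
DB_NAME="sample_db"
DB_USER="sample_user"
TABLE="employees"
TARGET_DIR="/user/sample_user/employees_avro"

# Assemble a Sqoop import that writes Avro data files into a chosen HDFS directory.
# -P makes Sqoop prompt for the password instead of taking it on the command line.
SQOOP_CMD="sqoop import --connect jdbc:mysql://${DB_HOST}/${DB_NAME} --username ${DB_USER} -P --table ${TABLE} --target-dir ${TARGET_DIR} --as-avrodatafile"

# Print the command for review; paste it into a terminal on a host that has Sqoop to run it.
echo "$SQOOP_CMD"
```

Swapping --as-avrodatafile for --as-parquetfile or --as-textfile changes the output format; the rest of the command stays the same.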



@Sofia Thanks for the info. :+1:



Congratulations Sofia!!!

  1. Did you connect to the MySQL database in the exam? What command did you use?
    Is it the same as how we connect to MySQL in the Cloudera VM, like below?

mysql -u "username" -p
(press Enter, then type the password)

  2. Did the questions use the same retail_db and tables, or a different list of tables?




@Prasant_Senapati Thank you!
1. Yes, I connected to MySQL to cross-check and validate my solution. It is a similar command; here is what I used. The gateway IP, username and password were given in the question.
mysql -h "gateway ip" -u "username" -p
Enter password:
2. Not retail_db, but different databases and a different list of tables in each question.



Thank You @Sofia !!

I have a few questions to clarify but, due to compliance issues, I will not be able to post them here. Please drop me an email.

Prasant Senapati



Thanks for the response @Sofia



Sofia,

Did you get any questions that used tables other than the retail_db ones (orders, products, order_items, etc.)?

Please mention them, if any.



Yes, there were different tables such as employees, students, devices, etc. I think the data changes all the time.



Hi Sofia,
Many Congratulations!!
-> I am asking the same question you asked others in previous posts before your exam. Did you use coalesce to restrict the number of output files, or did you leave that as default? Please shed some light on this.
-> Also, can we open more than one terminal, for example one connected to pyspark and a second to check files in HDFS or Hive metastore locations?
-> What configuration parameters did you use when launching Spark (like num-executors, etc.)? Sorry, this is a repeated question, but I want one more opinion.



Congratulations Sofia.
mysql -h "gateway ip" -u "username" -p
Do they give the gateway IP?



spark2-shell --master yarn --packages com.databricks:spark-avro_2.11:4.0.0

I didn’t use --master yarn in my practice. I am using the line below.
spark-shell --packages com.databricks:spark-avro_2.10:2.0.1

Which one should we use? Could you please let me know?



@girish381 - Thanks!

  1. Great question. I never got an answer to that before my exam, so I coalesced a few of my solutions and left the others as default part-* files. I was trying to hedge my bets! Since all my solutions came out as correct, my conclusion is that they really don’t care whether you produce 1 big file after coalesce or many small part files (unless the question specifically asks you to produce X number of files). I think their grading software reads the whole output directory to check and verify, so it doesn’t matter how many files are in there.

  2. You can open as many terminals as you want. I had about 5 tabs open within one terminal, connected to different things.

  3. I used these configs after looking at the cluster setup on the RM page of the VM that was provided to me:
    --num-executors 4 --executor-memory 2g --executor-cores 2
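Putting the launch command from earlier in the thread together with these executor settings gives something like the sketch below. The values are the ones quoted in this thread and were chosen for that particular exam cluster, so treat them as a starting point rather than a recommendation. The command is only assembled and printed here, not run.

```shell
# Values quoted in this thread; check the ResourceManager page and tune them to your cluster.
MASTER="yarn"
AVRO_PACKAGE="com.databricks:spark-avro_2.11:4.0.0"
NUM_EXECUTORS=4
EXECUTOR_MEMORY="2g"
EXECUTOR_CORES=2

# Assemble the full spark2-shell invocation, including the optional executor settings.
SPARK_CMD="spark2-shell --master ${MASTER} --packages ${AVRO_PACKAGE} --num-executors ${NUM_EXECUTORS} --executor-memory ${EXECUTOR_MEMORY} --executor-cores ${EXECUTOR_CORES}"

# Print the command for review; paste it into a terminal on the gateway node to run it.
echo "$SPARK_CMD"
```

The same options work with pyspark2 in place of spark2-shell.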



@VeenaReddy - Thanks Veena. Yes, they will give the gateway address.

You are using Spark 1.6 without specifying a cluster manager, so the shell falls back to its default master (local mode, unless the cluster’s configuration sets one). In local mode it won’t use all the nodes for task execution. For exam purposes this doesn’t change anything you would typically do in the shell, so you can continue that way if you want. However, considering how Spark works internally, my recommendation would be to use YARN wherever possible in real applications, for performance.



Great!! Thanks for your time and inputs Sofia.



Hi Sofia,

Congratulations on passing the exam.
I have a few questions, mostly about the exam environment. Hoping to hear from you soon. Thanks in advance!

  1. Do we have to attempt all the questions in the terminal itself, or can we invoke other applications like Jupyter Notebook? What about Hive and MySQL questions? Can we access them as usual with ‘hive’ and ‘mysql’ at the prompt?
  2. How is the help documentation made available? Via a link or the browser?
  3. Do we need to use any application other than the terminal for any reason?




Hi Sri - Thank you! Regarding your questions:

  1. You can open other applications to write your code. I wrote my solution code snippets first in Sublime Text, which was provided in the VM (I think I also saw gedit and Eclipse in the VM, but I used Sublime Text). Yes, you can go to the terminal prompt and type ‘hive’ to access Hive, and the same goes for MySQL with the mysql command I mentioned in my posts above.

  2. Yes, as far as I remember the browser in the VM had a Favorites/Bookmarks bar with the help documentation. I didn’t access the documentation during the exam, though.

  3. Sublime Text and the terminal were all I needed. Nothing more is required.



Cool! Thank you for everything !