Cleared the CCA 175 certification on June 17th

Durga Sir, thanks a ton for the awesome YouTube videos. Not only did your content inspire me, but it helped me clear the certification as well. Thanks!!!

For folks aspiring to take the certification, my 2 cents:
Time management is key.
Please follow Durga Sir’s content (the 25 videos) and practice them.
Try solving problems with the different file formats and compression codecs. Pay extra attention to the requirements.
Solve the problem scenarios in the workshop.
Review Arun Sir’s content.
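
For the file format and compression practice mentioned above, here is a minimal sketch of the kind of thing worth rehearsing. It assumes spark-shell on Spark 1.6, where `sc` and `sqlContext` already exist. The sample data, column names and output paths are made up for illustration and are not from the exam.

```scala
// Assumes spark-shell (Spark 1.6): `sc` and `sqlContext` are already defined.
import org.apache.hadoop.io.compress.GzipCodec
import sqlContext.implicits._

// Hypothetical sample data standing in for a practice data set.
val sampleOrders = sc.parallelize(Seq((1, "CLOSED"), (2, "PENDING"), (3, "CLOSED")))

// Hypothetical output location; the timestamp keeps reruns from colliding,
// because saveAsTextFile fails if the target directory already exists.
val base = s"/tmp/cca175_demo_${System.currentTimeMillis}"

// Tab-delimited text file compressed with gzip.
val lines = sampleOrders.map { case (id, status) => s"$id\t$status" }
lines.saveAsTextFile(s"$base/orders_text_gzip", classOf[GzipCodec])

val ordersDF = sampleOrders.toDF("order_id", "order_status")

// Parquet with snappy compression (in Spark 1.x the codec is set through a SQL conf).
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
ordersDF.write.mode("overwrite").parquet(s"$base/orders_parquet_snappy")

// Plain JSON, one record per line.
ordersDF.write.mode("overwrite").json(s"$base/orders_json")
```

Reading each output back (for example with `sqlContext.read.parquet(...)`) is a quick way to check that the delimiter, format and codec match what the question asked for.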


Any insight into the question split will be helpful for the aspirants.

That's awesome… I have covered almost all the videos and have been practicing for 3-4 months, but I am still lacking confidence somewhere. I work mostly with SQL in my regular work, for analysis purposes. Could you help me with your suggestions on the questions below?

1. Is there a need to practice more SQL using online tests?
2. Are the questions tough enough that you have to bang your head to find the solution?
3. Or are they just basic, like load into an RDD, filter, and store back?
4. Were there any questions on handling date formats in the data?
5. Anything related to configurations?

Your reply will be appreciated :)

Thanks in advance.

@dev2003 Congratulations!! Were there any questions on Flume?

As you are aware, you can implement the solutions using core RDDs, DataFrames, or DataFrames with SQL. I have been working with SQL for some time, and considering every second is precious, it's my personal view that implementing the solution using SQL is the fastest.
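
To make the three options concrete, here is a small sketch that answers the same toy question (how many orders are CLOSED) each way. It assumes spark-shell on Spark 1.6 with `sc` and `sqlContext` available. The data and names are hypothetical.

```scala
// Assumes spark-shell (Spark 1.6): `sc` and `sqlContext` are already defined.
import sqlContext.implicits._

// Hypothetical sample data: (order_id, order_status).
val demoOrders = sc.parallelize(Seq((1, "CLOSED"), (2, "PENDING"), (3, "CLOSED")))

// 1) Core RDD API
val closedByRdd = demoOrders.filter { case (_, status) => status == "CLOSED" }.count()

// 2) DataFrame API
val demoOrdersDF = demoOrders.toDF("order_id", "order_status")
val closedByDf = demoOrdersDF.filter($"order_status" === "CLOSED").count()

// 3) SQL on a temp table registered from the DataFrame
demoOrdersDF.registerTempTable("demo_orders")
val closedBySql = sqlContext.sql(
  "SELECT count(1) AS cnt FROM demo_orders WHERE order_status = 'CLOSED'").first().getLong(0)
```

All three give the same count; which one is quickest to type under exam pressure depends on what you are most fluent in.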

If you go through Durga Sir’s content, you will be able to solve the problems.
In spite of having good internet speed, I found the cluster very sluggish. Scrolling in the editor took considerable time, and I lost precious time doing the edits.

I didn’t get any Flume questions. There were 2 on Sqoop and 7 on Spark. You are free to use the language of your choice, spark-shell or pyspark. One advantage of spark-shell is that tab completion shows you the functions that are available, which can be helpful at times.

@dev2003 One final question: from what I have noticed in conversations with previous exam takers, it is enough to know either Scala or Python, not both. Please confirm, because I am preparing only with Scala…

Congrats dev… any questions on Impala and Hive?

Scala alone is enough. You use an instance of HiveContext for executing SQL. No questions on Impala.

Hi Dev, congrats! Did you get any Hive-related questions?

I am facing this issue when accessing Hive metadata from the Cloudera QuickStart VM: metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException. If you have faced a similar issue, please help.

import org.apache.spark.sql.hive.HiveContext

val sqx = new HiveContext(sc)

sqx.sql("show tables").show()

Once you have a HiveContext object, you should be able to execute your Hive queries.


Even this would work, right? Using the default sqlContext that’s available in spark-shell:

sqlContext.sql("show tables").show()

Only when we need window functions or Hive UDFs do we need to create a hiveContext with new HiveContext(sc). Otherwise, for most general SQL operations, the default sqlContext will be sufficient.

Correct me if I am wrong.


@naveeneu If you do not specify the database name, then Spark SQL by default uses the database named “default”. Here the default database is not available in your Hive, which is why it is giving the NoSuchObjectException. So either execute “use db_name” before firing the actual query, or specify database_name.table_name in the Spark SQL query.
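
The two options above could look roughly like this in spark-shell. This is only a sketch: `retail_db` and `orders` are hypothetical names, so substitute a database and table that actually exist in your Hive metastore.

```scala
// Assumes spark-shell (Spark 1.x) with Hive support, where `sc` is already defined.
import org.apache.spark.sql.hive.HiveContext

val hiveCtx = new HiveContext(sc)

// Option 1: switch the current database first, then query unqualified table names.
// `retail_db` is a hypothetical database name.
hiveCtx.sql("use retail_db")
hiveCtx.sql("show tables").show()

// Option 2: qualify the table with its database in the query itself.
// `retail_db.orders` is a hypothetical table.
hiveCtx.sql("select * from retail_db.orders limit 10").show()
```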

Is it possible to answer all the Spark questions using Spark SQL? I mean, if we learn Spark SQL, will we be able to answer all the Spark questions?