Cleared CCA175 on 9th December 2018

cca-175
apache-spark
scala

#1

Hi everyone,

I am happy to share that I cleared the Cloudera certification exam (CCA175) on 9th December 2018.
I scored 7 out of 9.
I got 2 questions from Sqoop and 7 questions from Spark.
I practiced on the ITVersity Labs and followed the ITVersity tutorials on YouTube.

Because I had practiced on ITVersity, I had some difficulty with the MySQL JDBC connection details.
On ITVersity, the MySQL JDBC connection string looks like jdbc:mysql://abc.xyz.com, but in the exam the hostname is just “gateway”. I lost 10 minutes trying to find the full JDBC URL in the exam environment.

I attempted 8 questions out of 9 in the allotted 2 hours.
There was a typo in one output path I created, so that answer did not score.

Apart from this, I did not face any problems.

Some tips:

  1. Get a bigger screen. A 15-inch screen is not comfortable for coding. You can connect your laptop to an LED monitor or use a desktop during the exam.
  2. If you are taking the exam from India, prefer a LAN cable over WiFi for your internet connection.
  3. No breaks, food or drinks are allowed during the exam.
  4. Install the Cloudera QuickStart VM and get hands-on with it, so that you are comfortable with exam environment tools like the Terminal and Text Editor.
  5. Use zooming and copy-paste commands for better productivity.
  6. Cross-check the output directory path you create for each problem. Even a small typo means you won’t get a score for it.
  7. You can take the exam in either Spark 1.6 (spark-shell) or Spark 2.3 (spark2-shell).
  8. If the input is a text file, Cloudera won’t specify the delimiter. You have to open the file and check it before you start coding.
  9. If you set Parquet / Avro compression in more than one problem, use a new terminal for each problem so that you won’t face issues if you forget to change it.
  10. Use ~12 mappers in Sqoop export to increase parallelism.
  11. Any Spark API can be used to solve the problem.
  12. There is no need to save the code anywhere, as Cloudera doesn’t evaluate code. It only checks output requirements such as data, folder path, file format and compression technique.
  13. Have a backup option in case of a power outage or internet problems, as Cloudera doesn’t offer a re-attempt or pause the timer.
  14. A proctor will be online to help with any environment issues. The proctor won’t have any knowledge of the question content or data.
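
For tips 6 and 8, a quick check from the terminal can save time. The sketch below is only an illustration, not something taken from the exam: guess_delimiter is a hypothetical helper name, the sample line is made up, and the HDFS paths in the comments are placeholders. It counts a few common delimiter characters on the first line it reads and prints the most frequent one, so treat the result as a hint and still look at the data yourself.

```shell
# Hypothetical helper (not part of any exam tooling): read lines on stdin and
# print the name of the most frequent candidate delimiter on the first line.
guess_delimiter() {
  head -n 1 | awk '
    BEGIN {
      names[1] = "comma";     chars[1] = ","
      names[2] = "tab";       chars[2] = "\t"
      names[3] = "pipe";      chars[3] = "|"
      names[4] = "semicolon"; chars[4] = ";"
      names[5] = "colon";     chars[5] = ":"
    }
    {
      best = "unknown"; max = 0
      for (i = 1; i <= 5; i++) {
        count = 0
        # Count literal occurrences of the candidate character.
        for (j = 1; j <= length($0); j++)
          if (substr($0, j, 1) == chars[i]) count++
        if (count > max) { max = count; best = names[i] }
      }
      print best
    }'
}

# Local demo with a made-up sample line.
printf '1,2013-07-25 00:00:00.0,11599,CLOSED\n' | guess_delimiter

# On a cluster the same helper could be fed from HDFS (placeholder paths):
#   hdfs dfs -cat /path/to/input/part-00000 | guess_delimiter
#   hdfs dfs -ls /path/to/expected/output   # confirm the output path before moving on
```

If the first line is a header, or fields contain the delimiter inside quotes, the count can mislead, so it is worth looking at a few lines with head as well.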

I would like to thank the ITVersity team, moderators and members of this forum for sharing their knowledge and helping others grow in their careers.

I am happy to answer any questions you have.

Regards,
Surendra


#2

Hi Surendra,

Congratulations!!!

I am planning to take this exam next week. Some people have complained about fonts and slow servers. Did you face any such issues?
I am also following the ITVersity tutorials on YouTube. Was there any topic in the exam that they do not cover?
Any other advice you would like to give?

Thanks,
Vaishali


#3

Thanks @vaishali

The font size in the Terminal is very small.
I could not type code comfortably in the Terminal.
I did all my coding in the Sublime Text editor that was available, after increasing its font size.
I just copy-pasted commands from the text editor and executed them in the Terminal.

If you can use a desktop / external screen of 20 inches or more, that will be good.

The YouTube playlist covers all the required topics.


#4

Thanks @Surendranatha_Reddy for the tips.
I have two more questions, if you could please answer them.
1) Do they provide code snippets in Python, or did you code the solution from scratch?
2) Which version of Spark did you use?

Thanks
Vaishali


#5

I used Spark 1.6.
They do not provide any code snippets, in either Scala or Python.


#6

So is this the correct syntax to import the orders data in the exam environment, given that the hostname is just “gateway”, as you mentioned?

sqoop import --connect jdbc:mysql://gateway:3306/retail_db \
--username cloudera --password cloudera \
--table orders \
--as-avrodatafile \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--target-dir /user/cloudera/


#7

Yes Mayur, that is the correct syntax.


#8

Hi @Surendranatha_Reddy,
As I understand it, when the hostname is given as gateway, we have to get the webapp address from yarn-site.xml. Is that not the case?


#9

Hi Surendra,

Do they ask questions on sequence files in the exam?


#10

Hi Vaishali,

Did you appear for the exam?


#11

Hi @akalita, Not yet.


#12

@Surendranatha_Reddy how do I increase the font size of the terminal? And can I take the exam with my laptop plugged in (charging)?


#13

Hi Hari,
Do you mean that we should use the webapp address below from yarn-site.xml in the Sqoop connection string instead of the node name? Even in the official exam video from Cloudera, they use the “gateway” node name directly. Could you please shed some light on this?

yarn.resourcemanager.webapp.address ****.itversity.com:19288

#14

In yarn-site.xml there is a property as follows:

image

As I understand it, we should use the value given in this property, @mailpradeepcse.


#15

Hi Hari,

I see in the sample video that he uses only gateway, without a port number. Do we need to use a port number for the Sqoop question? I am going to take the exam on 29th Dec.


#16

Hi @Surendranatha_Reddy,
Congratulations, and all the best for your future :)

I wanted to clarify two of the points you mentioned.

If you are setting parquet / avro compression in more than one problem , then use new terminal for each problem so that you wont face any issues if forgot to change it.

Can you elaborate? I’ve never set Avro/Parquet compression for a session as such. As far as I know, we can change the compression format by passing parameters in Sqoop or by calling a specific method in Spark. Can you please explain how to set the compression format for a session, or point to a reference?

Use ~12 mappers in Spark Export to increase parallelism.
Here, do you mean 12 mappers in Sqoop?


#17

Hi @Hariharan_Palanicham, no, you can directly specify ‘gateway’ in the JDBC connection string.


#18

Hi @akalita, they do not ask questions on sequence files.


#19

Hi @akalita, there is no need to specify the port number in Sqoop commands.


#20

Hi @Manoj_Kumar90
Thank you!

  1. When you set the compression codec using setConf, as in the sample command below, it stays in effect for the whole spark-shell session.
    sqlContext.setConf("spark.sql.avro.compression.codec","snappy")
    During the exam, if you forget to reset the compression technique to its default value before solving other problems, you might end up writing files to HDFS with the wrong compression technique. That’s why I suggest opening a new terminal for similar kinds of problems.

  2. Sorry, that was a typo. I have corrected it.