Cleared CCA 175 on 3/31/2019

#1

Hello,

I have cleared the exam with a score of 8/9. The Itversity lab and the Udemy course from Durga sir have been really helpful. Thank you.



1 Like

#2

Can you give some more details, like:

  1. Number of questions on Sqoop, Hive, and Spark
  2. Any Spark template provided?
  3. Type of questions on Spark
  4. Specific file format to use
  5. Do we have to use spark-submit, or will the pyspark shell work?
  6. Any issues faced?
0 Likes

#3

Hi Ambuj,
Congratulations!
I have the questions below.
1.) While importing to or exporting from Hive using Sqoop, do you have to handle NULLs? Do they specify anything about NULL handling in the question?
2.) If they don’t mention anything about rounding a price to 2 decimals, did you do it or not? Are we supposed to do it or leave it as it is?

0 Likes

#4

Hello,
Find answers to your questions below.

  1. Number of questions on Sqoop, Hive, and Spark:
    – Sqoop: 2, Hive: 0, Spark: 7.
  2. Any Spark template provided?
    – No template. You need to use spark-shell to solve the problems.
  3. Type of questions on Spark:
    – Reading comma- and tab-delimited files, doing transformations/aggregations (sum, count, avg, etc.), and saving the results in Parquet with Snappy compression to an HDFS location. See the sketch after this list.
  4. Specific file format to use:
    – Parquet in my case, but explore Avro and ORC with compression as well.
  5. Do we have to use spark-submit, or will the pyspark shell work?
    – spark-submit is not needed; use spark-shell or the pyspark shell.
  6. Any issues faced?
    – No issues; it is pretty straightforward if you are good with the Spark RDD, SQL, and DataFrame APIs, and with Sqoop.
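
A minimal sketch of that kind of task in spark-shell, just for illustration (the paths and column names below are made up, and I am assuming a Spark 2 shell where the spark session object is available):

// Read a comma-delimited file with no header and name the columns
// (for tab-delimited files, add .option("sep", "\t") before .csv)
val orders = spark.read.csv("/user/cert/problem1/orders").
  toDF("order_id", "order_date", "customer_id", "order_status")

// A typical aggregation: number of orders per status
val counts = orders.groupBy("order_status").count()

// Save the result as Parquet with Snappy compression to an HDFS location
counts.write.option("compression", "snappy").parquet("/user/cert/problem1/solution")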

Thank you.

1 Like

#5

Hello,

Thank you.
1.) While importing to or exporting from Hive using Sqoop, do you have to handle NULLs? Do they specify anything about NULL handling in the question?

– They don’t specify obvious things like the delimiter used in the HDFS files; you can always find that yourself.
– About NULL handling, they should give you instructions on what to do; otherwise, by default, people will not handle it, as I didn’t.
2.) If they don’t mention anything about rounding a price to 2 decimals, did you do it or not? Are we supposed to do it or leave it as it is?

– Cloudera gives sample output so that one can produce the same result set. They had given me sample output for the data, and I didn’t have to round to 2 decimal places. (Advice) Don’t do it if they don’t ask.
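
For reference, when a question does ask for explicit NULL handling, Sqoop’s --null-string and --null-non-string import options are the usual way to do it; a sketch with made-up connection details:

sqoop import \
  --connect jdbc:mysql://example-host:3306/retail_db \
  --username some_user \
  --password some_password \
  --table orders \
  --target-dir /user/cert/orders \
  --null-string '\\N' \
  --null-non-string '\\N'

(On export, the corresponding options are --input-null-string and --input-null-non-string.)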

Let me know if there are any doubts.

Thank you.

0 Likes

#6

Ambuj,
Thank you so much for your reply and advice; I appreciate it.
Did you use RDDs, DataFrames, or Spark SQL? Do I need to use all 3, or just whatever I am comfortable with?
How large was their dataset? How long does it take to run Spark/Sqoop commands against their dataset?

0 Likes

#7

Hi Ambuj,

During the exam, will we have permission to copy hive-site.xml to the Spark conf directory, or can we create a soft link?

Thanks

0 Likes

#8

Hello,

I am not sure about that, as I didn’t try to look at the conf files in the environment.

Thank you.

0 Likes

#9

Hello,

  1. I used the DataFrame API and the Spark SQL API. Use whatever you are comfortable with.
  2. Millions of records.
  3. I used 4 executors, 2 GB of memory, and 2 cores in my spark-shell configuration, but please check the YARN ResourceManager page to understand how much you can allocate. Refer to yarn-site.xml to find the ResourceManager page URL. For me it took 2-3 minutes to process.

Thank you.

0 Likes

#10

Hi Ambuj,
Thanks for the reply.
Where can I locate yarn-site.xml or the YARN resource page?
If you don’t mind, can you share the command you used to open spark-shell with the above-mentioned configuration?

0 Likes

#11

/etc/hadoop/conf/yarn-site.xml

spark-shell --master yarn --num-executors 4 --executor-memory 2G --executor-cores 2

Hope this will help.

Thank you.

0 Likes

#12

Ambuj,
Thank you so much. I really appreciate it.

0 Likes

#13

Search for the string “yarn.resourcemanager.webapp.https.address” in yarn-site.xml.
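
For example, assuming the config path mentioned earlier in this thread:

grep -A 1 'yarn.resourcemanager.webapp.https.address' /etc/hadoop/conf/yarn-site.xml

The -A 1 prints the line after the match as well, which is the value line holding the host and port.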

Thank you.

0 Likes

#14

OK, will do. Thanks once again.

0 Likes

#16

No problem. All the best for your exam.

0 Likes

#17

Congratulations, Ambuj!!

I have a very silly question pertaining to the test:

  1. Will they ask us to solve the Spark questions using only DataFrames, only spark-sql, or only reduceByKey, etc. (like you might have seen in Arun’s blog), or are we free to choose our own approach to a question?

Thank You

0 Likes

#18

Hello,

You are free to use any of these: the RDD API, the DF API, or the SQL API. Only the output data matters to them.
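
To illustrate, reusing the made-up orders DataFrame from the sketch earlier in this thread, the same result can be produced with either API:

// DataFrame API
val byStatusDf = orders.groupBy("order_status").count()

// SQL API: register a temporary view and express the same aggregation in SQL
orders.createOrReplaceTempView("orders")
val byStatusSql = spark.sql(
  "SELECT order_status, count(*) AS cnt FROM orders GROUP BY order_status")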

Thank you.

1 Like

#19

Thank you so much @Ambuj! I wish you the best of luck for your future; thanks for being helpful.

0 Likes

#20

Hi,

How can we access the Hive metastore from spark-shell during the exam?

Thanks

0 Likes

#21

Congrats, Ambuj.
Could you please throw some light on the following?

  1. Is Spark 1 or Spark 2 used?
  2. Are the code snippets in Scala or Python?
0 Likes