Hi all, I successfully cleared the exam on my second attempt. Thanks, itversity; without your lab I am not sure I could have done it.
As said in earlier posts, the exam is not difficult, but you have to do exactly what is asked, nothing more, nothing less.
What went wrong in my first attempt, and what I changed in the second:
- First attempt: whenever the input file was text, I tried to solve the problem with RDDs. One such problem took too much time, and in the end I solved only 7 of the 9. My score was 6/9, because in one of the problems I removed duplicates even though that was not asked. So don’t do anything extra that is not asked, not even coalesce (correct me if I am wrong).
- Second attempt: I solved everything using DataFrames (text file to RDD to DataFrame) and a temp table, and finished all 9 in 1 hr 40 min. This approach is based on my strengths; if you are stronger with RDDs, please use them.
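
For anyone who wants to see the text file to RDD to DataFrame to temp table route spelled out, here is a rough PySpark sketch. It is only an illustration, not an exam solution: the column layout (`order_id`, `order_date`, `customer_id`, `order_status`), the comma delimiter, and the function names `parse_order` and `count_by_status` are all made up for the example. It also assumes Spark 2.x or later, where `createOrReplaceTempView` is available; on Spark 1.6 the equivalent step was `registerTempTable` on a DataFrame created through `SQLContext`.

```python
from typing import Tuple


def parse_order(line: str) -> Tuple[int, str, int, str]:
    """Split one comma-delimited text line into typed fields.

    The four-field layout is a hypothetical example, not an exam dataset.
    """
    order_id, order_date, customer_id, status = line.split(",")
    return int(order_id), order_date, int(customer_id), status


def count_by_status(input_path: str):
    """Text file -> RDD -> DataFrame -> temp view -> SQL query."""
    # Imported here so parse_order can be tried without a Spark install.
    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.appName("txt-to-df-sketch").getOrCreate()

    # 1. Read the text file as an RDD of lines and parse each line into a Row.
    rows = (
        spark.sparkContext.textFile(input_path)
        .map(parse_order)
        .map(lambda t: Row(order_id=t[0], order_date=t[1],
                           customer_id=t[2], order_status=t[3]))
    )

    # 2. Convert the RDD of Rows into a DataFrame.
    orders = spark.createDataFrame(rows)

    # 3. Register it as a temp view so it can be queried with SQL.
    orders.createOrReplaceTempView("orders")

    # 4. Do exactly what the question asks: no extra distinct(), no coalesce().
    return spark.sql(
        "SELECT order_status, COUNT(*) AS order_count "
        "FROM orders GROUP BY order_status"
    )
```

Calling `count_by_status("/path/to/orders")` returns a DataFrame that can then be written out in whatever format and location the question specifies.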
Good luck to everyone.