Do they specifically ask you to use Spark RDDs instead of DataFrames? Were there any questions on groupByKey(), countByKey(), etc.?
Thanks for the post. For a Sqoop import, do we always need to have one file in HDFS as output if nothing is mentioned in the question? Do we get Impala questions? I saw in some thread that Impala was being asked…