Questions related to CCA 175

Hi All,

I have a few basic questions related to the CCA 175 exam. It would be really helpful if anyone who has cleared it could answer them based on their experience.

1. When we get the output using DataFrame/SQLContext/HiveContext, the result comes in Row format, e.g. Row(col1, col2, …). Is this acceptable, or do we need to store results in a flat format, as we get from the normal Spark APIs in Python/Scala? If a flat format is needed, is there a way to convert the output to such a plain format?

2. As we see in the videos, most of the output is just kept in DataFrames or RDDs; it is not actually written back to a file at a specific path. I believe that in the exam we need to store all results held in DataFrames or RDDs to a specific location. Am I correct? If so, do we always use rdd.saveAsTextFile("path") to store the results?

3. Do we get questions that expect a single value as the answer, such as an average? If so, how are we expected to write the result? Does it again need to be stored at a file path?

4. According to many participants, Spark SQL/HiveContext should be used more often because it is easier. But are all the tables in the exam cluster available in Hive, so that we can run our queries through Spark SQL/HiveContext? And does the question mention where to take the input from, i.e. whether HDFS files or Hive tables are to be used? I ask because the native SQLContext can sometimes be very time-consuming.
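
To make the first question concrete, here is a minimal sketch of what I mean by flattening a Row into a plain delimited line. It is only an illustration, not a confirmed exam requirement. The helper name row_to_line, the sample fields, and the namedtuple stand-in are my own assumptions; the stand-in is used so the snippet runs without a cluster, relying on the fact that a PySpark Row is a tuple subclass.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row (assumption: like Row, it behaves as a tuple).
# The field names below are hypothetical sample data.
Order = namedtuple("Order", ["order_id", "status", "total"])


def row_to_line(row, sep=","):
    """Join the fields of a tuple-like row into one delimited text line.

    None values are written as empty strings.
    """
    return sep.join("" if value is None else str(value) for value in row)


rows = [Order(1, "CLOSED", 299.98), Order(2, "PENDING", None)]
lines = [row_to_line(r) for r in rows]
print(lines)

# With a real DataFrame `df` (hypothetical), the same helper could be applied
# before saving, for example:
#   df.rdd.map(row_to_line).saveAsTextFile("path")
```

If this approach is reasonable, the delimiter can be changed through the sep argument, e.g. row_to_line(row, sep="\t") for tab-separated output.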

Thanks
Plaban