Saving dataframe result in exam


Hi, when the output has to be saved from a dataframe, should we use repartition(1) or just save the dataframe as it is? If there is any ‘by key’ operation and we save the dataframe as it is, 200 files will be generated.

Please advise what they check: only the content of the output, or the number of files as well?


You can use either option. If the question specifically says that the output should be in 1 file or 2 files, you have to use repartition/coalesce; otherwise you can write the output in any number of files.
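
To make the two cases concrete, here is a rough PySpark sketch. It is only an illustration: the helper name save_result, the sample data, and the /tmp/demo paths are made up for this post, and parquet is just an assumed output format. The idea is to call coalesce() only when the question asks for a specific number of files, and otherwise to write the dataframe as it is.

```python
def save_result(df, path, num_files=None, fmt="parquet"):
    """Write df to path; reduce the number of part files only when asked.

    save_result is a hypothetical helper, not part of Spark.
    """
    if num_files is not None:
        # coalesce() merges existing partitions without a full shuffle.
        # df.repartition(num_files) would also work, at the cost of a shuffle.
        df = df.coalesce(num_files)
    # One part file is written per (non-empty) partition of df.
    df.write.mode("overwrite").format(fmt).save(path)
    return df


def demo():
    # Imported here so the helper above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]").appName("save-demo").getOrCreate()
    )
    # Made-up sample data, only for illustration.
    orders = spark.createDataFrame(
        [(1, "CLOSED"), (2, "COMPLETE"), (3, "CLOSED")],
        ["order_id", "order_status"],
    )
    # A 'by key' operation shuffles the data; the number of partitions after
    # it is governed by spark.sql.shuffle.partitions (default 200).
    counts = orders.groupBy("order_status").count()

    # Question does not mention a file count: write as it is.
    save_result(counts, "/tmp/demo/any_number_of_files")
    # Question asks for a single output file: coalesce to 1 first.
    save_result(counts, "/tmp/demo/single_file", num_files=1)
    spark.stop()
```

Calling demo() on a machine with Spark available should leave many part files in the first directory and one in the second. Another way to avoid the 200 files is to lower spark.sql.shuffle.partitions before running the 'by key' operation.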