Hi, when an output DataFrame needs to be saved, should we use repartition(1) or just save the DataFrame as it is? If there is any ‘by key’ operation and we save the DataFrame as it is, 200 files will be generated.
Please advise what they check: only the content of the output, or the number of files as well?