I have a query regarding writing output as TextFile in CCA175.
I am converting a DataFrame to an RDD and then using the RDD's saveAsTextFile method to write the output to HDFS. The output is being written into 200 files, which matches the default value of spark.sql.shuffle.partitions.
My question is: if a question does not say to store the output in N files, should we keep the default, which stores the output in 200 files, or should we change the number of output files as we see fit?