Pyspark - saveAsTextFile

@itversity

Hi, I was trying to save a file back to HDFS after reading it in Spark.
When I view the output after saving it, I can see that it has been split into several files.
Can anyone help me understand why it is saved as many files?

In Durga sir's YouTube videos, only two files were created: _SUCCESS and a single part-* file.
For me, I am getting the following:

@Janaki_K - Could you please paste the queries you used?

@gnanaprakasam
Query:
dataRDD = sc.textFile("/user/kjanakijanu/sqoop_import/departments")
for line in dataRDD.collect():
    print(line)

print(dataRDD.count())

dataRDD.saveAsTextFile("/user/kjanakijanu/pyspark/departments")

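You might have used -m n or --num-mappers n in your Sqoop import, which would have created n part files in the source directory. Spark reads each of those files as at least one partition, and saveAsTextFile writes one part file per partition, so the output ends up split in the same way.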
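
If it helps, here is a rough sketch of how you could check the partition count and write a single part file instead. It assumes you are in the pyspark shell, where sc already exists. The helper names and the departments_single output path are made up for illustration; they are not from the original queries. coalesce(1) pulls all the data into one partition, so this is only sensible for small datasets like this one.

```python
def count_output_files(rdd):
    # saveAsTextFile writes one part file per partition,
    # so the partition count is the number of part files to expect.
    return rdd.getNumPartitions()


def save_as_single_file(rdd, output_path):
    # coalesce(1) merges all partitions into one before saving,
    # so only one part file (plus _SUCCESS) is written.
    rdd.coalesce(1).saveAsTextFile(output_path)


def run_example(sc):
    # Same input path as in the question.
    dataRDD = sc.textFile("/user/kjanakijanu/sqoop_import/departments")
    print(count_output_files(dataRDD))
    # Hypothetical output directory; it must not already exist.
    save_as_single_file(dataRDD, "/user/kjanakijanu/pyspark/departments_single")
```

In the pyspark shell you would call run_example(sc). The printed number should be at least the number of part files in the Sqoop import directory.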


Yes, now I understand. Thank you!