When I directly ran dataRDD.map(lambda x: tuple(x.split(",", 1))).saveAsSequenceFile("/user/cloudera/pysprk/departmentsSeq"), it gave me an error: "dataRDD not defined".
So I first defined dataRDD as dataRDD = sc.textFile("/user/cloudera/departments") and then applied the map function.
The output was saved successfully.
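
For reference, here is a plain-Python sketch of what the lambda in the map step does to each line, with no Spark involved. The function name to_key_value and the sample lines are made up for illustration; they are not the actual contents of /user/cloudera/departments.

```python
# Plain-Python sketch of the lambda passed to map(); no Spark required.
# to_key_value and sample_lines are hypothetical, for illustration only.
def to_key_value(line):
    # Split on the first comma only, so everything after it stays as the value.
    return tuple(line.split(",", 1))

sample_lines = ["2,Fitness", "3,Footwear", "4,Apparel,Men"]
pairs = [to_key_value(line) for line in sample_lines]
print(pairs)
# [('2', 'Fitness'), ('3', 'Footwear'), ('4', 'Apparel,Men')]
```

In the Spark job the same logic runs on every element of dataRDD, which is why dataRDD has to exist (here via sc.textFile) before map can be called on it.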
My question is: do we first have to define the variable from a text file (with sc.textFile) before applying the map() function to it?
If yes, should we always save the data in text format in HDFS?
Could you please clarify?