Saving as Sequence file

#1

dataRDD.map(lambda x: tuple(x.split(",", 1))).saveAsSequenceFile("/user/cloudera/pyspark/departmentsSeq")

In the above statement, why are we converting the result of split into a tuple?

#2

A sequence file does not have field or line delimiters; it stores data as key-value pairs.
To save data as key-value pairs, each record has to be a (key, value) tuple, which is why the result of split is converted to a tuple before calling saveAsSequenceFile.
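
To make the conversion concrete, here is a minimal sketch in plain Python (no Spark needed) of what the lambda does to each line. The sample lines and the helper name to_pair are made up for illustration; the actual contents of dataRDD are not shown in this thread.

```python
# Hypothetical sample lines standing in for the records of dataRDD.
lines = ["2,Fitness", "3,Footwear", "4,Apparel, Men"]

def to_pair(line):
    # split(",", 1) splits on the first comma only, so it returns
    # a list of at most two strings: [key, rest_of_line].
    parts = line.split(",", 1)
    # tuple(...) turns that list into a (key, value) pair.
    return tuple(parts)

pairs = [to_pair(line) for line in lines]

for key, value in pairs:
    print(repr(key), repr(value))
```

Note that split returns a list, not a tuple, and that the second argument 1 keeps any further commas inside the value. In the original statement the same logic is passed to dataRDD.map(...) so that every record is a pair before saveAsSequenceFile is called.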