How to save Spark DataFrames to a Sequence File



I'm trying to find a way to save Spark DataFrames to a sequence file. I can save them as Avro and Parquet files, but I'm not able to save them as a sequence file.


Did you try importing the following package and methods?

import org.apache.spark.rdd.SequenceFileRDDFunctions



@bagalsharad @vinodnerella You cannot save a Spark DataFrame directly as a sequence file, because sequence files require key/value pairs.
You need to convert your DataFrame into an RDD[(K, V)] first.

Example:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// Pick whichever columns make sense as key and value, and convert them to the right types.
val rdd: RDD[(Int, String)] = df.rdd.map {
  case r: Row => (r.getAs[Int](0), r.getAs[String](1))
}

Then you can save the RDD with saveAsSequenceFile:



Example Code :smile:
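A minimal, self-contained sketch of the whole round trip, using the Spark 1.x SQLContext API that the rest of this thread uses. The column names (`id`, `name`) and the output path are made up for illustration; substitute your own.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("df-to-seqfile").setMaster("local[1]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// A toy DataFrame standing in for your real data.
val df = Seq((1, "shoes"), (2, "socks")).toDF("id", "name")

// Sequence files need (key, value) pairs, so map each Row to a tuple first.
val pairs = df.rdd.map(r => (r.getAs[Int]("id"), r.getAs[String]("name")))

// saveAsSequenceFile is added to pair RDDs of Writable-convertible types
// via the SequenceFileRDDFunctions implicit conversion.
pairs.saveAsSequenceFile("/tmp/df-as-sequencefile")
```

Int and String keys/values are converted to IntWritable and Text automatically; for other types you may need your own Writable wrappers.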


I have done it this way: I read an Avro file in Spark, which gives a DataFrame, and registered it as a temp table. I then created a database in Hive, created a table with SEQUENCEFILE storage properties, DailyRevenuePerProduct_SEQ, and loaded the temp table's rows into the new table.

val avroFile = "/user/satishp38/spark/dailyrevenueperproduct_SaveasAvro_gzip"
val df = sqlContext.read.format("com.databricks.spark.avro").load(avroFile)
df.registerTempTable("DailyRevenuePerProduct")
sqlContext.sql("use satishp38_DailyRevenuePerProduct")
sqlContext.sql("CREATE TABLE DailyRevenuePerProduct_SEQ STORED AS SEQUENCEFILE AS SELECT * FROM DailyRevenuePerProduct")
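If you want to sanity-check the result, you can read the new table's files back as a sequence file. Hive's SEQUENCEFILE tables are typically written with a BytesWritable key and the row rendered as Text; the warehouse path below is hypothetical, so adjust it to your Hive warehouse location.

```scala
import org.apache.hadoop.io.{BytesWritable, Text}

// Hypothetical warehouse path for the new table.
val path = "/user/hive/warehouse/satishp38_dailyrevenueperproduct.db/dailyrevenueperproduct_seq"

// Keep only the Text value; the BytesWritable key is usually empty for Hive output.
val rows = sc.sequenceFile[BytesWritable, Text](path).map(_._2.toString)
rows.take(5).foreach(println)
```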