How to save Spark DataFrames to a Sequence File

dataframes

#1

I'm trying to find a way to save Spark DataFrames to a sequence file. I can save to Avro and Parquet files, but I am not able to save to a sequence file.


#2

Did you try importing the following class and using its method?

import org.apache.spark.rdd.SequenceFileRDDFunctions

saveAsSequenceFile
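
Something along these lines, as a minimal sketch: it assumes an existing SparkContext sc, and the Int/String pair types and the "pairs-output.seq" path are just placeholders.

import org.apache.spark.rdd.RDD

// saveAsSequenceFile comes from SequenceFileRDDFunctions, which Spark applies
// implicitly to RDDs of (key, value) pairs whose types convert to Writables
val pairs: RDD[(Int, String)] = sc.parallelize(Seq((1, "one"), (2, "two")))
pairs.saveAsSequenceFile("pairs-output.seq")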


#3

@bagalsharad @vinodnerella You cannot save a Spark DataFrame to a sequence file directly, because sequence files require key/value pairs, so you first need to convert your DataFrame into an RDD[(K, V)].

Example:
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, DataFrame}

val rdd: RDD[(Int, String)] = df.rdd.map {
  case r: Row => (r.getAs[Int](0), r.getAs[String](1)) // choose your own key and value columns and convert them to the right types
}

Then you can save the RDD:

rdd.saveAsSequenceFile("output.seq")
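
To sanity-check the output you can read it back as a pair RDD; a minimal sketch, assuming a SparkContext sc and the Int/String key and value types from the example above:

// Read the sequence file back; the type parameters must match what was written
val readBack = sc.sequenceFile[Int, String]("output.seq")
readBack.take(5).foreach(println)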


#4

Example Code :smile:

https://github.com/Re1tReddy/Spark/blob/master/Spark-2.1/src/main/scala/com/spark2/examples/Spark_To_SequenceFiles.scala


#5

I have done it this way: I read an Avro file in Spark, which gives a DataFrame, and registered it as a temp table. Then I created a database in Hive, created a table "DailyRevenuePerProduct_SEQ" with SEQUENCEFILE storage, and loaded the temp table's rows into the new table.

import com.databricks.spark.avro._ // provides sqlContext.read.avro

val avrofile = sqlContext.read.avro("/user/satishp38/spark/dailyrevenueperproduct_SaveasAvro_gzip")
avrofile.registerTempTable("DailyRevenuePerProduct")
sqlContext.sql("use satishp38_DailyRevenuePerProduct")
sqlContext.sql("CREATE TABLE DailyRevenuePerProduct_SEQ STORED AS SEQUENCEFILE AS SELECT * FROM DailyRevenuePerProduct")
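
If you want to confirm that the new table really uses the SequenceFile format, something like this should work (DESCRIBE FORMATTED is plain HiveQL; the table and database names are the ones from the snippet above):

// Look for SequenceFile in the InputFormat/OutputFormat rows of the output
sqlContext.sql("use satishp38_DailyRevenuePerProduct")
sqlContext.sql("DESCRIBE FORMATTED DailyRevenuePerProduct_SEQ").collect().foreach(println)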