Spark File Formats


#1

For some of the file formats, I see the code below being used for writing and reading. Is it mandatory to set these, or can I just call DF.write.parquet / json / avro directly?

Please guide me.

sqlContext.setConf("spark.sql.parquet.compression.codec", "gzip") // use gzip for Parquet
sqlContext.setConf("spark.sql.avro.compression.codec", "snappy") // use snappy for Avro

source: http://www.itversity.com/lessons/file-formats-in-spark/
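
For reference, the direct calls I mean (a minimal sketch; DF and the paths are placeholders, and the Avro line assumes Spark 2.4+, where Avro is a built-in format — on older versions the external spark-avro package is needed):

DF.write.parquet("out/parquet") // Parquet with the default codec
DF.write.json("out/json") // plain, uncompressed JSON
DF.write.format("avro").save("out/avro") // Avro via the built-in format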


#2

Hi,
The above commands do not choose the file format; they set the compression codec that is used when the data is written. They are optional: if you leave them out, Spark simply writes with the format's default codec.

Say we want to store the result in JSON format with snappy compression. There is no spark.sql.json.compression.codec conf like the Parquet and Avro ones above; instead we pass the codec on the write itself (Spark 2.0+):

DF.write.option("compression", "snappy").json("output dir")
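
For Parquet the same idea works both ways; a minimal sketch, assuming Spark 2.x (the output paths are placeholders):

// Session-wide: every Parquet write after this uses gzip
sqlContext.setConf("spark.sql.parquet.compression.codec", "gzip")
DF.write.parquet("output dir")

// Per write: overrides the session setting for this write only
DF.write.option("compression", "snappy").parquet("output dir")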

Hope it helps!
Thanks
Aparna


#3

Thanks for your reply, yes, I understood. Also, could you tell me how to know which two classes to use when reading a sequence file?

sparkContext.sequenceFile(,classOf[],classOf[]);

  1. classOf[]
  2. classOf[]
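
To make it concrete, a minimal sketch, assuming (hypothetically) that the file was written with Text keys and IntWritable values; in general the two classes must match the key and value Writable types the file was created with, which are recorded in the SequenceFile header:

import org.apache.hadoop.io.{IntWritable, Text}

val rdd = sparkContext.sequenceFile("input dir", classOf[Text], classOf[IntWritable])
  .map { case (k, v) => (k.toString, v.get) } // copy Writables into plain Scala types
rdd.take(5).foreach(println)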

#4

I am sorry, Fayaz, as I am learning Spark in Python.

But you can follow the link below to get a clear idea.

Thanks
Aparna