Snappy compression for avro file

pyspark

#1

Hi,

I am trying to save a DataFrame to an Avro file with snappy compression. However, I didn't notice any size difference between the Avro file written without snappy compression and the one written with it. I am using the code below.

top5CustPerMonthMapSortedMapDF.coalesce(2).write.format("com.databricks.spark.avro").save("/user/sushital1997/DRPROB6/avro/top_5_cust")

AVRO compression
sqlContext.setConf("spark.sql.avro.compression.codec", "snappy")
top5CustPerMonthMapSortedMapDF.save("/user/sushital1997/DRPROB6/avro/top_5_cust1_snappy", "com.databricks.spark.avro")

I also tried the code below:
top5CustPerMonthMapSortedMapDF.coalesce(2).write.format("com.databricks.spark.avro").save("/user/sushital1997/DRPROB6/avro/top_5_cust_snappy")

I heard that snappy doesn't work with DataFrames. So what should I do during the certification exam if they ask me to save a DataFrame to Avro with snappy compression?


#2

@sushital1997 I hope this discussion will help you.
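
In case it helps, here is a minimal sketch of the pattern from the question, wrapped in a small helper so the codec is always set before the write. The helper name `write_avro_with_codec` and its arguments are hypothetical, not part of any library; the config key and format string are the ones used in the question. As far as I know, `com.databricks.spark.avro` already uses snappy when no codec is set, which would explain why both outputs come out the same size, but please treat that as an assumption to verify on your cluster.

```python
# Codecs I understand spark-avro to accept (an assumption; check your version).
ALLOWED_CODECS = ("uncompressed", "snappy", "deflate")


def write_avro_with_codec(sql_context, df, path, codec="snappy"):
    """Set the Avro compression codec on the SQLContext, then write df as Avro.

    Hypothetical helper: sql_context is expected to be a SQLContext and
    df a Spark DataFrame, as in the question above.
    """
    if codec not in ALLOWED_CODECS:
        raise ValueError("unsupported Avro codec: %s" % codec)
    # The codec must be set before the write is triggered.
    sql_context.setConf("spark.sql.avro.compression.codec", codec)
    df.write.format("com.databricks.spark.avro").save(path)


# Example usage in the pyspark shell (paths are placeholders):
# write_avro_with_codec(sqlContext, top5CustPerMonthMapSortedMapDF,
#                       "/user/<you>/avro/out_uncompressed", "uncompressed")
# write_avro_with_codec(sqlContext, top5CustPerMonthMapSortedMapDF,
#                       "/user/<you>/avro/out_snappy", "snappy")
```

To see a real size difference, compare an explicitly uncompressed write against the snappy one, for example with `hdfs dfs -du -s -h` on both output directories.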