Snappy compression with Parquet files

pyspark
apache-spark
bigdatalabs

#1

I am trying to save the result as Parquet with Snappy compression, as below:
productsSql.coalesce(1).write.format("parquet").save("/user/panatimahesh/sol-parquet-snappy", compressionCodecClass="snappy")

but when I check the directory, the files have a .gz extension. Not sure why. Any help?

Environment: itversity labs, Python


#2

Try the method below.

sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
someDataFrame.write.parquet("/user/panatimahesh/sol-parquet-snappy")
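
These two lines should run unchanged in PySpark as well, since SQLContext.setConf exists there too. On Spark 2.x you can set the same property through the session instead; a minimal sketch, assuming a SparkSession named spark and a DataFrame named someDataFrame:

# Set the default Parquet codec on the session, then write as usual.
# "spark" is assumed to be an existing SparkSession.
spark.conf.set("spark.sql.parquet.compression.codec", "snappy")
someDataFrame.write.parquet("/user/panatimahesh/sol-parquet-snappy")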


#3

Thank you Vinayak! That's helpful.


#4

Hi.

That is in Scala… do you have anything in PySpark?


#5

Hi @panamare, how did you find the link helpful?
It is in Scala, while you were using Python.
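
For the record, PySpark's DataFrameWriter.parquet also accepts a compression argument directly, so the original write can be fixed without touching the context config. A sketch reusing the DataFrame and path from post #1:

# Pass the codec per write; this replaces the unrecognized
# compressionCodecClass keyword from the original attempt,
# which Spark silently ignored, falling back to the gzip default.
productsSql.coalesce(1).write.parquet(
    "/user/panatimahesh/sol-parquet-snappy",
    compression="snappy")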