Snappy/Gzip compression on ORC files using Scala

#1

How do I generate compressed ORC files?
sqlContext.setConf("spark.sql.orc.compression.codec", "snappy") does not work. I also tried sqlContext.setConf("spark.io.compression.codec", "snappy"), but that didn't work either.

Please advise.


#2

According to the Spark source on GitHub: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala
the supported ORC codecs are none, uncompressed, snappy, zlib, and lzo, and snappy is the default codec.
However, I am not able to write ORC files with a different codec. I tried each of the following:
sqlContext.setConf("spark.sql.hive.orc.compression.codec", "zlib")

sqlContext.setConf("spark.sql.hive.orc.compress.codec", "zlib")

sqlContext.setConf("spark.io.compression.codec", "zlib")

If you find any information please let me know.
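
For reference, a minimal sketch of the per-write route, assuming Spark 2.0 or later with the ORC data source on the classpath: OrcOptions.scala reads a "compression" option passed to the DataFrameWriter, so it may be worth supplying the codec there instead of through setConf. The object name, sample data, and output path below are made up for illustration.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object OrcCompressionSketch {
  // Short codec names as listed in OrcOptions.scala (quoted in the post above).
  val supportedCodecs: Set[String] =
    Set("none", "uncompressed", "snappy", "zlib", "lzo")

  def main(args: Array[String]): Unit = {
    // Codec comes from the first argument; "zlib" is only an illustrative default.
    val codec = args.headOption.getOrElse("zlib").toLowerCase
    require(supportedCodecs.contains(codec), s"Unsupported ORC codec: $codec")

    // Assumption: a local session is enough for a quick check. On Spark 2.0-2.2
    // the ORC source lives in the Hive module, so a Hive-enabled build is needed.
    val spark = SparkSession.builder()
      .appName("orc-compression-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Tiny hypothetical dataset, just to have something to write.
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // The codec is passed per write through the "compression" option.
    df.write
      .mode(SaveMode.Overwrite)
      .option("compression", codec)
      .orc("/tmp/orc_compression_sketch") // hypothetical output path

    spark.stop()
  }
}
```

In spark-shell, only the df.write chain is needed, since a session already exists there.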


#5

It seems ORC with Snappy is not working as expected.


#6

Any update on this? I am not able to set any codec for ORC either.
This could come up as an exercise on the CCA exams!


#7

Run using this:
spark-shell --master yarn \
  --conf spark.io.compression.codec=snappy


#8

Snappy is the default codec for ORC files, but other codecs are not working.


#9

Were you able to find a fix for this issue? Other compression codecs are still not working for ORC.