Snappy/Gzip compression on ORC files using Scala


How do I generate compressed ORC files?
sqlContext.setConf("spark.sql.orc.compression.codec", "snappy") does not work. I tried sqlContext.setConf("", "snappy") as well, but that didn't work either.
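For reference, a minimal sketch of the conf key and the codec spellings that newer Spark versions (2.3+) accept for ORC. The validateCodec helper is hypothetical, just to keep the key and the accepted values in one place; inside a real session you would call spark.conf.set(confKey, "snappy") rather than this standalone code.

```scala
// Sketch (assumption: Spark 2.3+ behavior). The real call would be
//   spark.conf.set(confKey, "snappy")
// inside a Spark session; here we only model the key and the values
// so typos like an empty key or "gzip" get caught early.
val confKey = "spark.sql.orc.compression.codec"
val acceptedCodecs = Set("none", "uncompressed", "snappy", "zlib", "lzo")

// Hypothetical helper: normalize and reject unsupported codec names.
def validateCodec(codec: String): String = {
  require(acceptedCodecs.contains(codec.toLowerCase),
    s"unsupported ORC codec: $codec")
  codec.toLowerCase
}
```

Note that ORC has no gzip codec; zlib is the deflate-based option, which is why a "gzip" setting silently gets you nowhere.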

Please advise.


As per the GitHub code of the Spark repo:
The supported codecs are none, uncompressed, snappy, zlib, and lzo, and snappy is the default codec.
However, I am not able to store the data with a different codec.



If you find any information please let me know.


It seems ORC and Snappy are not working together as expected.


Any update on this? I am also unable to set any codec for ORC.
It could be an exercise on the CCA exam!


Run using this:
spark-shell --master yarn


Snappy is the default codec for ORC files in Spark, but the other codecs are not working.


Were you able to find a fix for this issue? The other compression codecs are not working for ORC either.


Workable solution (verified):
It works with a CREATE TABLE ... AS SELECT query. I always use it with sqlContext:

sqlContext.sql("CREATE TABLE orders_orc_hive STORED AS orc LOCATION '/user/hive/warehouse/tableName' TBLPROPERTIES('orc.compress'='SNAPPY') AS SELECT * FROM order_orc")

Note: the LOCATION clause is optional. Also, order_orc is the table we get via registerTempTable.
ORC uses Zlib compression by default, so for Snappy you have to follow the approach above. Also, make sure 'SNAPPY' is in all capital letters.
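Since the quoting and the casing of 'SNAPPY' are exactly where this statement tends to go wrong (the curly quotes pasted earlier in the thread would break it), it can help to build the CTAS string in Scala first and inspect it before running it. A sketch, reusing the table names from this thread; the warehouse path and the snappyOrcCtas helper are illustrative, not from the original post:

```scala
// Hypothetical helper: assemble the CREATE TABLE ... AS SELECT statement
// with Snappy ORC compression. Table names come from this thread; the
// warehouse path is only an example.
def snappyOrcCtas(target: String, source: String, location: String): String =
  s"""CREATE TABLE $target
     |STORED AS orc
     |LOCATION '$location'
     |TBLPROPERTIES('orc.compress'='SNAPPY')
     |AS SELECT * FROM $source""".stripMargin

val query = snappyOrcCtas("orders_orc_hive", "order_orc",
  "/user/hive/warehouse/tableName")
// In the shell you would then run: sqlContext.sql(query)
```

Printing the query before submitting it makes it easy to verify that 'SNAPPY' is upper-case and that all quotes are plain single quotes.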