Applying compression on ORC


I have tried three compression codecs (gzip, zlib, snappy) while saving data in ORC format, but the output looks like plain ORC. The files created are the same size in all three cases.

Code used:
sc.parallelize(1 to 10).toDF().write.mode("overwrite").format("orc").save("/user/cloudera/zlib")

Is there any other way to apply compression to ORC?
PS: I think ORC is already compressed, but I am not sure.


ORC is compressed by default. We cannot use sqlContext.setConf for it; that approach only works with Parquet and Avro.

I need to troubleshoot further how to compress ORC with different algorithms. For the other formats, you can go through this topic.
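
For anyone landing here later, one thing that may be worth trying is the writer-level "compression" option. This is only a sketch, and it assumes Spark 2.x, where DataFrameWriter accepts that option for ORC; I have not checked it on 1.6. The object name and the output paths under /user/cloudera are hypothetical, chosen just for illustration. Note that gzip is not in the list because it is not an ORC codec. Also, with only ten integers the files will probably look almost the same size whatever codec is used, since file metadata dominates, so a larger dataset may be needed to see a difference.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes Spark 2.x. On Spark 2.0-2.2 the ORC source may also
// need Hive support enabled on the session (spark-shell usually has it).
object OrcCompressionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-compression-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same tiny dataset as in the question
    val df = (1 to 10).toList.toDF("value")

    // Write one copy per codec; the paths are hypothetical
    Seq("none", "snappy", "zlib").foreach { codec =>
      df.write
        .mode("overwrite")
        .option("compression", codec) // per-write codec selection
        .orc(s"/user/cloudera/orc_$codec")
    }

    spark.stop()
  }
}
```

After running it, comparing the directories with something like hdfs dfs -du -h /user/cloudera should show whether the codec made any difference.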


From the exam's perspective, I have covered reading and writing Avro, JSON, Parquet, and sequence files with compression. Is that sufficient?
Only ORC with compression is pending, though I have covered reading and writing ORC without compression.