I have tried three compression codecs (gzip, zlib, snappy) while saving data in ORC format, but the output is saved as plain ORC. The files created are the same size for all three.
sc.parallelize(1 to 10).toDF().write.mode("overwrite").format("orc").save("/user/cloudera/zlib")
Is there any other way to apply compression to ORC?
PS: I think ORC is already compressed, but I am not sure.
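For reference, here is a minimal sketch of what an explicit codec setting might look like. It assumes Spark 2.x, where the ORC writer accepts a `compression` option with values such as `none`, `zlib`, and `snappy` (gzip is not an ORC codec, as far as I know). The helper name `writeOrcWithCodecs`, the `baseDir` parameter, and the row count are made up for illustration. Note also that with only 10 rows the file is mostly ORC metadata, so a size difference would be hard to see regardless of codec.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper: writes the same data once per codec so the
// resulting directory sizes can be compared (for example with `hdfs dfs -du -h`).
// Assumption: Spark 2.x, where the ORC writer honours the "compression" option.
// On some 2.x builds the ORC source may also need Hive support enabled.
def writeOrcWithCodecs(spark: SparkSession, baseDir: String): Unit = {
  // Use enough rows that the codec has something to compress;
  // 10 integers are dwarfed by the ORC footer and metadata.
  val df = spark.range(0L, 1000000L).toDF("value")

  Seq("none", "zlib", "snappy").foreach { codec =>
    df.write
      .mode("overwrite")
      .option("compression", codec) // set the codec before calling save
      .format("orc")
      .save(s"$baseDir/$codec")
  }
}
```

In spark-shell this could be called as `writeOrcWithCodecs(spark, "/user/cloudera/orc_test")`, where the path is only an example.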