Saving a text file with Snappy compression


Is there any way to save data as a text file using Snappy compression? It fails when I try to save an RDD.

Try this…


This is wrong.
The correct syntax is as follows (but that is also not working):


It is working for me.

What is the error you are getting? Can you provide the details?


Are you using Scala or Python?


I am using Scala @dgadiraju


This is what I get when I try your piece of code:
<console>:28: error: overloaded method value saveAsTextFile with alternatives:
(path: String,codec: Class[_ <: org.apache.hadoop.io.compress.CompressionCodec])Unit <and>
(path: String)Unit
cannot be applied to (String, compressionCodecClass: String)
…=>x.mkString(",")).saveAsTextFile("/user/cloudera/text-snappy",compressionCodecClass="")


@dgadiraju @connectsachit
In Scala, passing compressionCodecClass gives the error that Sachit mentioned, so I used classOf[] instead and it works.
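
A minimal sketch of the classOf[] approach, assuming Spark and the Hadoop client libraries are on the classpath and that the native Snappy library is available at runtime (otherwise the save itself fails). The object name, the sample data, and the default output path are hypothetical; only saveAsTextFile(path, codec) and SnappyCodec come from Spark and Hadoop.

```scala
import org.apache.hadoop.io.compress.SnappyCodec
import org.apache.spark.{SparkConf, SparkContext}

object SnappyTextSave {
  // In Scala the codec argument must be a Class object, not a String.
  val snappy: Class[SnappyCodec] = classOf[SnappyCodec]

  def main(args: Array[String]): Unit = {
    // Hypothetical output path; pass a real one as the first argument.
    val outputPath = args.headOption.getOrElse("text-snappy-demo")
    val conf = new SparkConf()
      .setAppName("SnappyTextSave")
      .setIfMissing("spark.master", "local[*]") // only used when no master is supplied
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample rows standing in for the real RDD.
      val rows = sc.parallelize(Seq(Array("1", "a"), Array("2", "b")))
      rows.map(_.mkString(",")).saveAsTextFile(outputPath, snappy)
    } finally {
      sc.stop()
    }
  }
}
```

In spark-shell, where sc already exists, the same call is just the saveAsTextFile line with classOf[SnappyCodec] as the second argument.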


Below is the output.
With Snappy:

-rw-r--r-- 3 sarangdp1 hdfs 0 2018-04-30 07:21 snappyOp3/_SUCCESS
-rw-r--r-- 3 sarangdp1 hdfs 145212 2018-04-30 07:21 snappyOp3/part-00000.snappy
-rw-r--r-- 3 sarangdp1 hdfs 150121 2018-04-30 07:21 snappyOp3/part-00001.snappy

Without Snappy (normal save):
-rw-r--r-- 3 sarangdp1 hdfs 0 2018-04-30 07:23 normal/_SUCCESS
-rw-r--r-- 3 sarangdp1 hdfs 476627 2018-04-30 07:23 normal/part-00000
-rw-r--r-- 3 sarangdp1 hdfs 483957 2018-04-30 07:23 normal/part-00001


I have also tried the same code, but it shows an error for me:


Sachit, it looks like you are not running the code on the itversity labs. Your environment does not have the Snappy jar on the classpath.

If you are on an HDP cluster, check $HADOOP_HOME/lib; it should contain the Snappy jar, and $HADOOP_HOME/lib/native should contain the native Snappy library files. Also make sure that:

  1. LD_LIBRARY_PATH and JAVA_LIBRARY_PATH contain the path of the native directory that holds those files.
  2. LD_LIBRARY_PATH and JAVA_LIBRARY_PATH have been exported in the Spark environment.
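
The checks above can be roughly scripted. This is only a sketch using plain Scala and java.io.File: the object and helper names are hypothetical, and matching file names on "snappy" is an assumption about how the jar and native libraries are named on a given cluster.

```scala
import java.io.File

object NativeSnappyCheck {
  // Splits a path-list variable such as LD_LIBRARY_PATH into its entries.
  def pathEntries(value: Option[String]): Seq[String] =
    value.toSeq.flatMap(_.split(File.pathSeparator).toSeq).filter(_.nonEmpty)

  // Names of files in `dir` that mention snappy; empty if the directory is missing.
  def snappyFiles(dir: File): Seq[String] =
    Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
      .map(_.getName)
      .filter(_.toLowerCase.contains("snappy"))

  def main(args: Array[String]): Unit = {
    // Show each library path entry and any Snappy files found in it.
    for (name <- Seq("LD_LIBRARY_PATH", "JAVA_LIBRARY_PATH")) {
      val entries = pathEntries(sys.env.get(name))
      println(s"$name entries: ${entries.mkString(", ")}")
      entries.foreach { e =>
        println(s"  $e -> ${snappyFiles(new File(e)).mkString(", ")}")
      }
    }
    // Look for the jar and the native files under HADOOP_HOME, if it is set.
    sys.env.get("HADOOP_HOME").foreach { home =>
      println(s"lib: ${snappyFiles(new File(home, "lib")).mkString(", ")}")
      println(s"lib/native: ${snappyFiles(new File(home, "lib/native")).mkString(", ")}")
    }
  }
}
```

Run it in the same shell environment from which Spark is launched, so that it sees the same exported variables.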

Error while saveAsTextFile with Snappy Codec

It seems you are running on Cloudera QuickStart VM.

Please go to /etc/hadoop/conf/core-site.xml and search for compression codecs to see if it includes Snappy.
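
Instead of searching the file by hand, the same lookup can be sketched in Scala with Hadoop's Configuration class, assuming the Hadoop client libraries are on the classpath. The property name io.compression.codecs is the standard Hadoop key for the codec list; the object and helper names are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object CodecConfigCheck {
  val SnappyClassName = "org.apache.hadoop.io.compress.SnappyCodec"

  // True if the comma-separated codec list names the given class.
  def listsCodec(codecs: Option[String], className: String): Boolean =
    codecs.exists(_.split(",").map(_.trim).contains(className))

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Path taken from the post above; adjust it if your config lives elsewhere.
    conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"))
    val codecs = Option(conf.get("io.compression.codecs"))
    println(s"io.compression.codecs = ${codecs.getOrElse("(not set)")}")
    println(s"Snappy listed: ${listsCodec(codecs, SnappyClassName)}")
  }
}
```

An unset property is not, by itself, proof that Snappy is unavailable; it only means the codec list was not configured explicitly in that file.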


It does not include any compression codec. But when I did a Sqoop import using Snappy compression, it worked. Saving with Spark SQL in Snappy also worked. Is there any reason for that?


Hello Durga Sir

I am also facing the same issue, and I am using PySpark on Cloudera. I checked /etc/hadoop/conf/core-site.xml and did not find any ‘compression’ or ‘compress’ keyword.

Please suggest a solution. I am wondering what the workaround would be if this happens in the exam.