Saving a text file with Snappy compression


#1

Is there any way to save data as a text file using Snappy compression? It is failing when I try to save an RDD.




#2

Try this:
rddName.saveAsTextFile("path", compressionCodecClass="org.apache.hadoop.io.compress.SnappyCodec")


#3

That is wrong. The correct syntax is as follows (but that is also not working for me):
rdd.saveAsTextFile("path", classOf[org.apache.hadoop.io.compress.SnappyCodec])


#4

It is working for me.

What is the error you are getting? Can you provide the details?


#5

Are you using Scala or Python?


#6

I am using Scala @dgadiraju


#7

@sarang
This is what I am getting when I try your piece of code.
scala> df.rdd.map(x => x.mkString(",")).saveAsTextFile("/user/cloudera/text-snappy", compressionCodecClass="org.apache.hadoop.io.compress.SnappyCodec")
:28: error: overloaded method value saveAsTextFile with alternatives:
  (path: String, codec: Class[_ <: org.apache.hadoop.io.compress.CompressionCodec])Unit
  (path: String)Unit
cannot be applied to (String, compressionCodecClass: String)
       df.rdd.map(x => x.mkString(",")).saveAsTextFile("/user/cloudera/text-snappy", compressionCodecClass="org.apache.hadoop.io.compress.SnappyCodec")


#8

@dgadiraju @connectsachit
With Scala, compressionCodecClass gives the error Sachit mentioned, so I used classOf[org.apache.hadoop.io.compress.SnappyCodec] and it works.

ordersComplete.saveAsTextFile(path,classOf[org.apache.hadoop.io.compress.SnappyCodec])

Below is the output.
With Snappy:

-rw-r--r-- 3 sarangdp1 hdfs 0 2018-04-30 07:21 snappyOp3/_SUCCESS
-rw-r--r-- 3 sarangdp1 hdfs 145212 2018-04-30 07:21 snappyOp3/part-00000.snappy
-rw-r--r-- 3 sarangdp1 hdfs 150121 2018-04-30 07:21 snappyOp3/part-00001.snappy

Without Snappy (normal save):

-rw-r--r-- 3 sarangdp1 hdfs 0 2018-04-30 07:23 normal/_SUCCESS
-rw-r--r-- 3 sarangdp1 hdfs 476627 2018-04-30 07:23 normal/part-00000
-rw-r--r-- 3 sarangdp1 hdfs 483957 2018-04-30 07:23 normal/part-00001


#9

@sarang, thank you for responding to the question. Let us build a decent itversity community :slight_smile:


#10

@sarang
I have also tried the same code, but it is showing an error for me.


#11

Sachit, it looks like you are not running the code on the itversity labs; your environment does not have the Snappy jar on the classpath.

If you are on an HDP cluster, check $HADOOP_HOME/lib (it should contain the Snappy jar) and $HADOOP_HOME/lib/native (it should contain libsnappy.so). Also verify that:

  1. LD_LIBRARY_PATH and JAVA_LIBRARY_PATH contain the native directory path holding the libsnappy.so files.
  2. LD_LIBRARY_PATH and JAVA_LIBRARY_PATH have been exported in the Spark environment (spark-env.sh).
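One quick way to check the first point from inside a JVM is to scan java.library.path, which is one of the places the Hadoop native loader looks. This is a hedged sketch (SnappyCheck is a hypothetical helper, not part of Hadoop or Spark), so a miss here does not prove Snappy is unavailable:

```scala
import java.io.File

object SnappyCheck {
  // Directories on java.library.path that contain a libsnappy native library.
  def dirsWithSnappy(): Seq[String] =
    System.getProperty("java.library.path", "")
      .split(File.pathSeparator)
      .toSeq
      .filter { dir =>
        // listFiles() is null for non-directories or unreadable paths.
        val files = Option(new File(dir).listFiles()).getOrElse(Array.empty[File])
        files.exists(_.getName.startsWith("libsnappy"))
      }

  def main(args: Array[String]): Unit = {
    val hits = dirsWithSnappy()
    if (hits.isEmpty) println("libsnappy not found on java.library.path")
    else hits.foreach(d => println(s"libsnappy found in $d"))
  }
}
```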

#12

It seems you are running on Cloudera QuickStart VM.

Please go to /etc/hadoop/conf/core-site.xml and search for compression codecs to see if it includes Snappy.
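For reference, the codec list usually lives under the io.compression.codecs property in core-site.xml; a typical entry that enables Snappy looks like the following (example values only, not necessarily what your VM should contain):

```xml
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```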


#13

It does not include any compression codec. But when I did a Sqoop import using Snappy compression, it worked. Also, when I saved with Snappy using Spark SQL, it worked. Any reason for that?
Thanks,
Sachit