Error while compressing the RDD file

apache-spark

#1

Please let me know what the mistake is in my code while compressing the RDD file.

val orders_Compress_RDD = sc.textFile("/user/itversity/retail_db/orders")
orders_Compress_RDD.saveAsTextFile("/user/kbpradeep_dl/ordr_rdd_comprss", classOf[org.apache.hadoop.io.compress.GzipCodec])

After running the above command, I am getting an error.


#2

@kbpradeep_dl You can’t write to another HDFS user’s directory.
If you want that data, you can use a path like the one below:

val orders_Compress_RDD = sc.textFile("/public/retail_db/orders")


#3

I am able to read the file with the following command:

val orders = sc.textFile("/user/itversity/retail_db/orders")

but I am getting an error on the next command. I am trying to store the compressed file in my own HDFS directory:

orders.saveAsTextFile("/user/kbpradeep_dl/ordr_rdd_comprss", classOf[org.apache.hadoop.io.compress.GzipCodec])


#4

Since you are reading the orders file from another HDFS user’s directory, I guess it is throwing some permission-related error. From the screenshot, it is not clear what error you are getting. Can you try reading the file from the public location as shown below?

val orders = sc.textFile("/public/retail_db/orders")

orders.saveAsTextFile("/user/nerellavinod/ordr_rdd_comprss", classOf[org.apache.hadoop.io.compress.GzipCodec])

If you are still facing the issue, please paste the log or screenshot of the error.
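
If the save still fails, you can also check both paths from the same spark-shell session before writing. This is only a rough sketch using the Hadoop FileSystem API; it assumes sc is the active SparkContext and the paths match the ones above:

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)

// Confirm the input directory is visible and readable to your user
fs.listStatus(new Path("/public/retail_db/orders")).foreach(s => println(s.getPath))

// saveAsTextFile fails if the output path already exists, so check it first
println(fs.exists(new Path("/user/nerellavinod/ordr_rdd_comprss")))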


#6

I have applied the same commands you replied with, but I am still facing an error.


#7

It says the output path already exists. Just change the name as below:

orders.saveAsTextFile("/user/nerellavinod/ordr_rdd_comprss1", classOf[org.apache.hadoop.io.compress.GzipCodec])
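
If you prefer to reuse the same name instead of picking a new one every run, you could also delete the existing output directory first. A minimal sketch, assuming you are in spark-shell with orders already defined as above:

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)
val outPath = new Path("/user/nerellavinod/ordr_rdd_comprss")

// Remove the old output directory if it exists (recursive delete), then save again
if (fs.exists(outPath)) fs.delete(outPath, true)
orders.saveAsTextFile(outPath.toString, classOf[org.apache.hadoop.io.compress.GzipCodec])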


#8

I have tried with a different name, ordr_rdd_comprss123.

I also went to the location where it should be stored; it does not appear under the new name or the previous one.

You can clearly see this in the screenshot below.

One more interesting thing: it did not throw any error when I used the new name, but the file is still not stored at that location.


#9

The file was copied to your /user/kbpradeep_dl/ directory. Please check the screenshot below for reference.
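
Note that saveAsTextFile writes a directory containing part files (for example part-00000.gz), not a single file, which is why looking for one file can be confusing. To verify the compressed output, you can read the directory back; Spark decompresses gzip part files transparently. A quick sketch, assuming the output name from your earlier post:

// Read the compressed output back and print a few records to confirm it was written
val check = sc.textFile("/user/kbpradeep_dl/ordr_rdd_comprss123")
check.take(5).foreach(println)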


#10

Thanks a lot.
I had typed the command as
hadoop fs -ls /user/kbpradeep_dl/ along with the file name appended, which is why I didn’t see it.


#11