How to read compressed file without extension

apache-spark
#1

How can I read a compressed file, say a .gz file?
In my case it is stored without the .gz extension, as a .dat file instead. I am not able to read it directly as an RDD with sc.textFile. When I tried, I got gibberish data.

Does anyone have an idea for this use case?


#2

If your file extension is .dat, then sc.textFile will work as long as the contents are plain text. I have already used it this way.
Example:
rdd1=sc.textFile("/user/karthik/movies/movie.dat")
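
If the .dat file is actually gzip-compressed, the gibberish is expected: sc.textFile picks the decompression codec from the file extension, so without .gz it reads the compressed bytes as text. One possible workaround, sketched below, is to load the raw bytes with sc.binaryFiles and decompress them yourself. The helper name gunzip_lines and the UTF-8 encoding are assumptions for illustration, and the Spark lines are left as comments because they need an existing SparkContext named sc.

```python
import gzip


def gunzip_lines(raw_bytes, encoding="utf-8"):
    """Decompress gzip bytes and return the decoded text as a list of lines.

    Hypothetical helper; assumes the content is gzip data holding text
    in the given encoding.
    """
    return gzip.decompress(raw_bytes).decode(encoding).splitlines()


# Sketch of the Spark side (assumes `sc` is an existing SparkContext):
# binaryFiles yields (path, bytes) pairs, one per file.
#
# rdd1 = (sc.binaryFiles("/user/karthik/movies/movie.dat")
#           .flatMap(lambda path_and_bytes: gunzip_lines(path_and_bytes[1])))
```

Note that each file is read whole into memory on a single executor, so this suits moderately sized files. Renaming the files to end in .gz, if that is an option, lets sc.textFile handle the decompression itself.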
