How to read a snappy compressed parquet file?

scala

#1

Hi All,

How to read a snappy compressed parquet file?

Thanks in advance


#2

Demo is done on our state-of-the-art Big Data cluster with Hadoop, Spark etc. - https://labs.itversity.com


Here is sample code to generate data in Parquet format with the snappy compression codec:

val orders = sqlContext.read.json("/public/retail_db_json/orders")
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
orders.write.parquet("/user/itversity/orders_snappy")

Valid options for spark.sql.parquet.compression.codec include uncompressed, gzip and snappy; gzip is the default.
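As a variant, the codec can also be set per write instead of globally. A minimal sketch, assuming a newer Spark (2.x+) with a `SparkSession` available; the paths are the same illustrative HDFS locations as above:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: per-write compression via the DataFrameWriter option,
// instead of the global spark.sql.parquet.compression.codec setting.
val spark = SparkSession.builder().appName("SnappyWrite").getOrCreate()

val orders = spark.read.json("/public/retail_db_json/orders")

orders.write
  .option("compression", "snappy")   // overrides the session-level codec for this write only
  .parquet("/user/itversity/orders_snappy")
```

This keeps the codec choice local to the one write, which is handy when different outputs need different codecs.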

Validation:

[itversity@gw02 ~]$ hadoop fs -ls orders_snappy
Found 5 items
-rw-r--r-- 3 itversity hdfs 0 2018-04-10 11:33 orders_snappy/_SUCCESS
-rw-r--r-- 3 itversity hdfs 495 2018-04-10 11:33 orders_snappy/_common_metadata
-rw-r--r-- 3 itversity hdfs 1668 2018-04-10 11:33 orders_snappy/_metadata
-rw-r--r-- 3 itversity hdfs 266423 2018-04-10 11:33 orders_snappy/part-r-00000-3dc4646d-67ec-4d3d-8369-2551b6199b39.snappy.parquet
-rw-r--r-- 3 itversity hdfs 268441 2018-04-10 11:33 orders_snappy/part-r-00001-3dc4646d-67ec-4d3d-8369-2551b6199b39.snappy.parquet

To read data from the snappy compressed Parquet files:

sqlContext.read.parquet("/user/itversity/orders_snappy").show
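No compression setting is needed on read - Spark detects the codec from the Parquet file metadata. A sketch of verifying the round trip, assuming the same path written above:

```scala
// Sketch: read the snappy-compressed Parquet files back and inspect them.
// The codec is picked up from the file metadata automatically.
val ordersBack = sqlContext.read.parquet("/user/itversity/orders_snappy")

ordersBack.printSchema()          // confirm the schema survived the write
println(ordersBack.count())       // row count should match the source data
ordersBack.show(5)                // peek at a few rows
```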


#3

Thank you for your reply @dgadiraju sir!
My bad, I was not using the correct path in my case :slight_smile:


#4

You should be more elaborate when raising issues :slight_smile: