Using Compression

pyspark

#1

In one of the practice exercises I was doing, I saved my data into HDFS in JSON format:

orders_join.coalesce(1).write.json("/user/tarunsteja/problem1/dailyrevenue_JSON")

Now, how do I save the same output with compression as well?

Can anyone modify the step above and reply here?


#2

Hi @Tarun_Teja

Here is the Python code for writing compressed data in Parquet file format:

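# Set the Parquet codec for this SQLContext (applies to every Parquet write)
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")

# Read the JSON orders and write them back as Snappy-compressed Parquet
sqlContext.read.json("/public/retail_db_json/orders") \
    .write.parquet("/user/username/orders_parquet")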

Regards,
Sunil Abhishek