Parquet to CSV using Pyspark 1.6.2



I read a Parquet file: parquet_df ='/home/sanjay/submissions-parquet')
Now I want to write selected columns from parquet_df out to a CSV file (Parquet2CSV) with PySpark 1.6.2.
Any help will be greatly appreciated.


Hi Sanjay,
I came up with the solution below. There may be a more optimal way to handle the selective columns.

  pyspark --master yarn --conf spark.ui.port=12345 --num-executors 4 --packages com.databricks:spark-csv_2.10:1.5.0
  parquet_df ='/home/sanjay/submissions-parquet')
  col_parquet_df = sqlContext.sql("select col_1, col_2 from parquet_table")"output dir", "com.databricks.spark.csv")

Hope it helps.



Thanks Aparna. I am not allowed to download external libraries on my cluster, but this gives me a clue.
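If external packages like spark-csv are not allowed on the cluster, one workaround in Spark 1.6 is to format the rows yourself and write them with saveAsTextFile. This is only a sketch (the path and column names are placeholders, and it assumes an existing sqlContext); note it does no CSV quoting or escaping, so it only works when the values contain no commas or newlines:

# Read the Parquet data and keep only the columns we need
# (path and column names are illustrative).
parquet_df ='/home/sanjay/submissions-parquet')
col_df ='col_1', 'col_2')

# Turn each Row into a comma-separated line and write the result
# as plain text files; no external CSV package required. r: ','.join(str(c) for c in r)) \

The output is a directory of part files; they can be concatenated afterwards (e.g. with hdfs dfs -getmerge) if a single CSV file is needed.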


Hi Sanjay,
Now I have a follow-up question: is there any other way to keep only selected columns in the DataFrame, instead of registering a temporary table and then selecting the columns?
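For what it's worth, in Spark 1.6 the temporary table can be skipped entirely: the DataFrame API has a select() method that projects columns directly (a sketch, with illustrative column names):

# Project columns without registering a temp table.
parquet_df ='/home/sanjay/submissions-parquet')
col_df ='col_1', 'col_2')

# Equivalent form using Column objects, which also allows
# expressions such as renaming or arithmetic on the columns:
col_df = parquet_df.col_1, parquet_df.col_2.alias('renamed_col'))

Both forms produce the same projected DataFrame that sqlContext.sql("select col_1, col_2 from parquet_table") would give.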