Need help with date format -- urgent!



Need urgent help. I have set up the data as follows:

sqoop import \
  --connect jdbc:mysql://localhost/retail_db \
  --username root \
  --password cloudera \
  --table orders \
  --target-dir /user/cloudera/retail_import/orders

Now, I am trying to create a DF in pyspark:

... .format("com.databricks.spark.avro").load("/user/cloudera/retail_import/orders")
DataFrame[order_id: int, order_date: bigint, order_customer_id: int, order_status: string]

Row(order_id=1, order_date=1374735600000, order_customer_id=11599, order_status=u'CLOSED')

The date is stored as a timestamp, so I am trying to convert it to YYYYMM format.

I tried using date_format after importing functions, but I could not get it to work.

from pyspark.sql import functions as f

Please let me know how to convert the timestamp to a date format when working with both a DataFrame and an RDD.


Pyspark - Issue with converting bigint to date

Import functions and from_unixtime, then divide order_date by 1000 (the value is in milliseconds, while from_unixtime expects seconds). The code is below; try it and let us know.

from pyspark.sql import functions as F
from pyspark.sql.functions import from_unixtime

# df is assumed to be the DataFrame loaded from /user/cloudera/retail_import/orders
df.select('order_date', from_unixtime(F.col('order_date')/1000).alias('date')).show()
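
For the YYYYMM output asked about in the question, here is a minimal sketch covering both the DataFrame and the RDD route. The function names (millis_to_yyyymm, with_order_month, order_months_rdd) and the order_month column are made up for illustration, and df is assumed to be the orders DataFrame with order_date in epoch milliseconds. Note that the plain-Python helper formats in UTC, while from_unixtime uses the Spark session time zone, so the two can differ for timestamps near a month boundary.

```python
from datetime import datetime, timezone


def millis_to_yyyymm(epoch_millis):
    """Convert epoch milliseconds to a 'YYYYMM' string (assumes UTC)."""
    seconds = epoch_millis / 1000.0
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y%m")


def with_order_month(df):
    """DataFrame route: add a hypothetical 'order_month' column as yyyyMM."""
    # Imported here so the plain-Python helper above works without Spark.
    from pyspark.sql import functions as F

    # from_unixtime expects seconds, so divide the milliseconds by 1000.
    seconds = (F.col("order_date") / 1000).cast("long")
    return df.withColumn("order_month", F.from_unixtime(seconds, "yyyyMM"))


def order_months_rdd(df):
    """RDD route: map each Row to (order_id, 'YYYYMM')."""
    return df.rdd.map(
        lambda row: (row.order_id, millis_to_yyyymm(row.order_date))
    )


# The sample row from the question:
print(millis_to_yyyymm(1374735600000))  # 201307
```

With a live DataFrame, with_order_month(df).show() would display the extra column, and order_months_rdd(df).take(5) would return the first few (order_id, month) pairs.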


Thanks for the quick response.