Need help with date format -- urgent!

pyspark

#2

Import functions and from_unixtime, and divide order_date by 1000 (the time is in milliseconds). Below is the code; try it and let us know.

from pyspark.sql import functions as F
from pyspark.sql.functions import from_unixtime

# from_unixtime expects seconds, so divide the millisecond epoch value by 1000
oRaw.select('order_date', from_unixtime(F.col('order_date')/1000).alias('date')).show()
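
And since the end goal is YYYYMM, you can wrap the same expression in date_format. A minimal sketch along the same lines ('yyyyMM' is a Java SimpleDateFormat pattern; order_month is just an illustrative alias):

from pyspark.sql.functions import date_format

# from_unixtime yields a 'yyyy-MM-dd HH:mm:ss' string; date_format reformats it,
# e.g. 1374735600000 ms -> '201307'
oRaw.select('order_date', date_format(from_unixtime(F.col('order_date')/1000), 'yyyyMM').alias('order_month')).show()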

#1

Need urgent help. I have set up the data as follows:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password cloudera \
--table orders \
--target-dir /user/cloudera/retail_import/orders \
--as-avrodatafile \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec

Now, I am trying to create a DF in pyspark

oRaw=sqlContext.read.format("com.databricks.spark.avro").load("/user/cloudera/retail_import/orders")
oRaw
DataFrame[order_id: int, order_date: bigint, order_customer_id: int, order_status: string]

oRaw.first()
Row(order_id=1, order_date=1374735600000, order_customer_id=11599, order_status=u'CLOSED')

The date is a Unix timestamp in milliseconds, and I am trying to convert it to YYYYMM.
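
A quick sanity check of the first value (assuming it really is epoch milliseconds):

from datetime import datetime

datetime.fromtimestamp(1374735600000 / 1000.0)
# datetime.datetime(2013, 7, 25, 0, 0)  -- exact hour depends on the local timezone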

I tried using date_format after importing functions, but I am unable to get it to work.

from pyspark.sql import functions as f

Let me know how to convert the timestamp to a date format while working with both DataFrames and RDDs.
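
For the RDD side, I assume something like plain Python datetime would work, but I am not sure it is the idiomatic way (and fromtimestamp uses the local timezone):

from datetime import datetime

# assuming order_date stays in epoch milliseconds, e.g. 1374735600000 -> '201307'
oRaw.rdd.map(lambda row: (row.order_id, datetime.fromtimestamp(row.order_date / 1000.0).strftime('%Y%m'))).take(5)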

Thanks


#3

Thanks for the quick response.
