Apache Spark Python - Processing Column Data - Using date_format Function

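Let us understand how to extract information from dates or times using the date_format function. date_format(col, format) returns a string that renders a date or timestamp column according to a datetime pattern such as yyyyMM.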

yyyyMM Format for Year and Month

To extract the year and month from both the date and time columns in yyyyMM format, use the date_format function. Because date_format always returns a string, cast the result to an integer if you want a numeric value such as 201402.

from pyspark.sql.functions import date_format
datetimesDF. \
    withColumn("date_ym", date_format("date", "yyyyMM").cast('int')). \
    withColumn("time_ym", date_format("time", "yyyyMM").cast('int')). \
    show(truncate=False)

yyyyMMddHHmmss Format for Time Information

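To get the full date and time down to the second in yyyyMMddHHmmss format, use date_format with that pattern. Applied to the date column, the time part comes out as 000000 because a date has no time of day. The result is left as a string here: a 14-digit value such as 20140228100000 does not fit in an int.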

datetimesDF. \
    withColumn("date_dt", date_format("date", "yyyyMMddHHmmss")). \
    withColumn("time_ts", date_format("time", "yyyyMMddHHmmss")). \
    show(truncate=False)

yyyyDDD Format for Year and Day of Year

To get the year together with the day of the year, use the yyyyDDD pattern with date_format and cast the result to an integer. DDD is the zero-padded day of the year (001 to 366), so 2016-02-29 becomes 2016060.

datetimesDF. \
    withColumn("date_yd", date_format("date", "yyyyDDD").cast('int')). \
    withColumn("time_yd", date_format("time", "yyyyDDD").cast('int')). \
    show(truncate=False)

Complete Date Description

To get a readable, spelled-out description of a date, pass a pattern such as "MMMM d, yyyy" to date_format. MMMM is the full month name, d is the day of the month without zero padding, and yyyy is the four-digit year, so 2014-02-28 becomes "February 28, 2014".

datetimesDF. \
    withColumn("date_desc", date_format("date", "MMMM d, yyyy")). \
    show(truncate=False)

Weekday Name Extraction

You can extract the name of the weekday using the date_format function with the pattern "EE" for the abbreviated weekday name (for example, Fri) and "EEEE" for the full weekday name (for example, Friday).

datetimesDF. \
    withColumn("day_name_abbr", date_format("date", "EE")). \
    withColumn("day_name_full", date_format("date", "EEEE")). \
    show(truncate=False)


Hands-On Tasks

  1. Create a DataFrame datetimesDF with columns date and time.
  2. Get the year and month from both date and time columns using yyyyMM format. Convert the data type to an integer.
  3. Get the information from time in yyyyMMddHHmmss format.
  4. Get the year and day of year using yyyyDDD format.
  5. Get a complete description of the date.
  6. Get the name of the weekday from the date column.

Conclusion

In this article, we discussed how to use the date_format function in Spark to extract information from date and timestamp values in different formats: yyyyMM, yyyyMMddHHmmss, yyyyDDD, a spelled-out date description, and weekday names. Work through the hands-on tasks to practice these patterns and apply them in your own projects, and engage with the community for further learning and support.