Let us understand how to extract information from dates or times using the date_format function.
yyyyMM Format for Year and Month
To extract the year and month from both the date and time columns in yyyyMM format, use the date_format function. Because date_format always returns a string, cast the result to an integer after extracting the information.
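from pyspark.sql.functions import date_format

# Format both columns as yyyyMM, then cast the resulting strings to integers
datetimesDF. \
    withColumn("date_ym", date_format("date", "yyyyMM").cast('int')). \
    withColumn("time_ym", date_format("time", "yyyyMM").cast('int')). \
    show(truncate=False)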
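Spark's pattern letters are not the same as Python's strftime directives, but a plain-Python analogue makes it easy to predict what the new columns will hold without starting a Spark session. In this sketch %Y%m stands in for yyyyMM and int() stands in for .cast('int'); the input strings are hypothetical sample values, not data from the article. It also confirms that a six-digit yyyyMM value fits in a 32-bit integer, the range of Spark's IntegerType.

```python
from datetime import datetime

INT32_MAX = 2**31 - 1  # largest 32-bit signed int (range of Spark IntegerType)


def year_month(value: str) -> int:
    """Plain-Python analogue of date_format(col, "yyyyMM").cast('int')."""
    # Accept either a date string or a date-time string
    fmt = "%Y-%m-%d %H:%M:%S" if " " in value else "%Y-%m-%d"
    return int(datetime.strptime(value, fmt).strftime("%Y%m"))


# Hypothetical sample inputs, for illustration only
for value in ["2014-02-28", "2016-02-29 08:08:08"]:
    ym = year_month(value)
    assert ym <= INT32_MAX  # six digits always fit in a 32-bit int
    print(value, "->", ym)
```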
yyyyMMddHHmmss Format for Time Information
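To get the full date and time down to the second in yyyyMMddHHmmss format, you can use the date_format function as well. Applied to the date column, the hour, minute, and second come out as 000000, since a date carries no time of day. The result is left as a string here: a 14-digit value does not fit in a 32-bit integer, so cast to 'long' rather than 'int' if you need a number.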
# Format both columns down to the second; the results stay strings
datetimesDF. \
    withColumn("date_dt", date_format("date", "yyyyMMddHHmmss")). \
    withColumn("time_dt", date_format("time", "yyyyMMddHHmmss")). \
    show(truncate=False)
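A quick plain-Python check of what this pattern yields, using %Y%m%d%H%M%S as a stand-in for yyyyMMddHHmmss. The two inputs are hypothetical: a date-only value (represented as midnight) and a full timestamp. The checks at the end compare the 14-digit result against the 32-bit and 64-bit integer limits that correspond to Spark's 'int' and 'long' types.

```python
from datetime import datetime

INT32_MAX = 2**31 - 1  # range of Spark IntegerType ('int')
INT64_MAX = 2**63 - 1  # range of Spark LongType ('long' / bigint)


def compact(value: datetime) -> str:
    """Plain-Python analogue of date_format(col, "yyyyMMddHHmmss")."""
    return value.strftime("%Y%m%d%H%M%S")


# A date-only value behaves like midnight, so HHmmss comes out as 000000
from_date = compact(datetime(2014, 2, 28))
from_time = compact(datetime(2016, 2, 29, 8, 8, 8))

print(from_date)  # 20140228000000
print(from_time)  # 20160229080808

# 14 digits overflow a 32-bit int but fit comfortably in a 64-bit long
assert int(from_time) > INT32_MAX
assert int(from_time) <= INT64_MAX
```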
yyyyDDD Format for Year and Day of Year
You can get the year and the day of the year in yyyyDDD format by using the date_format function and casting the result to an integer. DDD is the day of the year (001 to 366), zero-padded to three digits.
# Year followed by the three-digit day of year, cast to an integer
datetimesDF. \
    withColumn("date_yd", date_format("date", "yyyyDDD").cast('int')). \
    withColumn("time_yd", date_format("time", "yyyyDDD").cast('int')). \
    show(truncate=False)
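The zero padding of DDD is what keeps these values unambiguous. This plain-Python sketch uses %Y%j as an analogue of yyyyDDD (%j is Python's zero-padded, three-digit day of year); the dates are hypothetical examples chosen to show padding and leap-year behavior.

```python
from datetime import date


def year_day(d: date) -> int:
    """Plain-Python analogue of date_format(col, "yyyyDDD").cast('int')."""
    # %j is the zero-padded, three-digit day of year, like DDD
    return int(d.strftime("%Y%j"))


print(year_day(date(2014, 1, 5)))    # 2014005 (day padded to three digits)
print(year_day(date(2016, 2, 29)))   # 2016060 (leap day is day 60)
print(year_day(date(2016, 12, 31)))  # 2016366 (leap year has 366 days)
```

Because the day part is always three digits wide, the resulting integers sort in chronological order.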
Complete Date Description
To get a complete, human-readable description of the date, pass a pattern such as "MMMM d, yyyy" to the date_format function. MMMM gives the full month name and a single d gives the day of the month without padding, so 2014-02-28 becomes February 28, 2014.
# Full month name, unpadded day of month, four-digit year
datetimesDF. \
    withColumn("date_desc", date_format("date", "MMMM d, yyyy")). \
    show(truncate=False)
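The number of pattern letters controls padding. This plain-Python sketch contrasts a single d with dd using strftime analogues (%B for MMMM, %d for dd); it assumes Python's default C locale, which produces English month names. The date is a hypothetical example.

```python
from datetime import date


def describe(d: date) -> str:
    """Plain-Python analogue of date_format(col, "MMMM d, yyyy")."""
    # %B is the full month name (like MMMM); d.day is unpadded (like a single d)
    return f"{d:%B} {d.day}, {d.year}"


def describe_padded(d: date) -> str:
    """Analogue of "MMMM dd, yyyy": %d zero-pads the day to two digits."""
    return d.strftime("%B %d, %Y")


print(describe(date(2019, 11, 3)))         # November 3, 2019
print(describe_padded(date(2019, 11, 3)))  # November 03, 2019
```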
Weekday Name Extraction
You can extract the name of the weekday using the date_format function with the pattern "EE" for the abbreviated weekday name (for example, Fri) and "EEEE" for the full weekday name (for example, Friday).
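# Abbreviated (EE) and full (EEEE) weekday names
datetimesDF. \
    withColumn("day_name_abbr", date_format("date", "EE")). \
    withColumn("day_name_full", date_format("date", "EEEE")). \
    show(truncate=False)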
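To see the short and full weekday names side by side for a whole week, here is a plain-Python sketch where %a plays the role of "EE" and %A plays the role of "EEEE". The starting date is a hypothetical Monday chosen for illustration, and English names assume Python's default C locale.

```python
from datetime import date, timedelta

start = date(2014, 2, 24)  # hypothetical starting date (a Monday)

for offset in range(7):
    d = start + timedelta(days=offset)
    # %a ~ Spark "EE" (short name), %A ~ Spark "EEEE" (full name)
    print(d.isoformat(), d.strftime("%a"), d.strftime("%A"))
```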
Hands-On Tasks
- Create a DataFrame datetimesDF with columns date and time.
- Get the year and month from both date and time columns using yyyyMM format. Convert the data type to an integer.
- Get the information from time in yyyyMMddHHmmss format.
- Get the year and day of year using yyyyDDD format.
- Get a complete description of the date.
- Get the name of the weekday using date.
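To check your answers to these tasks, you could compute reference values outside Spark. The sketch below uses hypothetical sample rows (not the article's data) and stdlib strftime analogues of each Spark pattern. The comment shows one way to load the same tuples into datetimesDF, assuming an existing SparkSession named spark.

```python
from datetime import datetime

# Hypothetical sample rows: (date, time) as strings.
# In PySpark you could load the same tuples with, for example:
#   datetimesDF = spark.createDataFrame(rows, schema="date STRING, time STRING")
rows = [
    ("2014-02-28", "2014-02-28 10:00:00"),
    ("2016-02-29", "2016-02-29 08:08:08"),
    ("2017-10-31", "2017-12-31 11:59:59"),
]


def expected(date_str: str, time_str: str) -> dict:
    """Plain-Python reference values for each task (strftime analogues)."""
    d = datetime.strptime(date_str, "%Y-%m-%d")
    t = datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S")
    return {
        "date_ym": int(d.strftime("%Y%m")),       # yyyyMM, cast to int
        "time_ym": int(t.strftime("%Y%m")),       # yyyyMM, cast to int
        "time_dt": t.strftime("%Y%m%d%H%M%S"),    # yyyyMMddHHmmss
        "date_yd": int(d.strftime("%Y%j")),       # yyyyDDD, cast to int
        "date_desc": f"{d:%B} {d.day}, {d.year}",  # MMMM d, yyyy
        "day_name": d.strftime("%A"),             # EEEE
    }


for date_str, time_str in rows:
    print(expected(date_str, time_str))
```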
Conclusion
In this article, we discussed how to use the date_format function in Spark SQL to extract information from dates or times in different formats. By following the step-by-step guide and hands-on tasks provided, you can practice and apply these concepts in your own projects. Remember to engage with the community for further learning and support.