Apache Spark Python - Processing Column Data - Padding Characters around Strings

In this article, we will explore how to pad strings with characters using Spark functions. Padding is often used to create fixed-length values or records, which are common in mainframe-based systems.

Using lpad and rpad Functions

We use the lpad function to pad a string with a specific character on the leading (left) side, and rpad to pad on the trailing (right) side. Both functions take three arguments: a column or expression, the desired length, and the character to pad with. If the value is already longer than the desired length, the result is truncated to that length.

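# Example of using lpad function
# Assumes an active SparkSession named `spark` (as in a notebook or the pyspark shell)
from pyspark.sql.functions import lit, lpad

# Single-row DataFrame, so the literal expression is evaluated exactly once
df = spark.createDataFrame([("X",)], ["dummy"])

df.select(lpad(lit("Hello"), 10, "-").alias("dummy")).show()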
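
To make the padding and truncation rules concrete, here is a small plain-Python model of what lpad and rpad do to a single value. It is only a sketch for building intuition, not Spark code: the helper names lpad_model and rpad_model are made up for this illustration, and the model assumes a non-empty pad string and a non-negative length.

```python
def lpad_model(value: str, length: int, pad: str) -> str:
    """Hypothetical plain-Python model of lpad for a single string."""
    if len(value) >= length:
        # Values that are too long are cut down to `length` characters
        return value[:length]
    missing = length - len(value)
    # Repeat the pad string and trim it to exactly the missing width
    fill = (pad * missing)[:missing]
    return fill + value


def rpad_model(value: str, length: int, pad: str) -> str:
    """Hypothetical plain-Python model of rpad for a single string."""
    if len(value) >= length:
        return value[:length]
    missing = length - len(value)
    fill = (pad * missing)[:missing]
    return value + fill


print(lpad_model("Hello", 10, "-"))  # -----Hello
print(rpad_model("Hello", 10, "-"))  # Hello-----
print(lpad_model("Hello", 3, "-"))   # Hel
```

The last call shows why the target length matters for fixed-length records: a value that does not fit is shortened rather than rejected, so field widths should be chosen with the longest expected value in mind.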

Padding Fixed-Length Fields

When creating fixed-length fields, it’s important to pad each field with the appropriate character so that it matches its predetermined length. For numeric fields, we pad with zeros on the leading side; for non-numeric fields, we pad with a standard character (a hyphen in the example below) on the trailing side.

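# Example of padding multiple fields in a DataFrame
# Assumes employeesDF already exists with the columns referenced below
from pyspark.sql.functions import concat, lpad, rpad

empFixedDF = employeesDF.select(
    concat(
        lpad("employee_id", 5, "0"),      # numeric: zero-padded on the left
        rpad("first_name", 10, "-"),      # text: hyphen-padded on the right
        rpad("last_name", 10, "-"),
        lpad("salary", 10, "0"),
        rpad("nationality", 15, "-"),
        rpad("phone_number", 17, "-"),
        "ssn"                             # last field, appended without padding
    ).alias("employee")
)

# Preview the fixed-length records without truncating the column
empFixedDF.show(truncate=False)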
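
The field widths used above can be checked outside Spark with a few lines of plain Python. The sketch below builds one fixed-length record with the same layout, using str.rjust and str.ljust as stand-ins for lpad and rpad. The function name to_fixed_record, the LAYOUT table, and the sample row are all made up for illustration; only the widths and pad characters come from the example.

```python
# Field layout taken from the example above: (name, width, padded side, pad character)
LAYOUT = [
    ("employee_id", 5, "left", "0"),
    ("first_name", 10, "right", "-"),
    ("last_name", 10, "right", "-"),
    ("salary", 10, "left", "0"),
    ("nationality", 15, "right", "-"),
    ("phone_number", 17, "right", "-"),
]


def to_fixed_record(row: dict) -> str:
    """Hypothetical helper: build one fixed-length record from a dict."""
    parts = []
    for name, width, side, pad in LAYOUT:
        # Cut over-long values to the field width, as lpad/rpad do
        text = str(row[name])[:width]
        if side == "left":
            parts.append(text.rjust(width, pad))  # leading pad, like lpad
        else:
            parts.append(text.ljust(width, pad))  # trailing pad, like rpad
    # The ssn is the last field and is appended without padding
    parts.append(str(row["ssn"]))
    return "".join(parts)


# Made-up sample row, for illustration only
sample = {
    "employee_id": 1,
    "first_name": "Scott",
    "last_name": "Tiger",
    "salary": 1000,
    "nationality": "united states",
    "phone_number": "+1 123 456 7890",
    "ssn": "123 45 6789",
}

record = to_fixed_record(sample)
print(record)
print(len(record))  # 78 = 5 + 10 + 10 + 10 + 15 + 17 + 11
```

Because every padded field has a known width, a consumer can recover each value by position, for example record[5:15] for the first name.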

Hands-On Tasks

  1. Create a DataFrame with a single value in a single column.
  2. Apply lpad to pad “Hello” with “-” to make it 10 characters long.
  3. Create the employees DataFrame and display its schema.
  4. Use the lpad and rpad functions to convert each field to its specified fixed length.
  5. Create a new DataFrame, empFixedDF, holding the fixed-length records, and preview the data without truncation.

Conclusion

In this article, we explored how to pad characters around strings using Spark functions. With lpad and rpad, we can create fixed-length values for the fields in a DataFrame and concatenate them into fixed-length records. Practice the tasks above to reinforce your understanding, and feel free to engage with the community for further learning.