Apache Spark Python - Processing Column Data - Special Functions - col and lit

This article explains two special functions used with Spark DataFrames, col and lit. It shows how col turns a column name given as a string into a Column, how lit turns a literal value into a Column, and offers practical examples of both.

Let us understand special functions such as col and lit. Both produce a value of Column type from a plain Python value: col treats a string as the name of a column, whereas lit treats its argument as a constant.

  • First, let us create a DataFrame for demo purposes.
    Let us start a Spark session for this notebook so that we can execute the code provided.

Import the necessary libraries and create the DataFrame. The examples in this article assume it is named employeesDF and has at least the columns first_name and last_name.

Using col for Column Reference

The col function converts a column name passed as a string into a Column object. A Column can then be used in expressions: you can call methods such as alias or desc on it, or pass it to functions that expect a Column.


from pyspark.sql.functions import col

# Using the col function to select specific columns
employeesDF.select(col("first_name"), col("last_name")).show()

Adding Literals with lit

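The lit function wraps a constant value, such as a string or a number, in a Column so that it can be used wherever a Column is expected. It is needed when a function such as concat would otherwise interpret a bare string argument as a column name rather than as a literal value.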


from pyspark.sql.functions import concat, col, lit

# Using the lit function to place a literal ", " between first_name and last_name
employeesDF.select(concat(col("first_name"), lit(", "), col("last_name")).alias("full_name")).show(truncate=False)

For more detailed explanations and practical examples, refer to the video tutorial.

Hands-On Tasks

Tasks to practice and apply the concepts discussed in the article:

  1. Use the col function to select columns and apply transformations.
  2. Apply the lit function to add literals in column values.


In conclusion, col and lit are basic building blocks for column expressions in Spark DataFrames: col refers to an existing column by name, and lit turns a constant value into a Column. Practicing both makes it easier to combine columns and constants when working with DataFrames.