Apache Spark Python - Basic Transformations - Dealing with Nulls while Filtering

In this article, we will discuss how to handle null values while filtering data with Spark. We will cover the filter conditions used to keep or drop records based on null or empty values in a specific column.
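The examples below assume a SparkSession and an employeesDF whose ‘bonus’ column may hold a value, an empty string, or null. A minimal setup with hypothetical sample data might look like this:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("nulls-while-filtering").getOrCreate()

# Hypothetical sample data: bonus can be a value, an empty string, or null
employees = [
    (1, "Scott", "1000.00"),
    (2, "Henry", ""),
    (3, "Nick", None),
    (4, "Bill", "1500.00"),
]
employeesDF = spark.createDataFrame(employees, ["employee_id", "employee_name", "bonus"])
employeesDF.show()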

Filtering Non-Null Records

To filter all records where the ‘bonus’ column is not null:

employeesDF.filter("bonus IS NOT NULL").show()

To filter all records where the ‘bonus’ column is neither null nor empty:

employeesDF.filter("bonus <> ''").show()

Filtering Null Records

To filter all records where the ‘bonus’ column is null:

employeesDF.filter("bonus IS NULL").show()

To filter all records where the ‘bonus’ column is empty:

employeesDF.filter("bonus = ''").show()


Hands-On Tasks

  1. Get all the records where bonus is neither null nor empty.
  2. Get all the records where bonus is null or empty (one possible solution is sketched below).
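One possible way to solve these tasks, using the sample employeesDF defined above:

# Task 1: bonus is neither null nor empty
employeesDF.filter("bonus IS NOT NULL AND bonus <> ''").show()

# Task 2: bonus is null or empty
employeesDF.filter("bonus IS NULL OR bonus = ''").show()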

Conclusion

In this article, we explored how to handle null values while filtering data with Spark. By working through the examples and hands-on tasks, you can practice keeping or dropping records based on null or empty values in a column. Experiment with the code examples and engage with the community for further learning opportunities.