This article shows how to handle null values when filtering data in Spark. It covers filter expressions that select or exclude records depending on whether a specific column is null or holds an empty string.
Filtering Non-Null Records
To keep only the records where the ‘bonus’ column is not null:
employeesDF.filter("bonus IS NOT NULL").show()
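The snippets in this article assume an existing employeesDF. As a minimal sketch, the following builds a small hypothetical one and applies the same filter twice, once as a SQL expression string and once through the Column API. The column names employee_id and first_name, the sample rows, and the choice to store bonus as a nullable string are all assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object FilterNotNullSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FilterNotNullSketch")
      .master("local[*]") // local mode, for experimentation only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: one real value, one null, one empty string.
    val employeesDF = Seq[(Int, String, Option[String])](
      (1, "Scott", Some("10")),
      (2, "Henry", None),
      (3, "Nick", Some(""))
    ).toDF("employee_id", "first_name", "bonus")

    // SQL expression string, as used in the article.
    employeesDF.filter("bonus IS NOT NULL").show()

    // Equivalent Column API form.
    employeesDF.filter(col("bonus").isNotNull).show()

    spark.stop()
  }
}
```

With this sample data, both calls keep employees 1 and 3. The empty string is a non-null value, so it passes an IS NOT NULL check.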
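To keep only the records where the ‘bonus’ column is neither null nor empty (a comparison against null evaluates to null, which filter treats as false, so null rows are dropped as well):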
employeesDF.filter("bonus <> ''").show()
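It may not be obvious why a single comparison against '' also removes null rows. The sketch below, which uses hypothetical sample data and an assumed string-typed bonus column, projects the comparison result as its own column (named passes_filter here purely for illustration) so the value that filter sees for each row is visible.

```scala
import org.apache.spark.sql.SparkSession

object NullComparisonSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("NullComparisonSketch")
      .master("local[*]") // local mode, for experimentation only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: one real value, one null, one empty string.
    val employeesDF = Seq[(Int, String, Option[String])](
      (1, "Scott", Some("10")),
      (2, "Henry", None),
      (3, "Nick", Some(""))
    ).toDF("employee_id", "first_name", "bonus")

    // Show the raw result of the predicate for every row:
    // true for "10", null for the null bonus, false for the empty string.
    employeesDF
      .selectExpr("employee_id", "bonus", "bonus <> '' AS passes_filter")
      .show()

    // filter keeps only rows where the predicate is true,
    // so both the null row and the empty-string row are dropped.
    employeesDF.filter("bonus <> ''").show()

    spark.stop()
  }
}
```

Only employee 1 survives the filter in this sample, because a null predicate result is treated the same as false.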
Filtering Null Records
To filter all records where the ‘bonus’ column is null:
employeesDF.filter("bonus IS NULL").show()
To filter all records where the ‘bonus’ column is an empty string (null values do not match this comparison):
employeesDF.filter("bonus = ''").show()
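The two filters in this section can also be written with the Column API instead of SQL expression strings. This is a sketch under the same assumptions as before: the sample rows and the extra column names are hypothetical, and bonus is assumed to be a string column, which is what makes a comparison with '' meaningful.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object FilterNullOrEmptySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FilterNullOrEmptySketch")
      .master("local[*]") // local mode, for experimentation only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: one real value, one null, one empty string.
    val employeesDF = Seq[(Int, String, Option[String])](
      (1, "Scott", Some("10")),
      (2, "Henry", None),
      (3, "Nick", Some(""))
    ).toDF("employee_id", "first_name", "bonus")

    // Same as filter("bonus IS NULL"): keeps only the null row.
    employeesDF.filter(col("bonus").isNull).show()

    // Same as filter("bonus = ''"): keeps only the empty-string row.
    // === is the Column equality operator; the null row does not match.
    employeesDF.filter(col("bonus") === "").show()

    spark.stop()
  }
}
```

In this sample the first filter returns employee 2 and the second returns employee 3, which shows that null and the empty string are distinct values and need separate checks.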
Hands-On Tasks
- Get all the records where bonus is neither null nor empty.
- Get all the records where bonus is null or empty.
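One possible way to approach these tasks is to spell out both conditions explicitly and combine them with AND or OR. The sketch below is not the only solution; the sample data and the extra column names are hypothetical, and bonus is assumed to be a string column.

```scala
import org.apache.spark.sql.SparkSession

object HandsOnTasksSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HandsOnTasksSketch")
      .master("local[*]") // local mode, for experimentation only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: one real value, one null, one empty string.
    val employeesDF = Seq[(Int, String, Option[String])](
      (1, "Scott", Some("10")),
      (2, "Henry", None),
      (3, "Nick", Some(""))
    ).toDF("employee_id", "first_name", "bonus")

    // Task 1: bonus is neither null nor empty.
    val withBonus = employeesDF.filter("bonus IS NOT NULL AND bonus <> ''")
    withBonus.show()

    // Task 2: bonus is null or empty.
    val withoutBonus = employeesDF.filter("bonus IS NULL OR bonus = ''")
    withoutBonus.show()

    // Sanity check: the two results partition the input,
    // so their counts should add up to the total row count.
    println(withBonus.count() + withoutBonus.count() == employeesDF.count())

    spark.stop()
  }
}
```

The explicit IS NOT NULL in the first task is redundant given how the comparison treats nulls, but writing it out makes the intent easier to read.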
Conclusion
In this article, we explored how to handle null values while filtering data using Spark. The key point is that null and the empty string are different values: IS NULL and IS NOT NULL test for null, while comparisons such as = '' and <> '' test for the empty string and never match null rows. Working through the examples and tasks above is a good way to practice filtering records based on null or empty values in a column.