Let us understand how to use the LIKE operator or the like function while filtering data in DataFrames.
Partial Comparison with LIKE:
The LIKE operator or like function is primarily used for partial comparison. For example, we can search for names that start with a specific prefix such as “Sco”. In the pattern, the % wildcard matches any sequence of characters, including an empty one.
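# Using the LIKE operator in a SQL-style expression to find names starting with 'Sco'
employeesDF.filter("first_name LIKE 'Sco%'").show()
# Equivalent filter using the like function on the column
employeesDF.filter(employeesDF.first_name.like('Sco%')).show()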
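To make the wildcard rules concrete, here is a minimal plain-Python sketch that mimics LIKE matching on a handful of strings. It is not how Spark implements LIKE; the helper `sql_like` and the sample names are hypothetical, and escape characters are deliberately not handled.

```python
import re


def sql_like(value: str, pattern: str) -> bool:
    """Return True if value matches the LIKE-style pattern (hypothetical helper)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # % matches any run of characters, including none
        elif ch == "_":
            parts.append(".")            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
    # fullmatch anchors the pattern at both ends; DOTALL lets wildcards span newlines
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


# Hypothetical sample values, not taken from the employees data
names = ["Scott", "scott", "Oscar", "Sco"]

starts_with_sco = [n for n in names if sql_like(n, "Sco%")]
contains_sc = [n for n in names if sql_like(n, "%sc%")]

print(starts_with_sco)
print(contains_sc)
```

Because the match is anchored at both ends, 'Sco%' means "starts with Sco", while '%sc%' means "contains sc anywhere". The comparison is case-sensitive, so "scott" does not match 'Sco%'.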
Negation with LIKE:
We can also use negation with the LIKE operator to filter data that does not match a specific pattern.
# Filtering data where phone number does not start with '+44'
employeesDF.filter("phone_number NOT LIKE '+44%'").show()
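One detail worth keeping in mind with NOT LIKE is SQL's treatment of NULL: a NULL value compared with NOT LIKE yields NULL rather than true, so the filter drops that row. The plain-Python sketch below imitates this behavior with `None` standing in for NULL. The helper `not_like_prefix` and the sample phone numbers are hypothetical illustrations, not Spark code.

```python
from typing import Optional


def not_like_prefix(value: Optional[str], prefix: str) -> Optional[bool]:
    """Imitate `value NOT LIKE 'prefix%'` with SQL NULL semantics (None = NULL)."""
    if value is None:
        return None  # NULL NOT LIKE ... evaluates to NULL, not True
    return not value.startswith(prefix)


# Hypothetical sample values; None represents a missing phone number
phones = ["+44 20 7946 0958", "515.123.4567", None]

# A filter keeps a row only when the condition is strictly True
kept = [p for p in phones if not_like_prefix(p, "+44") is True]
print(kept)
```

If rows with a missing phone number should also be returned, the condition can be extended, for example `phone_number NOT LIKE '+44%' OR phone_number IS NULL`.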
Case-Insensitive Comparison:
The LIKE operator is case-sensitive by default, but we can perform case-insensitive comparisons by normalizing the column with a function such as upper() and writing the pattern in the same case.
# Using upper function for case-insensitive comparison
employeesDF.filter("upper(first_name) LIKE 'SCO%'").show()
Hands-On Tasks
- Find employees whose first name starts with ‘Sco’.
- Find employees whose first name contains ‘ott’ irrespective of case.
- Find employees whose phone number does not start with ‘+44’.
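As a starting point for the tasks above, the following sketch builds the filter expression strings rather than running them. The helper `like_filter` is hypothetical (it is not part of Spark), and it assumes the pattern contains no single quotes, so it rejects such patterns instead of trying to escape them.

```python
def like_filter(column: str, pattern: str, negate: bool = False,
                ignore_case: bool = False) -> str:
    """Build a SQL-style LIKE filter expression (hypothetical helper)."""
    if "'" in pattern:
        # Assumption: quoting is out of scope for this sketch
        raise ValueError("pattern must not contain single quotes")
    if ignore_case:
        # Normalize both sides to upper case so they compare consistently
        column = f"upper({column})"
        pattern = pattern.upper()
    operator = "NOT LIKE" if negate else "LIKE"
    return f"{column} {operator} '{pattern}'"


task1 = like_filter("first_name", "Sco%")
task2 = like_filter("first_name", "%ott%", ignore_case=True)
task3 = like_filter("phone_number", "+44%", negate=True)

for expr in (task1, task2, task3):
    print(expr)
```

Each resulting string can be passed to the DataFrame in the same way as the earlier examples, for instance `employeesDF.filter(task2).show()`.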
Conclusion
In this article, we learned how to use the LIKE operator or like function in Spark DataFrames for partial comparisons and filtering based on specific patterns. Practice these concepts and experiment with different patterns to enhance your understanding.