Apache Spark Python - Basic Transformations - Using LIKE Operator or like Function

Let us understand how to use the LIKE operator or the like function to filter data in Data Frames.

Partial Comparison with LIKE:

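The LIKE operator or like function is primarily used for partial comparison. A pattern can contain two wildcards: % matches any sequence of zero or more characters, and _ matches exactly one character. For example, we can search for names that start with “Sco” using the pattern 'Sco%'.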

# Using like operator to find names starting with 'Sco'
employeesDF.filter("first_name LIKE 'Sco%'").show()

Negation with LIKE:

We can also use negation with the LIKE operator to filter data that does not match a specific pattern.

# Filtering data where phone number does not start with '+44'
employeesDF.filter("phone_number NOT LIKE '+44%'").show()

Case-Insensitive Comparison:

The LIKE operator is case-sensitive by default, but we can perform case-insensitive comparisons by converting the column to a single case with a function such as upper() and writing the pattern in that same case.

# Using upper function for case-insensitive comparison
employeesDF.filter("upper(first_name) LIKE 'SCO%'").show()

Hands-On Tasks

  1. Find employees whose first name starts with ‘Sco’.
  2. Find employees whose first name contains ‘ott’ irrespective of case.
  3. Find employees whose phone number does not start with ‘+44’.

Conclusion

In this article, we learned how to use the LIKE operator or like function in Spark DataFrames for partial comparisons and filtering based on specific patterns. Practice these concepts and experiment with different patterns to enhance your understanding.
