Let us understand how we can filter the data in Spark SQL. Let us start spark context for this Notebook so that we can execute the code provided. You can sign up for our 10 node state of the art cluster/labs to learn Spark SQL using our unique integrated LMS.
Explanation for the video
[Please replace this text with the embedded YouTube video link]
Key Concepts Explanation
Description of the key concepts. Provide key concepts as subheadings and explain them with inline code examples. Multi-line code examples should be highlighted using code highlighter.
WHERE Clause
The WHERE
clause is used to filter the data based on specific conditions. It allows us to compare columns, expressions, or literals with other columns, expressions, or literals using operators such as =
, !=
, >
, <
, etc.
SELECT * FROM orders WHERE order_status = 'COMPLETE'
Comparison Operators
All comparison operators like =
, !=
, >
, <
can be used for filtering data based on specific criteria.
SELECT * FROM orders WHERE order_status IN ('COMPLETE', 'CLOSED')
Hands-On Tasks
Description of the hands-on tasks. Provide a list of tasks that the reader can perform to apply the concepts discussed in the article.
- Execute a query to filter orders with order_status as ‘COMPLETE’ or ‘CLOSED’.
- Filter orders based on a specific month and order_status criteria.
Conclusion
In this article, we covered the basics of filtering data in Spark SQL using the WHERE
clause and comparison operators. Practice these concepts and explore more filtering options to enhance your SQL skills.
Filtering Data with Null Values
Let’s prepare a table to showcase how to handle null values while filtering data.
CREATE DATABASE IF NOT EXISTS itversity_sms
CREATE TABLE students (
student_id INT,
student_first_name STRING,
student_last_name STRING,
student_phone_number STRING,
student_address STRING
) STORED AS avro
INSERT INTO students VALUES (1, 'Scott', 'Tiger', NULL, NULL)
Handling Null Values
When dealing with null values, use IS NULL
and IS NOT NULL
operators for comparisons.
SELECT * FROM students WHERE student_phone_number IS NULL
Hands-On Tasks
- Create a table for students with relevant columns.
- Insert records with and without null values.
- Execute queries to filter data based on null values in certain columns.
Conclusion
Handling null values in data filtering is crucial for accurate results. By using IS NULL
and IS NOT NULL
, you can effectively manage null values while querying data in Spark SQL. Practice these concepts to master filtering with null values.