Let us understand how to perform outer joins using Spark SQL. There are 3 different types of outer joins. Let’s start a spark context for this Notebook so that we can execute the code provided.
Key Concepts Explanation
LEFT OUTER JOIN (default): Get all the records from both the datasets which satisfy the JOIN condition along with those records which are in the left side table but not in the right side table.
Example:
SELECT o.order_id,
o.order_date,
o.order_status,
oi.order_item_order_id,
oi.order_item_subtotal
FROM orders o LEFT OUTER JOIN order_items oi
ON o.order_id = oi.order_item_order_id
LIMIT 10
RIGHT OUTER JOIN: Get all the records from both the datasets which satisfy the JOIN condition along with those records which are in the right side table but not in the left side table.
Example:
SELECT o.order_id,
o.order_date,
o.order_status,
oi.order_item_order_id,
oi.order_item_subtotal
FROM orders o RIGHT OUTER JOIN order_items oi
ON o.order_id = oi.order_item_order_id
LIMIT 10
Hands-On Tasks
- Perform LEFT OUTER JOIN between ‘orders’ and ‘order_items’ tables.
- Perform RIGHT OUTER JOIN between ‘orders’ and ‘order_items’ tables.
Conclusion
In this article, we covered the concept of outer joins in Spark SQL using LEFT and RIGHT joins. We provided examples to clarify the usage of these joins. Practice these tasks to reinforce your understanding of outer joins in Spark SQL.