# Data Engineering using Spark SQL - Basic Transformations - Define Problem Statement

In this article, we will explore basic transformations using Spark SQL and walk through a practical example of calculating daily product revenue. This article is based on a YouTube video tutorial, which complements the text below.

[Insert Video Here]

## Key Concepts Explanation

### Data Preparation

First, we need to prepare the data for our analysis. We will use the `orders` and `order_items` datasets to calculate the daily product revenue. The key fields in the `orders` dataset are `order_id`, `order_date`, `order_customer_id`, and `order_status`. The `order_items` dataset consists of fields such as `order_item_id`, `order_item_order_id`, `order_item_product_id`, `order_item_quantity`, `order_item_subtotal`, and `order_item_product_price`.
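As a sketch, the two datasets can be registered as Spark SQL tables. The file paths, format, and options below are assumptions for illustration; adjust them to wherever your copies of the datasets live:

```sql
-- Hypothetical example: register both datasets as tables backed by CSV files.
-- The paths '/data/retail_db/...' are placeholders, not part of the tutorial.
CREATE TABLE orders
USING CSV
OPTIONS (path '/data/retail_db/orders', header 'true', inferSchema 'true');

CREATE TABLE order_items
USING CSV
OPTIONS (path '/data/retail_db/order_items', header 'true', inferSchema 'true');
```

If your data is in another format (for example JSON or Parquet), swap the `USING` clause accordingly.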

We establish a one-to-many relationship between `orders` and `order_items`, where `orders.order_id` serves as the primary key and `order_items.order_item_order_id` acts as the foreign key to `orders.order_id`.
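This one-to-many relationship can be expressed with a join. A minimal sketch, assuming both tables are already registered in Spark SQL:

```sql
-- Each row in orders can match multiple rows in order_items (one-to-many).
SELECT o.order_id,
       oi.order_item_id,
       oi.order_item_subtotal
FROM orders AS o
JOIN order_items AS oi
  ON o.order_id = oi.order_item_order_id;
```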

### Daily Product Revenue Calculation

Our main objective is to calculate the daily product revenue using the `orders.order_date`, `order_items.order_item_product_id`, and `order_items.order_item_subtotal` fields. We will aggregate the subtotal based on the date and product ID to derive the daily revenue.

To reinforce your understanding, here are some hands-on tasks you can perform:

1. Load the `orders` and `order_items` datasets into Spark SQL.
2. Filter the `orders` dataset to retain only orders whose status is `COMPLETE` or `CLOSED`.
3. Join the `orders` and `order_items` datasets on the respective keys.
4. Aggregate the subtotal based on the date and product ID to obtain the daily product revenue.
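The steps above can be combined into a single query. This is a sketch that assumes the datasets are registered as tables named `orders` and `order_items`:

```sql
-- Daily product revenue: sum of subtotals per date and product,
-- restricted to COMPLETE or CLOSED orders.
SELECT o.order_date,
       oi.order_item_product_id,
       ROUND(SUM(oi.order_item_subtotal), 2) AS order_revenue
FROM orders AS o
JOIN order_items AS oi
  ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date, oi.order_item_product_id
ORDER BY o.order_date, order_revenue DESC;
```

Grouping by both `order_date` and `order_item_product_id` yields one revenue figure per product per day; the `ORDER BY` clause simply makes the output easier to inspect.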

## Conclusion

In conclusion, mastering basic transformations using Spark SQL is essential for data processing and analysis. By applying the concepts discussed in this article, you can efficiently calculate metrics such as daily product revenue. I encourage you to practice these techniques and engage with the community for further learning and upskilling.

## Define Problem Statement

Let us define the problem statement to gain an overview of basic transformations using Spark SQL. Our goal is to calculate the daily product revenue by leveraging the `orders` and `order_items` datasets. We will explore various standard transformations and extract the necessary fields to achieve our objective. Let's get started!
