Let us go through the details of total aggregations using Spark.
- We can perform total aggregations directly on a DataFrame, or we can perform aggregations after grouping by one or more keys.
- Here are the functions we typically use to perform aggregations:
  - count
  - sum, avg
  - min, max
In this section, we will break down the key concepts related to total aggregations using Spark.
Aggregation Functions
Aggregation functions compute a single result from many rows of a DataFrame: either all of its rows (a total aggregation) or the rows within each group. The simplest total aggregation is the row count:
# Counting total number of rows
airtraffic.count()
Distinct Values
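To count the distinct combinations of values in a set of columns, select those columns, drop the duplicates with distinct, and count the rows that remain. The example below counts the distinct dates in airtraffic: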
# Counting distinct values
airtraffic. \
select('Year', 'Month', 'DayOfMonth'). \
distinct(). \
count()
Total Bonus Amount
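The total bonus amount can be calculated using the sum function. In the example below, bonus is treated as a percentage of salary, and a missing bonus is replaced with 0 before the amounts are added up: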
# Calculating total bonus amount
from pyspark.sql.functions import coalesce, col, lit, sum

employeesDF. \
    select((sum(coalesce(col('bonus').cast('int'), lit(0)) * col('salary')) / lit(100)).alias('total_bonus')). \
    show()
Revenue Calculation
The revenue generated by a given order can be computed by filtering the order items down to that order and applying the sum function to their subtotals:
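# Calculating order revenue (order_id must already hold the id of the order to look up)
from pyspark.sql.functions import col, lit, sum

order_items. \
    filter(col('order_item_order_id') == lit(int(order_id))). \
    select(sum('order_item_subtotal').alias('order_revenue')). \
    show()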
Hands-On Tasks
Here are some hands-on tasks for you to apply the concepts discussed above:
- Calculate the total number of rows in the airtraffic DataFrame.
- Find the distinct count of dates from the airtraffic DataFrame.
- Calculate the total bonus amount from the employeesDF DataFrame.
- Determine the revenue generated for a specific order from the order_items dataset.
Conclusion
In this article, we discussed the key concepts related to total aggregations using Spark. We covered aggregation functions, distinct values, calculating bonus amounts, and revenue calculations. It is essential to practice these concepts hands-on to gain a better understanding. Feel free to engage with the community for further learning.