Let us see how to perform word count using Spark SQL. Using word count as an example, we will build a solution step by step with the predefined functions that Spark SQL provides.
Explanation: The accompanying YouTube video complements this text by visually demonstrating the steps involved in performing word count using Spark SQL.
Key Concepts Explanation
Spark Context Setup
To begin, set up the Spark context in the notebook using the code snippet below; the subsequent code depends on this session.
val username = System.getProperty("user.name")
import org.apache.spark.sql.SparkSession

val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    appName(s"${username} | Spark SQL - Predefined Functions").
    master("yarn").
    getOrCreate
Using Spark SQL, Scala, or PySpark
Launch Spark SQL, the Scala shell, or PySpark using the respective command shown below:
Using Spark SQL
spark2-sql \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Using Scala
spark2-shell \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Using PySpark
pyspark2 \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Hands-On Tasks
Perform the following hands-on tasks to practice word count using Spark SQL:
- Create a table named lines.
- Insert data into the table.
- Split lines into an array of words.
- Explode the array of words from each line into individual records.
- Use group by to get the count of each word.
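The steps above can be sketched in Spark SQL as follows. This is a minimal sketch: the table name lines comes from the tasks, but the column name line and the sample sentences are assumptions for illustration.

```sql
-- Create a table named lines (column name is an assumption)
CREATE TABLE lines (line STRING);

-- Insert some sample data (sample sentences are assumptions)
INSERT INTO lines VALUES
  ('Hello World'),
  ('Let us perform word count using Spark SQL'),
  ('Word count gives the count of each word');

-- split turns each line into an array of words,
-- explode flattens the array into one record per word,
-- and GROUP BY aggregates the count of each word
SELECT word, count(1) AS word_count
FROM (
  SELECT explode(split(line, ' ')) AS word
  FROM lines
) words
GROUP BY word
ORDER BY word_count DESC;
```

Run these statements in the spark2-sql shell (or via spark.sql(...) in the Scala or PySpark session) started with the commands shown earlier.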
Conclusion
In conclusion, you have learned how to perform word count using Spark SQL. We encourage you to practice these concepts and engage with the community for further learning.