Data Engineering using Spark SQL - Getting Started - Managing Spark Metastore Tables

Spark Metastore Table Creation

In this section, we will explore how to create a Spark Metastore table using Spark SQL. We will start a Spark session with Hive support enabled, then walk through creating a basic table with columns and data types.

import org.apache.spark.sql.SparkSession

val username = System.getProperty("user.name")

val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    master("yarn").
    appName(s"${username} | Spark SQL - Getting Started").
    getOrCreate

SQL Commands for Metastore Tables

We will use SQL commands to manage Spark Metastore tables, covering CREATE DATABASE, USE, CREATE TABLE, and SHOW TABLES.

CREATE DATABASE itversity_retail

USE itversity_retail

CREATE TABLE orders (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
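The same statements can be issued programmatically through the Spark session. A minimal sketch, assuming the `spark` session and the `itversity_retail` database from the examples above (the `IF NOT EXISTS` clauses are added here so the sketch can be re-run safely):

```scala
// Run the DDL above through the Spark session created earlier.
spark.sql("CREATE DATABASE IF NOT EXISTS itversity_retail")
spark.sql("USE itversity_retail")
spark.sql("""
  CREATE TABLE IF NOT EXISTS orders (
    order_id INT,
    order_date STRING,
    order_customer_id INT,
    order_status STRING
  ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
""")

// List the tables registered in the current database
spark.sql("SHOW TABLES").show()
```

Because the session was created with `enableHiveSupport`, the table definition is persisted in the Metastore and survives across Spark sessions.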

Hands-On Tasks

Let’s dive into some hands-on tasks to solidify our understanding of Spark Metastore tables.

  1. Connect to Spark session and create a new Metastore database.
  2. Create a new table in the Metastore database with specific columns and data types.
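Once both tasks are done, the results can be verified from the same session. A minimal sketch, assuming the `spark` session and the database and table names used earlier in this article:

```scala
// Confirm the database exists
spark.sql("SHOW DATABASES").show()

// Switch to the new database and list its tables
spark.catalog.setCurrentDatabase("itversity_retail")
spark.catalog.listTables().show()

// Inspect the table's columns, data types, and storage properties
spark.sql("DESCRIBE FORMATTED orders").show(false)
```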

Conclusion

In this article, we have learned the basics of managing Spark Metastore tables. By following the provided hands-on tasks, you can practice creating databases and tables in your own Spark environment. Remember to engage with the community for further learning opportunities.
