Data Engineering Spark SQL - Managing Tables - DDL & DML - Create Spark Metastore Tables

Let us understand how to create tables in Spark Metastore. We will be focusing on syntax and semantics.
Let us start the Spark context for this notebook so that we can execute the code provided. The launch commands shown in the sections below start the Spark SQL CLI (spark2-sql), the Scala shell (spark2-shell), and PySpark (pyspark2), each with the warehouse directory set to /user/${USER}/warehouse.

Key Concepts Explanation

Managed Table
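A managed table is a table where Spark manages the lifecycle of both the metadata and the data: the files live under the warehouse directory (spark.sql.warehouse.dir), and dropping the table deletes them along with the table definition.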

spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
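
To make the role of spark.sql.warehouse.dir concrete, the sketch below computes where the files of a managed table would land by default. It assumes the standard layout, in which tables of the default database sit directly under the warehouse directory and tables of any other database sit under a `<database>.db` folder. The helper name `managed_table_location`, the user `alice`, and the database `retail` are hypothetical and used only for illustration.

```python
import posixpath


def managed_table_location(warehouse_dir, table, database="default"):
    """Return the assumed default directory of a managed table (illustrative helper)."""
    # Metastore identifiers are case-insensitive and stored in lower case.
    table = table.lower()
    database = database.lower()
    if database == "default":
        # Tables in the default database sit directly under the warehouse directory.
        return posixpath.join(warehouse_dir, table)
    # Tables in other databases sit under a "<database>.db" folder.
    return posixpath.join(warehouse_dir, database + ".db", table)


warehouse = "/user/alice/warehouse"  # hypothetical value of spark.sql.warehouse.dir
print(managed_table_location(warehouse, "orders"))
print(managed_table_location(warehouse, "orders", "retail"))
```

Because the table is managed, dropping it would remove this directory as well as the table definition.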

Text File Format
Text file format stores data as plain, human-readable text, typically one record per line. In Spark SQL DDL it is selected with STORED AS TEXTFILE.

spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Field Delimiter ‘,’
A field delimiter is the character or string that separates one field from the next within a record; for the data used here it is a comma (,).

pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
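
As a small illustration of what a comma delimiter means for a text-format record, the sketch below splits one line into its fields. The sample record is made up in the shape assumed for the orders data, and `split_record` is a hypothetical helper, not part of any Spark API.

```python
def split_record(line, delimiter=","):
    """Split one text-file record into its fields."""
    # Strip the trailing newline first so it does not end up in the last field.
    return line.rstrip("\n").split(delimiter)


# Made-up record in the shape assumed for the orders data.
sample = "1,2013-07-25 00:00:00.0,11599,CLOSED\n"
fields = split_record(sample)
print(len(fields))
print(fields)
```

A table definition needs one column per field, in the same order in which the fields appear in the record.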

Hands-On Tasks

  1. Determine the table type, file format, and field delimiter.
  2. Create a table based on the structure of data in /data/retail_db/orders.
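
One possible shape for the second task is sketched below. The column names and types are assumptions about the orders data and should be checked against the files in /data/retail_db/orders. The statement has no EXTERNAL keyword and no LOCATION clause, so the table is managed. The function `create_orders_table` is a hypothetical wrapper that expects an existing SparkSession, such as the `spark` object available in the pyspark shell.

```python
# Assumed column names and types for the orders data; verify against the files.
CREATE_ORDERS_SQL = """
CREATE TABLE IF NOT EXISTS orders (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
""".strip()


def create_orders_table(spark):
    """Run the DDL through an existing SparkSession and describe the result."""
    spark.sql(CREATE_ORDERS_SQL)
    # DESCRIBE FORMATTED reports the table type, location, and storage format.
    return spark.sql("DESCRIBE FORMATTED orders")


print(CREATE_ORDERS_SQL)
```

In the pyspark shell, `create_orders_table(spark).show(50, False)` would print the table details, which also covers the first task. This creates an empty table; loading the files into it is a separate step not shown here.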

Conclusion

In this article, we covered the basics of creating tables in Spark Metastore: managed tables, the text file format, and field delimiters. Practice these concepts and engage with the community for further learning.
