Data Engineering using Spark SQL - Getting Started - Retrieve Metadata of Tables

Now that the table is created, let us understand how to get the metadata of a table.

  • We can get metadata of Hive Tables using several commands.

    • DESCRIBE - e.g.: DESCRIBE orders;

    • DESCRIBE EXTENDED - e.g.: DESCRIBE EXTENDED orders;

    • DESCRIBE FORMATTED - e.g.: DESCRIBE FORMATTED orders;

  • DESCRIBE will give only field names and data types.

  • DESCRIBE EXTENDED will give all the metadata. In Hive, the output is not in a readable format; in Spark SQL, it is the same as the output of DESCRIBE FORMATTED.

  • DESCRIBE FORMATTED will give metadata in a readable format.
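The difference between the commands can also be checked from a Scala session. The following is a minimal sketch, assuming a Hive-enabled session and that an orders table already exists in the current database; the names columnsOnly and detailed are only illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a Hive-enabled session; getOrCreate reuses an existing session if there is one
val spark = SparkSession.builder.enableHiveSupport().getOrCreate()

// Assumption: a table named orders exists in the current database
val columnsOnly = spark.sql("DESCRIBE orders")
val detailed = spark.sql("DESCRIBE FORMATTED orders")

// Passing false as the truncate argument prints full cell values instead of cutting them off
columnsOnly.show(false)
detailed.show(100, false)
```

Both results are regular Data Frames, so the usual Data Frame operations can be applied to them.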

As the output is truncated in Jupyter, we will see the full details using spark-sql.

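import org.apache.spark.sql.SparkSession

// Use the OS user name to build the per-user warehouse path and the application name
val username = System.getProperty("user.name")

// Hive support is enabled so that Spark SQL can read table metadata from the metastore
val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    master("yarn").
    appName(s"${username} | Spark SQL - Getting Started").
    getOrCreate

-- Confirm the current database, switch to the retail database, and list its tables
SELECT current_database();
USE itversity_retail;
SHOW tables;

-- Field names and data types only
DESCRIBE orders;

-- Detailed metadata; both commands give the same output in Spark SQL
DESCRIBE EXTENDED orders;
DESCRIBE FORMATTED orders;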
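When only a few properties are needed, the output of DESCRIBE FORMATTED can be filtered like any other Data Frame. The following is a sketch under stated assumptions: the row labels in the wanted list (Database, Table, Type, Provider, Location) are the ones commonly printed by Spark SQL, but they may differ between versions, and a table column with the same name as a label would also match the filter.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a Hive-enabled session; getOrCreate reuses an existing session if there is one
val spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql("USE itversity_retail")

// DESCRIBE FORMATTED returns a Data Frame with columns col_name, data_type and comment
val details = spark.sql("DESCRIBE FORMATTED orders")

// Assumption: these row labels are present in the detailed table information section
val wanted = Seq("Database", "Table", "Type", "Provider", "Location")

// Keep only the rows for the wanted labels; the value is held in the data_type column
val summary = details.
    filter(col("col_name").isin(wanted: _*)).
    select("col_name", "data_type")

summary.show(false)
```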
