Data Engineering Spark SQL - Managing Tables - DDL & DML - Exercise - Managed Tables

Description:
Learn how to manage tables in Spark Metastore using Spark SQL and Hive. This article provides step-by-step instructions on creating and populating managed tables, determining data types, and validating the data. Watch the accompanying video for a visual guide.

Explanation for the video

Put a placeholder for the video here with text so that I can replace it as part of the automation

Key Concepts Explanation

Creating a Database

To begin, create a database in Spark Metastore with your OS username appended followed by “nyse”.

CREATE DATABASE IF NOT EXISTS your_os_user_name_nyse;

Creating a Managed Table

Create a managed table named nyse_eod with the specified fields and data types using default file format.

CREATE TABLE your_os_user_name_nyse.nyse_eod (
    stockticker STRING,
    tradedate STRING,
    openprice DECIMAL(10,2),
    highprice DECIMAL(10,2),
    lowprice DECIMAL(10,2),
    closeprice DECIMAL(10,2),
    volume BIGINT
);

Hands-On Tasks

Let’s put the concepts into practice:

  1. Create a database with the name “your_os_user_name_nyse”.
  2. Create a table named “nyse_eod” with the given field names and data types.
  3. Copy data from a sample zip file into the table and validate it.

Conclusion

In conclusion, managing tables in Spark Metastore with Spark SQL and Hive is essential for organizing data efficiently. Follow the steps provided and practice the tasks to solidify your understanding. Join our community for further learning and support.

Validation

Run the following queries to validate the table creation and data loading:

DESCRIBE FORMATTED your_os_user_name_nyse.nyse_eod;
SELECT * FROM your_os_user_name_nyse.nyse_eod LIMIT 10;
SELECT COUNT(1) FROM your_os_user_name_nyse.nyse_eod;

Remember to replace your_os_user_name with your actual OS username.

Create a database, define a managed table, and load data to gain hands-on experience in Spark SQL and Hive. Happy learning!

%%sql
SELECT * FROM your_os_user_name_nyse.nyse_eod LIMIT 10

%%sql
SELECT COUNT(1) FROM your_os_user_name_nyse.nyse_eod

Watch the video tutorial here