Description:
Learn how to manage tables in Spark Metastore using Spark SQL and Hive. This article provides step-by-step instructions on creating and populating managed tables, determining data types, and validating the data. Watch the accompanying video for a visual guide.
Explanation for the video
Put a placeholder for the video here with text so that I can replace it as part of the automation
Key Concepts Explanation
Creating a Database
To begin, create a database in Spark Metastore with your OS username appended followed by “nyse”.
CREATE DATABASE IF NOT EXISTS your_os_user_name_nyse;
Creating a Managed Table
Create a managed table named nyse_eod
with the specified fields and data types using default file format.
CREATE TABLE your_os_user_name_nyse.nyse_eod (
stockticker STRING,
tradedate STRING,
openprice DECIMAL(10,2),
highprice DECIMAL(10,2),
lowprice DECIMAL(10,2),
closeprice DECIMAL(10,2),
volume BIGINT
);
Hands-On Tasks
Let’s put the concepts into practice:
- Create a database with the name “your_os_user_name_nyse”.
- Create a table named “nyse_eod” with the given field names and data types.
- Copy data from a sample zip file into the table and validate it.
Conclusion
In conclusion, managing tables in Spark Metastore with Spark SQL and Hive is essential for organizing data efficiently. Follow the steps provided and practice the tasks to solidify your understanding. Join our community for further learning and support.
Validation
Run the following queries to validate the table creation and data loading:
DESCRIBE FORMATTED your_os_user_name_nyse.nyse_eod;
SELECT * FROM your_os_user_name_nyse.nyse_eod LIMIT 10;
SELECT COUNT(1) FROM your_os_user_name_nyse.nyse_eod;
Remember to replace your_os_user_name
with your actual OS username.
Create a database, define a managed table, and load data to gain hands-on experience in Spark SQL and Hive. Happy learning!
%%sql
SELECT * FROM your_os_user_name_nyse.nyse_eod LIMIT 10
%%sql
SELECT COUNT(1) FROM your_os_user_name_nyse.nyse_eod