Data Frames can be written to Metastore tables using APIs such as saveAsTable and insertInto, which are available through the write attribute on Data Frame objects.
Create Database
To create a database named demo_db in the metastore, we can use the following SQL query:
spark.sql("CREATE DATABASE demo_db")
List Databases
To list all the databases in the metastore, we can use the following API:
spark.catalog.listDatabases()
Create Data Frame and Table
To create a Data Frame with one column named dummy and value X, and then create a table named dual from this Data Frame, we can execute the following steps:
l = [("X", )]
df = spark.createDataFrame(l, schema="dummy STRING")
df.write.saveAsTable("dual", mode='overwrite')
Insert Data into Table
To insert data into an empty table, we can first create the table from the Data Frame's schema using the catalog, and then load the rows with insertInto, which matches columns by position and appends by default. Note that createTable fails if the table already exists, so drop dual first if it was created in the previous step:
schema = df.schema
spark.catalog.createTable('dual', schema=schema)
df.write.insertInto('dual')
Drop Table and Database
To drop the table dual and the database demo_db, we can execute the following SQL queries:
spark.sql("DROP TABLE dual")
spark.sql("DROP DATABASE demo_db")
Hands-On Tasks
- Create a database named demo_db in the metastore.
- List all databases in the metastore.
- Create a Data Frame with one column dummy and value X, and create a table named dual with this Data Frame.
- Insert the Data Frame into the dual table.
- Drop the dual table and the demo_db database.
Conclusion
In this article, we learned how to create, insert data into, and drop Metastore tables using the catalog in Spark. Practice these tasks to enhance your understanding and feel free to engage with the community for further learning.