Let us understand how to DROP Spark Metastore Tables as well as Databases. Let us start the Spark context for this notebook so that we can execute the code provided.
Key Concepts Explanation
Using Spark SQL
spark2-sql \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Using Scala
spark2-shell \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Using Pyspark
pyspark2 \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Dropping Tables
- We can use the DROP TABLE command to drop a table. Let us drop the orders table.
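Before dropping a table, it can be useful to confirm whether it is managed or external, since that determines whether the underlying data is deleted. A quick check, assuming the orders table exists in the current database, is to describe it and look at the Type field (MANAGED or EXTERNAL) in the output:

%%sql
DESCRIBE FORMATTED orders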
%%sql
DROP TABLE orders
- DROP TABLE on a managed table deletes both the metadata in the metastore and the data in HDFS, while DROP TABLE on an external table deletes only the metadata in the metastore.
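DROP TABLE fails if the table does not exist. In scripts or notebook cells that may be rerun, Spark SQL's IF EXISTS clause makes the statement a no-op instead of an error (shown here with the same orders table):

%%sql
DROP TABLE IF EXISTS orders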
Dropping Databases
- We can drop a database by using the DROP DATABASE command. However, we need to drop all tables in the database first.
- Here is an example of dropping the database itversity_retail, once its tables have been dropped:

%%sql
DROP DATABASE itversity_retail
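To find the tables that need to be dropped first, we can list them (assuming the itversity_retail database still exists):

%%sql
SHOW tables IN itversity_retail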
- We can also drop a database along with all of its tables in one statement by adding the CASCADE option.

%%sql
DROP DATABASE itversity_retail CASCADE
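As with tables, IF EXISTS can be combined with CASCADE so that rerunning the statement does not fail when the database is already gone:

%%sql
DROP DATABASE IF EXISTS itversity_retail CASCADE

Running SHOW DATABASES afterwards confirms that the database has been removed.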
Hands-On Tasks
Try the following tasks to apply the concepts discussed in this article.
- Run the provided code snippets to understand how to drop tables and databases in Spark SQL.
Conclusion
In this article, we learned how to efficiently drop Spark Metastore Tables and Databases. It is essential to understand the implications of dropping tables and databases, especially regarding managed and external tables. Practice these concepts in your own environment to solidify your understanding.
Let me know if you have any questions or need further clarification.