Data Engineering Spark SQL - Managing Tables - DDL & DML - Drop Tables and Databases

Let us understand how to drop Spark Metastore tables as well as databases. Let us start the Spark context for this notebook so that we can execute the code provided.

Key Concepts Explanation

Using Spark SQL


spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Using Scala


spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Using Pyspark


pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

  • We can use the DROP TABLE command to drop a table. Let us drop the “orders” table.

Dropping Tables

%%sql

DROP TABLE orders
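
If you are not sure whether the table exists, you can hedge the statement with IF EXISTS so that it succeeds even when the table is already gone. A minimal sketch:

%%sql

-- Does not fail if the table has already been dropped
DROP TABLE IF EXISTS orders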
  • DROP TABLE on a managed table deletes both the metadata in the metastore and the data in HDFS, while DROP TABLE on an external table deletes only the metadata; the underlying data in HDFS is left intact. You can confirm the table type before dropping it, as shown below.
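
Before dropping a table, you can confirm whether it is managed or external by looking at the Type field in its detailed description. A minimal sketch, assuming the table being checked (here orders) has not been dropped yet:

%%sql

-- The Type row in the output shows MANAGED or EXTERNAL
DESCRIBE FORMATTED orders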

Dropping Databases

  • We can drop a database by using the DROP DATABASE command. However, the database must be empty, so we need to drop all of its tables first.

  • Here is an example that drops the empty database “itversity_retail”: DROP DATABASE itversity_retail

  • We can also drop a database together with all of its tables in one step by adding the CASCADE option, as shown below.

%%sql

DROP DATABASE itversity_retail CASCADE
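
If the database may or may not exist, IF EXISTS can be combined with CASCADE, and SHOW DATABASES can be used afterwards to confirm that it is gone. A minimal sketch:

%%sql

-- Drops the database and all of its tables; no error if it does not exist
DROP DATABASE IF EXISTS itversity_retail CASCADE

%%sql

-- Verify that itversity_retail is no longer listed
SHOW DATABASES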

Hands-On Tasks

The following task will help you apply the concepts discussed in this article.

  1. Run the provided code snippets to understand how to drop tables and databases in Spark SQL.
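
Here is an end-to-end sketch for the task above. The database name demo_db, the table name demo_orders, and the column layout are illustrative assumptions rather than objects from the original environment.

%%sql

-- Set up a throwaway database for practice (illustrative name)
CREATE DATABASE IF NOT EXISTS demo_db

%%sql

-- Create a managed table inside the practice database
CREATE TABLE demo_db.demo_orders (
    order_id INT,
    order_date STRING,
    order_customer_id INT,
    order_status STRING
)

%%sql

-- Dropping the managed table removes both its metadata and its data
DROP TABLE demo_db.demo_orders

%%sql

-- CASCADE is not required here because the database is already empty,
-- but DROP DATABASE demo_db CASCADE would also work with tables still in it
DROP DATABASE demo_db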

Conclusion

In this article, we learned how to efficiently drop Spark Metastore Tables and Databases. It is essential to understand the implications of dropping tables and databases, especially regarding managed and external tables. Practice these concepts in your own environment to solidify your understanding.

Let me know if you have any questions or need further clarification.

Watch the video tutorial here