Data Engineering Spark SQL - Managing Tables - DDL & DML - Adding Comments

In this article, we will learn how to create a table with column-level and table-level comments using Spark SQL (the COMMENT syntax shown here is the same in Hive), with the ‘orders’ dataset as an example. We cover the key concepts, walk through the statements step by step, and include hands-on tasks for practice.

Key Concepts Explanation

Using COMMENT Keyword

We can specify comments for both columns and tables using the COMMENT keyword in Hive. Comments provide additional information about the data structure and are helpful for documentation.

CREATE TABLE orders (
  order_id INT COMMENT 'Unique order id',
  order_date STRING COMMENT 'Date on which order is placed',
  order_customer_id INT COMMENT 'Customer id who placed the order',
  order_status STRING COMMENT 'Current status of the order'
) COMMENT 'Table to save order level details'

Checking Comments

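We can view the comments with DESCRIBE orders or DESCRIBE FORMATTED orders. DESCRIBE lists each column with its data type and comment, while DESCRIBE FORMATTED also prints the detailed table information, which includes the table-level comment.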
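
As a brief sketch, assuming the ‘orders’ table from the example above already exists in the current database, the two commands could be run like this. The trailing semicolons separate statements in a SQL shell; in a notebook each statement would typically go in its own cell.

```sql
-- Lists every column with its data type and its comment
DESCRIBE orders;

-- Prints the same column list followed by detailed table information,
-- where the table-level comment should appear
DESCRIBE FORMATTED orders;
```

If a comment is missing from the output, the most likely cause is that the table was created before the COMMENT clauses were added, so it needs to be dropped and recreated.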

Hands-On Tasks

  1. Use Spark SQL to set the database context to ‘itversity_retail’.
  2. Drop the existing ‘orders’ table if it already exists.
  3. Create a new ‘orders’ table with columns and comments as shown in the example.
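
The three tasks can be sketched as the statements below. This is one possible solution rather than the only one, and it assumes the ‘itversity_retail’ database already exists and that you have permission to drop and create tables in it.

```sql
-- Task 1: switch to the target database (assumed to exist already)
USE itversity_retail;

-- Task 2: remove any earlier copy of the table; IF EXISTS avoids an error
-- when the table is not there
DROP TABLE IF EXISTS orders;

-- Task 3: recreate the table with a comment on every column and on the table
CREATE TABLE orders (
  order_id INT COMMENT 'Unique order id',
  order_date STRING COMMENT 'Date on which order is placed',
  order_customer_id INT COMMENT 'Customer id who placed the order',
  order_status STRING COMMENT 'Current status of the order'
) COMMENT 'Table to save order level details';
```

Running DESCRIBE FORMATTED orders afterwards is a quick way to confirm that both the column comments and the table comment were stored.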

Conclusion

In this article, we learned how to create a table with column-level and table-level comments using the ‘orders’ dataset as an example, and how to check those comments with DESCRIBE. Comments document the data structure alongside the table itself, which makes it easier for others to understand. We encourage you to practice these steps and to continue the discussion in the community.

Adding Comments

You can also run the same statements through Spark SQL from Python or Scala to create tables with comments. After creating the table, use the DESCRIBE commands to confirm that the comments were saved.
