Data Engineering Spark SQL - Tables - DML & Partitioning - Using Dynamic Partition Mode

Let us understand how we can insert data into a partitioned table using dynamic partition mode. Let's start by creating the Spark session in this notebook so that the provided code can be executed.

Key Concepts Explanation

Using dynamic partition mode

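  • In dynamic partition mode, the target partition of each row is taken from the value of the partition column in the inserted data, and any partition that does not yet exist is created automatically when the INSERT command is executed.
  • To insert data using dynamic partition mode, set the property hive.exec.dynamic.partition to true.
  • Additionally, set hive.exec.dynamic.partition.mode to nonstrict. In strict mode, at least one partition column must be given a static value in the PARTITION clause.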

Example Code

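import org.apache.spark.sql.SparkSession

// Operating system user running the notebook; used in the warehouse path and app name.
val username = System.getProperty("user.name")

// Build (or reuse) a Hive-enabled Spark session that runs on YARN.
// Setting spark.ui.port to 0 lets Spark pick any free port for its UI.
val spark = SparkSession.
  builder.
  config("spark.ui.port", "0").
  config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
  enableHiveSupport.
  appName(s"${username} | Spark SQL - Managing Tables - DML and Partitioning").
  master("yarn").
  getOrCreate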
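
The two properties listed under Key Concepts can be set for the current session with SET statements. The following is a minimal sketch, assuming the Hive-enabled session built above is available as spark; reading the values back is optional and is included only as a sanity check.

```scala
// Assumes `spark` is the Hive-enabled SparkSession created above.
// Enable dynamic partitioning and relax the strict-mode check for this session.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

// Read the values back to confirm what the session currently holds.
spark.sql("SET hive.exec.dynamic.partition").show(false)
spark.sql("SET hive.exec.dynamic.partition.mode").show(false)
```

These settings apply only to the current session, so they need to be set again whenever a new session is started.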

Hands-On Tasks

  1. Create the Spark session using the provided code.
  2. Set hive.exec.dynamic.partition to true and hive.exec.dynamic.partition.mode to nonstrict.
  3. Insert data into a partitioned table using dynamic partition mode.
  4. Check the newly created partitions.
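
Tasks 2 to 4 can be sketched as follows. This is one possible walk-through rather than a reference solution: it assumes the Hive-enabled session from the Example Code section is available as spark, and the table orders_part_demo, its columns, and the sample rows are hypothetical names and values made up for illustration.

```scala
// Hypothetical demo table, partitioned by order_month (illustrative name and columns).
spark.sql("DROP TABLE IF EXISTS orders_part_demo")
spark.sql("""
  CREATE TABLE orders_part_demo (
    order_id INT,
    order_status STRING
  ) PARTITIONED BY (order_month INT)
  STORED AS PARQUET
""")

// Task 2: allow fully dynamic partition inserts in this session.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

// Task 3: the PARTITION clause names the column but gives no value, so the
// partition for each row comes from the last column of the SELECT output.
spark.sql("""
  INSERT INTO TABLE orders_part_demo PARTITION (order_month)
  SELECT * FROM VALUES
    (1, 'COMPLETE', 201307),
    (2, 'PENDING', 201307),
    (3, 'CLOSED', 201308)
  AS t(order_id, order_status, order_month)
""")

// Task 4: list the partitions created by the insert.
spark.sql("SHOW PARTITIONS orders_part_demo").show(false)
```

The final statement should list one partition per distinct order_month value in the inserted rows. With a real source table, the inline VALUES would be replaced by a SELECT whose last column supplies the partition value.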

Conclusion

In this article, we discussed how to insert data into partitioned tables using dynamic partition mode in Spark SQL. By following the provided steps, you can practice and explore this concept further.

Click here to watch the accompanying video for a better understanding

Remember to engage with the community for any questions or further learning opportunities.
