Let us understand how we can insert data into a partitioned table using dynamic partition mode. Let's start by creating the Spark session in this notebook to execute the provided code.
Key Concepts Explanation
Using dynamic partition mode
- Dynamic partition mode allows partitions to be created automatically when an INSERT command is executed.
- To insert data using dynamic partition mode, set the property hive.exec.dynamic.partition to true.
- Additionally, set hive.exec.dynamic.partition.mode to nonstrict.
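The two properties above can be set from an active Spark session before running the INSERT. A minimal sketch, assuming a SparkSession named spark with Hive support enabled (as created in the example code below):

```scala
// Enable dynamic partitioning for the current session.
spark.sql("SET hive.exec.dynamic.partition=true")

// nonstrict allows all partition columns to be resolved dynamically;
// the default (strict) requires at least one static partition value.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
```

These settings apply only to the current session; they do not change the cluster-wide defaults.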
Example Code
val username = System.getProperty("user.name")
import org.apache.spark.sql.SparkSession
val spark = SparkSession.
builder.
config("spark.ui.port", "0").
config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
enableHiveSupport.
appName(s"${username} | Spark SQL - Managing Tables - DML and Partitioning").
master("yarn").
getOrCreate
Hands-On Tasks
- Create the Spark session using the provided code.
- Set hive.exec.dynamic.partition to true and hive.exec.dynamic.partition.mode to nonstrict.
- Insert data into a partitioned table using dynamic partition mode.
- Check the newly created partitions.
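The tasks above can be sketched end to end. This is a hypothetical example: the table names orders and orders_part and the order_month partition column are assumptions for illustration, not from the original lesson:

```scala
// Enable dynamic partitioning for this session.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

// Hypothetical partitioned target table.
spark.sql("""
  CREATE TABLE IF NOT EXISTS orders_part (
    order_id INT,
    order_date STRING,
    order_customer_id INT,
    order_status STRING
  ) PARTITIONED BY (order_month STRING)
""")

// The partition column must come last in the SELECT list.
// Spark creates one partition per distinct order_month value,
// assuming a source table named orders already exists.
spark.sql("""
  INSERT INTO orders_part
  SELECT o.order_id, o.order_date, o.order_customer_id, o.order_status,
         date_format(o.order_date, 'yyyyMM') AS order_month
  FROM orders AS o
""")

// Check the newly created partitions.
spark.sql("SHOW PARTITIONS orders_part").show(false)
```

Note that with dynamic partition mode there is no PARTITION clause listing explicit values in the INSERT statement; the partition values are taken from the last column(s) of the query result.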
Conclusion
In this article, we discussed how to insert data into partitioned tables using dynamic partition mode in Spark SQL. By following the provided steps, you can practice and explore this concept further.
Remember to engage with the community for any questions or further learning opportunities.