Let us understand how to create an external table in Spark Metastore, using orders as an example. We will also see how to load data into the external table. Make sure to sign up for our 10-node state-of-the-art cluster/labs to learn Spark SQL using our unique integrated LMS.
Key Concepts
Using Spark SQL
```shell
spark2-sql \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```
Using Scala
```shell
spark2-shell \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```
Using Pyspark
```shell
pyspark2 \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```
- Add the EXTERNAL keyword to the CREATE clause and specify LOCATION after the STORED AS clause, or use LOCATION on its own as part of the CREATE TABLE statement.
- You can use the same LOAD command that we used for a managed table to load data from either the local file system or HDFS.
- Run `DESCRIBE FORMATTED orders` to check the metadata of the table and confirm whether it is a managed table or an external table.
- Specify the location while creating external tables.
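The points above can be sketched as a single DDL statement followed by a metadata check. This is a minimal sketch: the column list assumes a comma-delimited orders dataset (order_id, order_date, order_customer_id, order_status), and the HDFS path is purely illustrative; adjust both to match your environment.

```sql
-- Sketch only: the column layout and the LOCATION path are illustrative
-- assumptions, not part of the original article.
CREATE EXTERNAL TABLE orders (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/${USER}/external/retail_db/orders';

-- Confirm the table type; for an external table the output
-- includes Type: EXTERNAL rather than MANAGED.
DESCRIBE FORMATTED orders;
```

Note that dropping an external table removes only the metadata; the files under the LOCATION path are left in place.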
Hands-On Tasks
- Start the Spark context for this notebook to execute the provided code.
- Create an external table named “orders” with the specified column details.
- Load data into the “orders” table using the provided data.
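The load step can be sketched as below, assuming a hypothetical local path for the data files; drop the LOCAL keyword to load from HDFS instead.

```sql
-- Sketch only: the source path is an illustrative assumption.
-- LOCAL reads from the local file system; omit it to read from HDFS.
LOAD DATA LOCAL INPATH '/data/retail_db/orders' INTO TABLE orders;
```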
Conclusion
In this article, we learned how to create an external table in Spark Metastore using the example of “orders.” By following the step-by-step instructions provided, you can practice creating and loading data into external tables effectively. Join our community for further learning and support.