Data Engineering using Spark SQL - Getting Started - Launching and using Spark SQL CLI

Let us understand how to launch Spark SQL CLI.

To launch Spark SQL CLI, follow these steps:

  1. Log on to the gateway node of the cluster.
  2. Use spark-sql for Spark 1.6.x and spark2-sql for Spark 2.3.x.
  3. Launch the Spark SQL CLI using spark-sql. In clustered mode, additional arguments may be needed, such as:
spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
  4. Get help using spark-sql --help.
  5. Connect to a specific database using spark-sql --database training_retail. For example, in clustered mode:
spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse \
    --database ${USER}_retail
  6. Spark SQL CLI will launch and connect to the ${USER}_retail database.
  7. Validate that you are connected to the right database by running SELECT current_database(), as shown in the sketch after this list.
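
Here is a minimal validation sketch you can run once the CLI prompt is up, assuming the ${USER}_retail database from the previous step exists; the orders table below is only a hypothetical example and may not be present in your database.

-- Confirm which database this session is connected to
SELECT current_database();

-- List the tables available in the current database
SHOW tables;

-- Hypothetical sanity query against one of the tables
SELECT * FROM orders LIMIT 10;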

Now that you have learned how to launch Spark SQL CLI, feel free to practice and engage with the community for further learning.
