Data Engineering using Spark SQL - Getting Started - Launching and using Spark SQL CLI

Let us understand how to launch Spark SQL CLI.

To launch Spark SQL CLI, follow these steps:

  1. Log on to the gateway node of the cluster.
  2. Use spark-sql for Spark 1.6.x and spark2-sql for Spark 2.3.x.
  3. Launch the Spark SQL CLI using spark-sql. In clustered mode, additional arguments may be needed, such as:
spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
  4. Get help using spark-sql --help.
  5. Connect to a specific database using spark-sql --database training_retail. For example, in clustered mode:
spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse \
    --database ${USER}_retail
  6. Spark SQL CLI will launch and connect to the ${USER}_retail database.
  7. Validate that you are connected to the right database by running SELECT current_database(), as shown in the sketch after this list.
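
Here is a minimal validation sketch you can run once the CLI prompt is up, assuming the ${USER}_retail database from the previous step exists; the orders table below is only a hypothetical example and may not be present in your database.

-- Confirm which database this session is connected to
SELECT current_database();

-- List the tables available in the current database
SHOW tables;

-- Hypothetical sanity query against one of the tables
SELECT * FROM orders LIMIT 10;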

Now that you have learned how to launch Spark SQL CLI, feel free to practice and engage with the community for further learning.
